Performance Testing w/ Fedora Help(*2)

In the next couple of weeks or so, we’ll be executing performance testing of Kolab on OpenPower in one of the world’s largest testing facilities. How do we do this? With the help of Fedora(^2).

Part I: The Data Set

A good performance test requires a good data set. In this particular set of tests, we seek to load the IMAP server by emulating client actions such as sorting, threading and searching.
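To make those client actions concrete, here is a minimal sketch of one round of sort, thread and search using Python's standard `imaplib`. It is not the actual test harness: the function name `exercise_mailbox`, the folder name and the search term are all assumptions for illustration, and SORT and THREAD are IMAP extensions the server has to advertise.

```python
import imaplib


def exercise_mailbox(conn, folder="INBOX", term="kernel"):
    """Run one round of sort, thread and search against a single folder.

    `conn` is an already authenticated imaplib.IMAP4 (or IMAP4_SSL) object.
    The folder name and search term are placeholders, not values from the
    real test plan.
    """
    # Open the folder read-only so the test does not change message flags.
    conn.select(folder, readonly=True)

    # SORT: newest messages first.
    _, sorted_ids = conn.sort("(REVERSE DATE)", "UTF-8", "ALL")

    # THREAD: group messages into conversations using the References header.
    _, threads = conn.thread("REFERENCES", "UTF-8", "ALL")

    # SEARCH: full-text search for a single term.
    _, hits = conn.search(None, "TEXT", '"%s"' % term)

    return {"sorted": sorted_ids, "threads": threads, "hits": hits}
```

In a real run one would open a connection with `imaplib.IMAP4_SSL(host)`, call `login(user, password)`, and time each of the three calls separately per folder.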

This needs a reasonably predictable data set. Some threads need to be short, some long. It should feature a realistic set of commenters.

Let’s suppose we do this with randomly generated data. We might use a dictionary, but then tests that search for “common phrases” would likely have to go. Let’s give it some volume (several GBs), and let’s spread this across some 13’000 INBOX folders.

Doing this with randomly generated data would require multiple test runs. It’d also need analysis of loads of results, a mathematical analysis, some justification (in the form of a proof), and then ultimately, maybe, justify a graph.

It could also use a reliably reproducible data set — perhaps by re-using the same data set generated randomly over and over again.

So, rather than generate some garbage nonsense, and then attempt to hit some of the unpredictable patterns, why not use the archives of the Fedora development mailing list?

That is some 15 years of discussions on various topics and, more importantly, in language and terminology we know and understand, with senders and topics we can recall off the top of our heads. It also suits us because the data set is preferably reproducible (for multiple test runs, and for the reliability of the data set).
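One way to keep such an archive reproducible across test runs is to decide where each message goes purely from its own content. The sketch below assumes the archive is available as an mbox file and maps every Message-ID to one of the folders with a hash; the function names, the mbox format and the default folder count are assumptions for illustration, not a description of the tooling we actually use.

```python
import hashlib
import mailbox


def folder_index(message_id, folder_count=13000):
    """Map a Message-ID to a folder number.

    The same Message-ID always lands in the same folder, so repeated
    test runs see an identical layout.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % folder_count


def plan_distribution(mbox_path, folder_count=13000):
    """Return {folder number: [Message-IDs]} for all messages in an mbox file."""
    plan = {}
    for message in mailbox.mbox(mbox_path):
        message_id = message.get("Message-ID")
        if message_id is None:
            continue  # skip messages that cannot be identified reliably
        message_id = str(message_id).strip()
        index = folder_index(message_id, folder_count)
        plan.setdefault(index, []).append(message_id)
    return plan
```

Hashing the Message-ID spreads messages evenly but splits threads across folders; hashing a thread's root Message-ID instead would keep each thread together, which may matter for the threading tests.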

Part II: No Red Hat Enterprise Linux 8 Yet

While RHEL 7 ships with GCC 4.8, many advancements and optimizations have been applied since. In February 2016, the goal for a customer-funded development project had been to make Kolab available on Power 8 Little Endian, preferably using “Advance Toolchain 7.1”: support for little endian had just been added, as had platform support for RHEL 7.2 (of which we used the beta). EPEL for ppc64le was not yet available, so I needed to recompile all of Kolab’s EPEL dependencies as well.

Now, the target has shifted a tiny little bit:

  • RHEL 7.4 has (some) support for KVM on Power 8, but not using “Advance Toolchain”, and CentOS ships it, but only RHEV is supported,
  • EPEL is meanwhile available for ppc64le, but not using “Advance Toolchain”,
  • Advance Toolchain 10.0 is available, using GCC 6.

Now, I can choose either of the following options for the hypervisor:

  1. CentOS 7
  2. RHEV

Frankly, Kolab isn’t too bothered with the hypervisor; it’s the customer’s choice. We’ll just outright recommend some KVM wherever we go, whether it is made more convenient by the other parts of a RHEV solution or by OpenStack. We shouldn’t really care. But in reality, we do.

Then there’s IBM’s PowerVM. While it is a proprietary solution, Kolab really doesn’t care (aside from the left-leaning advocacy conversation with IBM). Nor should it. But for the specific exercise I talked about before, PowerVM is reputed to have more optimizations in place. If the goal is to show “how many circles Power 8 runs around x86”, then our solution needs to be benchmarked on PowerVM guests as well.

I’m saying all of this to underline that most enhancements for the particular CPU architecture in combination with little endian have been submitted upstream, and “Advance Toolchain” is really just the portable version of part of that.

RHEL 7 is (going to be) lagging behind to no end, like RHEL 6 appeared to do while RHEL 7 wasn’t available yet. We’re thus also in the sweet spot where it is more cost-effective to choose to run Fedora 26 (and eat the costs of doing so) than it is to back-port everything to a separate /opt/at10.0/ “enhancement stack” such as Advance Toolchain, just for the sake of entertaining legacy platforms (like RHEL 7).