How to do an invalid benchmarking test

A post on a mailing list pointed me to what was supposed to be a good benchmark test posted on a blog. The test is "Ubuntu 8.04 LTS vs. Windows XP SP3: Application Performance Benchmark." I read down through the report, and some of the results seemed a little odd. Then I got to the end, where the equipment and software details are laid out, and saw this:

  • HDD (Windows XP): Western Digital, WD1600JD, Capacity: 160 GB, Cache: 8 MB, SATA150, 7200 rpm.
  • HDD (Ubuntu 8.04): Maxtor DiamondMax 21, STM3160215A, Capacity: 160 GB, Cache: 2 MB, ATA100, 7200 rpm.

Talk about poor test design: XP and Ubuntu are running from two different disk drives made by two different manufacturers. On top of that, the drives sit on entirely different interface buses (SATA150 vs. ATA100) and have different cache sizes (8 MB vs. 2 MB). It boggles my mind that someone can spend their time running a benchmark and then invalidate the results by giving the two OSes different hardware to work with. Duh. I guess this next bit from the details shouldn't have surprised me.

  • I also disabled RAM swapping on both Windows XP and Ubuntu.

OK, so you change each OS supplier's recommended default setting to a non-recommended one and you think you'll get a fair test? Double duh. If you want to do a fair test of two competing OSes, you absolutely must run both on the same hardware and use each OS's recommended performance settings. Anything else is ridiculous and completely invalidates the results.