On 2009-07-18 at 11:04 +0200, Camaleón wrote:
On 2009-07-18 at 00:43 +0200, Carlos E. R. wrote:
(...)
On /. they already commented that these Phoronix studies are really unreliable. People don't trust them because their methodology is not really known.
Well, the methodology seems clear enough: they use their Phoronix Test Suite (*) and they indicate the type of components on which they ran the tests.
On Slashdot they don't trust Phoronix.

http://linux.slashdot.org/story/09/06/30/1543246/EXT4-Btrfs-NILFS2-Performan...

  Someone did file a ticket [sqlite.org] at SQLite, but from the comments in there you can see that what Phoronix did is not reproducible.
  ...
  Here's a post [slashdot.org] linking to some other posts discussing some problems with the Phoronix benchmarking methodology. The same issues seem to be pointed out every time they get a benchmark article published on Slashdot.

http://slashdot.org/comments.pl?sid=1068299&cid=26173779
Java Performance On Ubuntu Vs. Windows Vista

  Various problems with the Phoronix test methodology have been noted before [slashdot.org] and before that [slashdot.org]. Without going over the same stuff, here are some potential questions about this benchmarking:

  * Where is the statistical analysis of these results? OK, you ran a test once and it was 30% slower. Is this reproducible? What is the variance? Is there any statistically significant difference between OpenJDK and Sun Java?
  * Why is the Java minor version different? Do you see the same results if the same minor version is used?
  * As mentioned in the previous discussions, exactly why is Windows slower on the file encryption task? It should be limited either by disk throughput or by CPU throughput, so observing a 40% drop in performance attributed to the underlying I/O handling of the operating system is somewhat surprising. Are you sure the test methodology is sound here, and if so, how do you explain the results?
  * Are these results applicable to both 32- and 64-bit distributions and JDKs?
  * How do you know that the 2D benchmark performance on Linux is attributable to poor graphics drivers? Why not run the test on another PC and then swap out graphics cards (hence eliminating all other factors) and report on the results?

  There are a lot of questions that this benchmarking should have answered, and a lot of assumptions made that could potentially be invalid.
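The first question quoted above (reproducibility and variance) is easy to illustrate; here is a minimal sketch, not anything Phoronix actually does, of timing a command several times and reporting a mean and standard deviation instead of a single number (the example command at the bottom is hypothetical):

```python
import statistics
import subprocess
import time

def time_command(cmd, runs=10):
    """Run `cmd` several times and return (mean, stdev) of wall-clock time.

    A single run says nothing about variance; repeated runs at least
    show whether a 30% difference is reproducible or just noise.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical usage: time the same SQLite script ten times.
# mean, stdev = time_command(["sqlite3", "test.db", ".read inserts.sql"])
```

With the standard deviation in hand you can at least say whether two results overlap within noise before claiming one system is "slower".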
http://apple.slashdot.org/article.pl?sid=08/11/06/1315243&from=rss
Ubuntu 8.10 vs. Mac OS X 10.5.5 Benchmarks

  Also worth mentioning is the collection of posts from the last thread that convincingly argued various problems with the Phoronix benchmarks. Example 1 [slashdot.org] Example 2 [slashdot.org] Example 3 [slashdot.org]. Speed tests are good; let's make sure we're doing them right.

http://apple.slashdot.org/article.pl?sid=08/11/06/1315243&from=rss
Is Ubuntu Getting Slower?

  I can see several problems with the testing methodology as is:

  * The test suite itself: the Phoronix Test Suite runs on PHP. That in itself is a problem: the slowdowns measured could well be *because* of differences in the distributed PHP runtimes. You can't just say "hey, version Y of distro X is slower than version Z!", because you're also running different versions of the *test suite* itself (since you have to consider the runtime as part of the test suite). Unless you remove that dependency, you can't measure things reliably. Which brings me to my second point...
  * What exactly are they testing? The whole distro? The compiler (since most of each distro version is compiled with a different version of GCC)? The kernel? If they're testing the released kernel, they should run static binaries that test exactly that, comparing kernel differences. If they're testing the compiler, they should build the *same* code on each version and run the resulting binaries (which is pretty much what I gather they're doing). If they're testing the utilities and apps that came with the distro, they should use shell scripts and other tools that run on a single runtime, not depending on the runtime(s) that came with each distro version. Because if you don't, you have no fucking clue what you're testing.

  Honestly, I was unimpressed by the benchmarks.
I happen to do performance benchmarking as part of my job, and I can tell you, you have to eliminate all the variables first -- isolate things to be able to say "X is slow". If you rely on a PHP runtime, use *exactly* the same PHP runtime for all your testing; otherwise, you'll get misleading results.
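The "eliminate all the variables" advice can be sketched in code: record a fingerprint of the runtime and environment alongside every result, and refuse to compare results whose fingerprints differ. This is my own illustrative sketch (the field names are mine, not from any benchmarking tool):

```python
import platform
import sys

def environment_fingerprint():
    """Capture the runtime details that must match before two benchmark
    results are comparable: interpreter build, compiler, kernel, arch."""
    return {
        "runtime": sys.version,                  # exact interpreter build
        "compiler": platform.python_compiler(),  # compiler that built it
        "kernel": platform.release(),
        "machine": platform.machine(),
    }

def comparable(result_a, result_b):
    """True only if both results were measured in identical environments."""
    return result_a["env"] == result_b["env"]

# A result record carries its own fingerprint, e.g.:
# result = {"test": "sqlite-insert", "seconds": 12.3,
#           "env": environment_fingerprint()}
```

If the PHP runtime (or any runtime) differs between two runs, `comparable` returns False, which is exactly the check the quoted critique says Phoronix's cross-distro comparisons lack.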
What I think is of dubious usefulness is the result of testing development versions, whatever the product, and even more so as a comparison.
Well, no, of course not.
I suppose the Phoronix people wanted to "sink their teeth" into the kernels of the 2.6.3x branch.
It seems to me that those people make their living from doing comparisons.

--
Regards,
Carlos E. R.