Hi,
On Fri 04-02-11 14:12:12, Richard Ems wrote:
I was trying to check whether updating to 2.6.37 on our openSUSE 11.3 nodes would bring any performance changes for our application (HPC CFD programs), and was surprised to see a big performance decrease in my test runs. Tracking this problem down, I arrived at the following Perl test line, which shows the difference.
If I run the following Perl line on the same hardware under openSUSE 11.3, once with 2.6.34.7-0.5-default and once with 2.6.37-6-default (from http://download.opensuse.org/repositories/Kernel:/stable/openSUSE_11.3/x86_6... ), I get VERY different results:
On 2.6.34 it runs in about 10 secs. On 2.6.37 it takes at least 17 secs to run.
Could someone else test this on other hardware?
I've run your script on my test machine and I can see a difference as well (I compared the 2.6.34 and 2.6.36 kernels). The machine is a dual quad-core AMD Opteron at 1 GHz with 4 GB of RAM. The difference isn't as big as for you, but it's still noticeable: about 20% on average, I'd say.
Please run it a couple of times with 2.6.34 and then with 2.6.37. Do others also see such big differences?
The system I am testing on is a dual Xeon X5650 @ 2.67 GHz (12 cores) with 24 GB of memory. Please set $T to your number of cores. You may also have to reduce the (1..9e6) range if you have less than 2 GB per core.
time perl -e 'use threads; $T=12 ; foreach (1..$T) { $thr[$i++] = threads->create(sub { printf "I am thread %s\n", threads->tid(); foreach (1..9e6) { push(@a, sqrt(1234)/sin(1234)) } ; printf "thread %s finished.\n", threads->tid(); }); } foreach (0..$T-1) { $thr[$_]->join(); }'
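For anyone who wants to vary the parameters, the one-liner above can be restructured as a standalone script. The following is a sketch, not the script Richard ran: the file name bench.pl, the subs worker() and run_benchmark(), and taking the thread count and iteration count from the command line are hypothetical choices made for illustration. As in the one-liner, each thread pushes into its own array, because Perl ithreads copy variables into each thread rather than sharing them unless they are marked :shared. It needs a Perl built with thread support.

```perl
#!/usr/bin/perl
# Sketch of the threads benchmark from the one-liner, restructured for
# readability. Hypothetical file name: bench.pl. The names worker() and
# run_benchmark() are illustrative, not taken from the original post.
use strict;
use warnings;
use threads;

# Worker body: fill a thread-private array with $iters copies of the same
# floating-point result, like the inner loop of the one-liner.
sub worker {
    my ($iters) = @_;
    printf "I am thread %s\n", threads->tid();
    my @a;
    foreach (1 .. $iters) {
        push @a, sqrt(1234) / sin(1234);
    }
    printf "thread %s finished.\n", threads->tid();
    # Report how many elements this thread pushed.
    return scalar @a;
}

# Start $threads workers, wait for all of them, and return the per-thread
# element counts. map evaluates its block in list context, so each thread is
# created in list context and join() returns the worker's return value.
sub run_benchmark {
    my ($threads, $iters) = @_;
    $iters = int($iters);    # also accepts "9e6" given on the command line
    my @thr = map { threads->create(\&worker, $iters) } 1 .. $threads;
    return map { $_->join() } @thr;
}

# Usage (assumed): perl bench.pl THREADS ITERATIONS
# The original test corresponds to: perl bench.pl 12 9e6
if (@ARGV == 2) {
    my @counts = run_benchmark(@ARGV);
}
```

Running `time perl bench.pl 12 9e6` should roughly match the original one-liner with $T=12; with fewer cores or less memory per core, lower both numbers as suggested above.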
Does anyone have an idea why this could be happening?
For such a simple test case I'd say it's a difference in scheduling or in some system call used implicitly by Perl. What seems a bit strange, though, is that after all threads report they are finished, it still takes a noticeable time for the parent thread to exit. In a quick strace I see a storm of madvise() calls when the threads exit, so maybe this is the culprit? If so, I'd try to oprofile the test on both kernels and compare the results.
Umm, btw, I've just tried your test with a single thread and it's
still slower on 2.6.36 than on 2.6.34, so indeed this looks like a
regression in some of the system calls.
Honza
--
Jan Kara