On Tue, 2006-05-09 at 14:43 -0700, Shaun Q wrote:
Hey everyone --
First of all, I apologize if this is an inappropriate forum for this... but perhaps someone has an idea...
So, we installed a new cluster here with some Athlon 64 machines... and we're finding that these new machines are quite a bit slower than our old AMD64 test machine...
So I ran lmbench 3.0 on these machines to see if I could find some of the bottlenecks -- and came up with the following:
Two machines were tested -- CT4 and CT115:
CT4 is an Opteron 1.8 with a 9.3 normal install. CT115 is an Athlon 64 3400+ (clocked at 2400 MHz) with a diskless install (we've also tried the same hardware with a normal non-diskless install and the results are similar).
So some of the basic results were what we'd expect... numbers were better on CT115:

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
ct4       Linux 2.6.4-5 1786 0.10 0.39 4.46 5.09 15.2 0.31 1.22 115. 480. 2886
ct115     Linux 2.6.16. 2410 0.07 0.13 1.42 2.17 9.29 0.19 1.07 113. 393. 4223
The Float/Int, etc. numbers blew away the CT4 machine too...
But then we get to some of the context switching tests, for example:

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
ct4       Linux 2.6.4-5 0.5200 0.6600 5.2300 2.3500 5.7400 2.48000    18.0
ct115     Linux 2.6.16. 0.5400 0.5800 3.1800 1.8900   26.0 3.83000    49.5
After we get to 8p/64K and above, the numbers are quite bad as you can see... And the Bcopy and memory numbers:
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                 OS Pipe AF    TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
ct4       Linux 2.6.4-5 1077 1149 520. 1136.2 2453.4  938.2  938.6 2427 1360.
ct115     Linux 2.6.16. 1798 2062 269. 259.3K 1489.2  391.6  389.4 1483 578.8
So, where should I be looking here in order to get these machines up to where they should be? Anyone have any ideas to help me overcome this issue? Is it hardware or software? Would a memory issue cause this? Even though they both have 1 GB of PC3200 memory, CT4 has ECC memory and CT115 just has regular el-cheapo Corsair memory. Would that make a difference here?
Based on my previous testing and experience, I think you are seeing a difference based on L2 cache. I was surprised to see how much of a difference it can make. I tested with my workstation, an Athlon 64 X2 4200, and my home PC, an Athlon 64 X2 4400. The only difference is the L2 cache on each core, 512K versus 1MB. And there were some very noticeable differences in performance. Now, I can't be 100% sure on this since we don't know your full configurations.

Brad Dameron
SeaTab Software
www.seatab.com
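
As a rough illustration of the cache argument above, here is a small sketch that works out the combined data footprint of each lmbench context-switch test (an "Np/SK" test runs N processes that each touch S KB) and compares it with an L2 size. The L2 sizes in the sketch are assumptions for the sake of the arithmetic; the thread does not state them for ct4 or ct115, so substitute the real values for your CPUs. The timings are copied from the context switching table in the original post.

```python
# Assumed L2 cache sizes in KB. These are NOT stated in the thread;
# they are placeholders chosen only to illustrate the argument.
ASSUMED_L2_KB = {"ct4": 1024, "ct115": 512}

# Context-switch times in microseconds, copied from the lmbench table.
CTXSW_US = {
    "ct4": {"2p/0K": 0.52, "2p/16K": 0.66, "2p/64K": 5.23, "8p/16K": 2.35,
            "8p/64K": 5.74, "16p/16K": 2.48, "16p/64K": 18.0},
    "ct115": {"2p/0K": 0.54, "2p/16K": 0.58, "2p/64K": 3.18, "8p/16K": 1.89,
              "8p/64K": 26.0, "16p/16K": 3.83, "16p/64K": 49.5},
}


def footprint_kb(label):
    """Combined data size of an 'Np/SK' test: N processes times S KB each."""
    procs, size = label.split("/")
    return int(procs.rstrip("p")) * int(size.rstrip("K"))


def fills_l2(label, host):
    """True if the combined footprint is at least the assumed L2 size."""
    return footprint_kb(label) >= ASSUMED_L2_KB[host]


def report():
    """Build one line per host and test, flagging footprints that fill L2."""
    lines = []
    for host, results in CTXSW_US.items():
        for label, usec in results.items():
            flag = "fills L2" if fills_l2(label, host) else "fits in L2"
            lines.append(
                f"{host:6} {label:8} {footprint_kb(label):5d}K {usec:6.2f} us  {flag}"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

With these assumed sizes, the flagged tests are 8p/64K and 16p/64K on ct115 and only 16p/64K on ct4, which are the same entries where the times jump in the table. This is only a footprint estimate: it ignores code, stacks, and kernel data, and it says nothing about the Bcopy and memory bandwidth numbers.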