network/disk/system performance
I just got some surprising performance numbers. I have three systems, call them A, B and C. A is a file server, B is supposed to be a fast server and C is my desktop. I'm trying to improve the performance of a job that normally runs on B and manipulates files on A, so I wrote a little test program. I ran it on all three machines. My test creates 10000 symlinks to a file. Symlinks and the file are accessed via NFS from A.

It was fastest on A (less than 1 second) - no surprise using local disk. It took 5 seconds on C. It took 23 seconds on B! Big surprise. Anybody have any thoughts on what makes B so slow?

All are connected to each other through a gigabit switch. In addition, A and B have additional gigabit NICs connected directly to each other.

Here are some system details:

A: Tyan S5350 Tiger i7320, dual 2.8 GHz Xeon, 1 GB memory, lots of SATA disks with 3ware controllers, SUSE 9.2
   Intel PRO/1000 MT Dual Port NIC - quiescent during tests
   Intel PRO/1000 MT Dual Port NIC - direct connect to B
   Broadcom NetXtreme BCM5721 Gigabit NIC, PCI Express - unused
   Broadcom NetXtreme BCM5721 Gigabit NIC, PCI Express - main network

B: Tyan S2882 Thunder K8S Pro, dual Opteron 242, 8 GB memory, fast SCSI local disks, SUSE 9.2
   Intel EtherExpress PRO/100 S Server Adapter - unused
   Broadcom NetXtreme BCM5704 Gigabit NIC - direct connect to A
   Broadcom NetXtreme BCM5704 Gigabit NIC - main network

C: Intel D925 / P4-3600, 2 GB memory, SATA local disk, SUSE 9.3
   Intel 82547EI Gigabit NIC - main network

While the test is running, 'vmstat 1' on B shows lines like:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0  30716 2616544  42620 4389496   0    0     0     0  5698  4616  0  5 95  0

while on C it shows:

 1  0  23820 737800 322640 372736    0    0     0     0  9311 12258  1  7 93  0

So C is managing nearly twice as many interrupts and three times as many context switches per second as B.
On quiet systems, nttcp shows very similar transmit (910 Mbps) rates from both B and C to A. The receive rate to C (930 Mbps) is a bit better than to B (840 Mbps). But I don't think there's anything wrong with the basic network. Any ideas on how to improve B's performance would be very welcome. Thanks, Dave
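[The test program itself wasn't posted. A minimal sketch of a benchmark along the lines described - create many symlinks to one file and time it - might look like the following, assuming a POSIX shell. The directory, file names, and reduced count are guesses for illustration; point DIR at an NFS-mounted directory to reproduce the original setup.]

```shell
#!/bin/sh
# Sketch of the symlink benchmark described above (the original
# program wasn't posted, so details here are guesses).

N=2500                 # the post used 10000; reduced so this sketch runs quickly
DIR=$(mktemp -d)       # point this at an NFS mount to reproduce the test
TARGET="$DIR/target"
touch "$TARGET"

start=$(date +%s)
i=0
while [ "$i" -lt "$N" ]; do
    ln -s "$TARGET" "$DIR/link.$i"
    i=$((i + 1))
done
end=$(date +%s)

COUNT=$(find "$DIR" -type l | wc -l)
echo "created $COUNT symlinks in $((end - start))s"
rm -rf "$DIR"
```

[Each `ln -s` over NFS is a synchronous SYMLINK RPC, so total run time is dominated by per-operation round-trip latency rather than bandwidth - which is why the interrupt behaviour above can matter more than raw throughput figures.]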
Dave Howorth wrote:
I just got some surprising performance numbers.
Some more facts and better numbers ...
B packages: kernel 2.6.8-24.14 SMP x86-64, nfs-utils 1.0.6-113
C packages: kernel 2.6.11.4-21.7 SMP i586, nfs-utils 1.0.7-3
I have three systems, call them A, B and C.
I have another machine that is essentially a duplicate of B. It behaves the same, so I think I can rule out hardware faults.
A is a file server, B is supposed to be a fast server and C is my desktop. I'm trying to improve the performance of a job that normally runs on B and manipulates files on A, so I wrote a little test program. I ran it on all three machines. My test creates 10000 symlinks to a file. Symlinks and the file are accessed via NFS from A.
It was fastest on A (less than 1 second) - no surprise using local disk. It took 5 seconds on C. It took 23 seconds on B! Big surprise.
I re-mounted A's disks on B via the main-network interface instead of the direct connect, and ran the test again: 7 seconds!!! Much better, but why?
Anybody have any thoughts on what makes B so slow?
So now it seems to be specific to an interface (they are identical interfaces). nfsstat didn't show any errors or retransmits etc.
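[Since nfsstat is clean, another thing worth comparing between the two mounts is the NFS mount options - rsize/wsize, udp vs tcp, attribute caching - which aren't shown anywhere in the thread and can differ per mount. A quick way to list them on Linux, reading /proc/mounts:]

```shell
#!/bin/sh
# Print the mount options for every NFS mount. Fields in /proc/mounts:
# device, mountpoint, fstype, options, ...
awk '$3 ~ /^nfs/ { print $2 ": " $4 }' /proc/mounts
echo "total mounts: $(wc -l < /proc/mounts)"
```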
All are connected to each other through a gigabit switch. In addition, A and B have additional gigabit NICs connected directly to each other.
Here are some system details:
A: Tyan S5350 Tiger i7320, dual 2.8 GHz Xeon, 1 GB memory, lots of SATA disks with 3ware controllers, SUSE 9.2
   Intel PRO/1000 MT Dual Port NIC - quiescent during tests
   Intel PRO/1000 MT Dual Port NIC - direct connect to B
   Broadcom NetXtreme BCM5721 Gigabit NIC, PCI Express - unused
   Broadcom NetXtreme BCM5721 Gigabit NIC, PCI Express - main network

B: Tyan S2882 Thunder K8S Pro, dual Opteron 242, 8 GB memory, fast SCSI local disks, SUSE 9.2
   Intel EtherExpress PRO/100 S Server Adapter - unused
   Broadcom NetXtreme BCM5704 Gigabit NIC - direct connect to A
   Broadcom NetXtreme BCM5704 Gigabit NIC - main network

C: Intel D925 / P4-3600, 2 GB memory, SATA local disk, SUSE 9.3
   Intel 82547EI Gigabit NIC - main network
While the test is running, 'vmstat 1' on B shows lines like:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0  30716 2616544  42620 4389496   0    0     0     0  5698  4616  0  5 95  0
while on C it shows:
1 0 23820 737800 322640 372736 0 0 0 0 9311 12258 1 7 93 0
So C is managing nearly twice as many interrupts and three times as many context switches per second as B.
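[One way to dig further into the interrupt-rate difference is to read the kernel's raw counters directly rather than via vmstat. A small sketch that samples /proc/interrupts over one second:]

```shell
#!/bin/sh
# Sample the kernel's raw interrupt counters over one second, as a
# cross-check on vmstat's "in" column.

sum_irqs() {
    # skip the CPU header line and add up every numeric field
    awk 'NR > 1 { for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) s += $i }
         END { print s + 0 }' /proc/interrupts
}

a=$(sum_irqs)
sleep 1
b=$(sum_irqs)
echo "interrupts in the last second: $((b - a))"
```

[If B's NICs are coalescing interrupts more aggressively than C's, `ethtool -c eth1` shows the coalescing settings (and `ethtool -C` changes them); heavier coalescing trades latency for lower CPU load, which hurts a chatty NFS metadata workload like this one.]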
On quiet systems, nttcp shows very similar transmit (910 Mbps) rates from both B and C to A. The receive rate to C (930 Mbps) is a bit better than to B (840 Mbps). But I don't think there's anything wrong with the basic network.
Checking the above reveals that I measured B<->A using the main-network interface. Repeating the test with the direct connect interface gives a bit lower figures (810 Mbps transmit, 720 Mbps receive). Oh, the direct connect is eth0 and the main network is eth1.
Any ideas on how to improve B's performance would be very welcome.
So I now have a workaround for my application's performance, but it would be nice to understand why the direct connection is so much slower. Thanks, Dave
Dave Howorth wrote:
Checking the above reveals that I measured B<->A using the main-network interface. Repeating the test with the direct connect interface gives a bit lower figures (810 Mbps transmit, 720 Mbps receive).
Oh, the direct connect is eth0 and the main network is eth1.
Any ideas on how to improve B's performance would be very welcome.
It appears you need to check the direct connection's cable. That would appear to be the "weakest link". http://www.sql-server-performance.com/jc_gigabit.asp
--
Joe Morris
Registered Linux user 231871
Joe Morris (NTM) wrote:
Dave Howorth wrote:
Checking the above reveals that I measured B<->A using the main-network interface. Repeating the test with the direct connect interface gives a bit lower figures (810 Mbps transmit, 720 Mbps receive).
Oh, the direct connect is eth0 and the main network is eth1.
Any ideas on how to improve B's performance would be very welcome.
It appears you need to check the direct connection's cable. That would appear to be the "weakest link". http://www.sql-server-performance.com/jc_gigabit.asp
Hi Joe, I'm not sure what I'm looking for there? The interfaces are running at gigabit rate and there are no errors logged, so the configuration appears to be OK, to me at least? And two similar systems with similar cables behave the same, which seems to eliminate specific hardware/cable faults? The cables are cat 6. Thanks, Dave
I'm not sure what I'm looking for there?
According to the site I sent a link to, Cat5e or Cat 6, which you have.
The interfaces are running at gigabit rate and there are no errors logged, so the configuration appears to be OK, to me at least?
I thought this connection was quite a bit slower than the other interface connected to a switch? Are you sure it is running at a gigabit rate?
And two similar systems with similar cables behave the same, which seems to eliminate specific hardware/cable faults?
Actually, it only says that everything is working as designed. According to the link I sent, gigabit uses a regular patch cable for a machine-to-machine connection, not a crossover cable. If the cables are similar, on two similar computers, and exhibit similar problems, then I would try to change a variable, if it is worth it to you to find the problem (since it works through the switch). Since it is easy to swap cables (which may cause some config problems): does it work as fast through the problem connection's NIC if connected to the switch? If it is just as fast, it must be the cable (is it routed near a source of inductive noise?). Is it the same slowness if the direct-connect cable is plugged into the NIC that was connected to the switch?
The cables are cat 6.
That should be the best. These are just some thoughts. I may be way off target. HTH, but YMMV.
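[Following the suggestion above to verify the link speed: besides `ethtool eth0` (negotiated speed/duplex) and `ethtool -S eth0` (driver statistics), a quick check on newer 2.6 kernels is to read sysfs directly. The speed attribute is not present on all kernels or on virtual devices, hence the fallbacks in this sketch:]

```shell
#!/bin/sh
# Print negotiated speed (in Mb/s) and error counters for each
# interface via sysfs. Attributes that don't exist fall back to
# "n/a" / 0 rather than aborting.

for dev in /sys/class/net/*; do
    name=$(basename "$dev")
    speed=$(cat "$dev/speed" 2>/dev/null || echo "n/a")
    rxerr=$(cat "$dev/statistics/rx_errors" 2>/dev/null || echo 0)
    txerr=$(cat "$dev/statistics/tx_errors" 2>/dev/null || echo 0)
    echo "$name: speed=$speed rx_errors=$rxerr tx_errors=$txerr"
done
```

[A direct-connect link that negotiated 100 Mb/s or half duplex, or non-zero error counters on one side only, would point straight at the cable or NIC.]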
--
Joe Morris
Registered Linux user 231871
participants (2)
- Dave Howorth
- Joe Morris (NTM)