[opensuse] Strange i/o difference between identical webservers
I have 3 webservers. They are symmetrically load-balanced in the DNS, so the HTTP requests and the number of requests per second are identical. They run apache (with worker MPM) and serve a lot of small static files (about 1KB each), which are in a big directory tree on ReiserFS. The total size of these files is 3GB.

Apache is compiled identically, and its configuration is also identical. The networking/firewall configuration is identical, and there is no other software on the machines.

The hardware configuration is almost identical:

Server 1: 2x3GHz 1MB L2 64bit Xeons, 4GB RAM, SATA
Server 2: 2x3.06GHz 512KB L2 32bit Xeons, 4GB RAM, Ultra320 SCSI
Server 3: 2x2.66GHz 512KB L2 32bit Xeons, 6GB RAM, Ultra320 SCSI

Server 1: SuSE 9.3 (x86-64) 2.6.11.4-21.12-smp
Server 2: openSUSE 11.0 (i586) 2.6.25.5-1.1-pae
Server 3: openSUSE 10.3 (i586) 2.6.22.17-0.1-bigsmp

There are no hardware problems that I know of (nothing can be seen in /var/log/messages, nor in the ipmitool System Event Log).

The problem is with Server 3: its load average is 4-5x that of the other two.
The problem can be traced to much higher disk read I/O:

Server 3 (bad):

web14:~ # vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 2  0     48 1070788  77944 4263400    0    0     1     1    0    0  3  7 80  9
 0  0     48 1070632  78096 4263444    0    0   112  1852 3163 3940  1  3 66 31
 0  1     48 1070260  78252 4263436    0    0   148     0 2914 2800  2  3 92  3
 0  0     48 1070036  78452 4263380    0    0   200     0 2812 2767  2  3 89  6
 0  0     48 1070036  78596 4263436    0    0   144     0 2907 2821  1  1 94  3
 0  1     48 1069912  78864 4263372    0    0   268     0 3959 3970  3  4 86  7
 0  0     48 1070832  77300 4263408    0    0   168  1832 4759 4655  3  5 59 33
 0  0     48 1071044  77404 4263456    0    0   104     0 3051 2950  2  2 93  3

Servers 1 and 2 (good):

web16:~ # vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 2530608 600208  706692    0    0     3     8    3   51  2  3 93  3
 0  0      0 2530576 600208  706692    0    0     0     0 2285 3995  2  3 96  0
 0  0      0 2530600 600208  706692    0    0     0     0 2150 3513  1  2 97  0
 0  0      0 2530624 600208  706692    0    0     0  1772 2056 3555  1  2 95  2
 0  0      0 2530600 600208  706692    0    0     0     0 2085 3319  1  3 96  0
 0  0      0 2530700 600208  706692    0    0     0     0 2342 3900  2  2 96  0
 0  0      0 2530600 600216  706692    0    0     8     0 1928 3374  1  2 97  0
 0  0      0 2530600 600220  706700    0    0     4     0 1982 2943  1  2 96  0

The "bi" column shows the number of blocks per second read from the block device. It is very small for the good servers, and about 100x larger for the problematic server. Note that all servers have plenty of free RAM, so caching of files by the OS *should not* be a problem.

The same thing is shown by sar -d: %util is very small for the good servers, and about 30-70% for the overloaded server.

Could someone shed any light on the possible source of this mysterious disparity?

Thanks
Alec
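
For anyone who wants to put a number on the "bi" comparison above, here is a minimal Python sketch that averages the "bi" column of pasted vmstat output. The names mean_column, WEB14 and WEB16 are made up for this illustration; the sample rows are copied from the two vmstat runs in this post. It skips the first sample of each run, since vmstat's first line reports averages since boot rather than a one-second interval.

```python
# Sketch: average one column of pasted `vmstat 1` output.
# mean_column, WEB14 and WEB16 are illustrative names, not part of any tool.

HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"

# Sample rows copied from the post (web14 = Server 3, web16 = a good server).
WEB14 = HEADER + """
2 0 48 1070788 77944 4263400 0 0 1 1 0 0 3 7 80 9
0 0 48 1070632 78096 4263444 0 0 112 1852 3163 3940 1 3 66 31
0 1 48 1070260 78252 4263436 0 0 148 0 2914 2800 2 3 92 3
0 0 48 1070036 78452 4263380 0 0 200 0 2812 2767 2 3 89 6
0 0 48 1070036 78596 4263436 0 0 144 0 2907 2821 1 1 94 3
0 1 48 1069912 78864 4263372 0 0 268 0 3959 3970 3 4 86 7
0 0 48 1070832 77300 4263408 0 0 168 1832 4759 4655 3 5 59 33
0 0 48 1071044 77404 4263456 0 0 104 0 3051 2950 2 2 93 3
"""

WEB16 = HEADER + """
0 0 0 2530608 600208 706692 0 0 3 8 3 51 2 3 93 3
0 0 0 2530576 600208 706692 0 0 0 0 2285 3995 2 3 96 0
0 0 0 2530600 600208 706692 0 0 0 0 2150 3513 1 2 97 0
0 0 0 2530624 600208 706692 0 0 0 1772 2056 3555 1 2 95 2
0 0 0 2530600 600208 706692 0 0 0 0 2085 3319 1 3 96 0
0 0 0 2530700 600208 706692 0 0 0 0 2342 3900 2 2 96 0
0 0 0 2530600 600216 706692 0 0 8 0 1928 3374 1 2 97 0
0 0 0 2530600 600220 706700 0 0 4 0 1982 2943 1 2 96 0
"""


def mean_column(vmstat_text, column="bi", skip_first=True):
    """Return the mean of one vmstat column.

    The column position is taken from the header line that names it.
    With skip_first=True the first data row is dropped, because it
    holds since-boot averages rather than a per-interval sample.
    """
    header = None
    values = []
    for line in vmstat_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if not fields[0].isdigit():
            # Non-numeric line: either the column header or the
            # "procs ---memory---" banner, which is ignored.
            if column in fields:
                header = fields
            continue
        if header is None:
            continue  # data before any header cannot be interpreted
        values.append(int(fields[header.index(column)]))
    if skip_first:
        values = values[1:]
    if not values:
        raise ValueError("no samples found for column %r" % column)
    return sum(values) / len(values)


if __name__ == "__main__":
    bad = mean_column(WEB14)
    good = mean_column(WEB16)
    print("web14 mean bi: %.1f blocks/s" % bad)
    print("web16 mean bi: %.1f blocks/s" % good)
    print("ratio: %.0fx" % (bad / good))
```

On these eight-second samples the averages work out to roughly 163 blocks/s for web14 against under 2 blocks/s for web16, which matches the "about 100x" estimate; a longer capture would give a steadier figure.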
participants (1)
-
Alec Matusis