RE: [opensuse] wait times excessive high {Suse 10.3 & 11}
Hello, Any gurus out there who might be able to help me with this strange I/O behavior? Many thanks in advance, James -----Original Message----- From: James D. Parra [mailto:Jamesp@MusicReports.com] Sent: Wednesday, November 19, 2008 2:15 PM To: OS-en Subject: RE: [opensuse] wait times excessive high {Suse 10.3 & 11} On Wednesday, 2008-11-19 at 10:43 -0800, Randall R Schulz wrote:
Why not, have you benchmarked it?
Me, no; he did. On one computer it took 50 minutes, 10 on another identical system. That difference is not normal.
But if he has a slow disk, it is worth investigating why, because the machine seems to be a good one. ~~~~~~~~~~~~~~~~~~~~ Hello, Thank you for all of your feed back. I have three machines that I will compare. Machines 1 & 2 have no problems. Machine 3, however, is slow whether it is writing to a single disk or a RAID50 with 24 drives. File System on all machines is XFS. Machine 3 has a fresh OS install on new equipment. ***Machine 1*** -- 8 cpu's -- test done on single drive. -- No problems here. model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz stepping : 11 cpu MHz : 1600.049 cache size : 4096 KB Using 'cp' to copy a 17GB file over the network. top - 11:52:47 up 20 days, 3:53, 2 users, load average: 1.42, 0.89, 0.38 Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 1.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 1.0%sy, 0.0%ni, 87.7%id, 0.0%wa, 0.3%hi, 11.0%si, 0.0%st Cpu3 : 0.0%us, 8.0%sy, 0.0%ni, 13.7%id, 78.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.3%us, 0.7%sy, 0.0%ni, 98.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st Cpu6 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.7%sy, 0.0%ni, 98.0%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st Mem: 16460944k total, 16377168k used, 83776k free, 52k buffers Swap: 4200988k total, 100k used, 4200888k free, 15902292k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11682 root 18 0 9132 748 596 D 8 0.0 0:28.50 cp 11367 root 15 0 0 0 0 S 2 0.0 0:02.98 pdflush 255 root 10 -5 0 0 0 S 1 0.0 2:08.64 kswapd0 4256 root 10 -5 0 0 0 S 1 0.0 0:13.39 rpciod/2 661 root 10 -5 0 0 0 S 0 0.0 0:00.98 xfsdatad/5 6424 postgres 16 0 18532 2392 416 S 0 0.0 10:53.34 postgres 11724 root 15 0 8632 1180 828 R 0 0.0 0:00.04 top 1 root 15 0 812 304 244 S 0 0.0 0:02.80 init 2 root 11 -5 0 0 0 S 0 0.0 0:00.25 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/2 8 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/3 10 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:00.30 migration/4 12 root 34 19 0 0 0 S 0 0.0 0:00.06 ksoftirqd/4 <snip> time cp /home/backup/server/17GB_file /tmp/ real 7m41.027s user 0m0.160s sys 0m41.119s <snip> Trying 'dd'; dd if=/tmp/17GB_file of=/tmp/17GB_file2 bs=1M top - 12:27:11 up 20 days, 4:27, 2 users, load average: 0.66, 0.19, 0.22 Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 1.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 1.7%sy, 0.0%ni, 36.3%id, 60.3%wa, 0.0%hi, 1.7%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 3.0%sy, 0.0%ni, 65.7%id, 29.0%wa, 1.3%hi, 1.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16460944k total, 16374520k used, 86424k free, 52k buffers Swap: 4200988k total, 100k used, 4200888k free, 15903112k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11884 root 18 0 10244 1792 644 D 4 0.0 0:02.81 dd 255 root 10 -5 0 0 0 S 1 0.0 2:18.29 kswapd0 11367 root 15 0 0 0 0 S 1 0.0 0:05.36 pdflush 660 root 10 -5 0 0 0 S 0 0.0 1:03.30 xfsdatad/4 662 root 10 -5 0 0 0 S 0 0.0 0:59.59 xfsdatad/6 11724 root 15 0 8632 1184 828 R 0 0.0 0:04.79 top 1 root 15 0 812 304 244 S 0 0.0 0:02.80 init 2 root 11 -5 0 0 0 S 0 0.0 0:00.25 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/2 8 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/3 10 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:00.30 migration/4 12 root 34 19 0 0 0 S 0 0.0 0:00.06 ksoftirqd/4 13 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/5 14 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5 <snip> Notice that 'cp' and 'dd' are on the top of the list. ***Machine 2*** -- 8 cpu's -- test done on mirrored scsi drive -- No problems here. model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping : 6 cpu MHz : 2000.137 Using 'cp' to copy a 17GB file over the network. (this is also in production) top - 12:37:15 up 22 days, 5:03, 2 users, load average: 1.05, 0.32, 0.16 Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie Cpu0 : 99.3%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 99.0%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 3.6%sy, 0.0%ni, 68.5%id, 26.5%wa, 0.7%hi, 0.7%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 5.3%sy, 0.0%ni, 0.0%id, 94.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 2.7%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.7%hi, 15.6%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 33012372k total, 32851424k used, 160948k free, 12k buffers Swap: 10490436k total, 364k used, 10490072k free, 32062816k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10731 postgres 25 0 657m 548m 227m R 100 1.7 4:33.18 postgres 18818 root 18 0 9136 752 596 D 6 0.0 0:03.76 cp 4634 root 10 -5 0 0 0 S 3 0.0 4:57.23 rpciod/6 257 root 10 -5 0 0 0 S 2 0.0 12:18.33 kswapd0 30402 root 15 0 0 0 0 S 1 0.0 0:00.23 pdflush 18875 root 15 0 8628 1184 828 R 0 0.0 0:00.04 top 1 root 18 0 808 304 244 S 0 0.0 0:02.99 init 2 root 12 -5 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/0 4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.10 migration/2 8 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.04 migration/3 10 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:04.15 migration/4 12 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/4 13 root RT -5 0 0 0 S 0 0.0 0:03.98 migration/5 14 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5 15 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/6 16 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/6 17 root RT -5 0 0 0 S 0 0.0 0:00.14 migration/7 18 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/7 19 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0 20 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1 21 root 10 -5 0 0 0 S 0 0.0 0:00.06 events/2 22 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/3 23 root 10 -5 0 0 0 S 0 0.0 0:00.34 events/4 <snip> time cp /home/backup/server/17GB_file /tmp/ real 7m19.508s user 0m0.252s sys 0m29.618s <snip> Output from iostat; avg-cpu: %user %nice %system %iowait %steal %idle 0.03 0.00 0.62 11.51 0.00 87.84 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.25 0.00 1011.00 0.00 127730.75 126.34 128.49 127.16 0.99 100.20 <snip> As you can see, this is pretty fast. ***Machine 3*** -- 8 cpu's -- test done on mirrored scsi drive -- I/O falls apart on this machine. model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping : 6 cpu MHz : 2000.076 Using 'cp' to copy a 17GB file over the network. top - 13:01:11 up 2:08, 3 users, load average: 4.77, 1.27, 0.58 Tasks: 162 total, 1 running, 161 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 1.1%sy, 0.0%ni, 53.2%id, 45.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.6%sy, 0.0%ni, 0.0%id, 99.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 33020488k total, 7822156k used, 25198332k free, 316k buffers Swap: 8393952k total, 0k used, 8393952k free, 7587420k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21 root 15 -5 0 0 0 S 1 0.0 0:15.42 events/2 4203 root 20 0 8296 724 584 D 1 0.0 0:03.62 cp 1 root 20 0 864 328 272 S 0 0.0 0:01.28 init 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1 6 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/2 8 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/3 10 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/4 12 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/4 13 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/5 14 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5 15 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/6 16 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/6 17 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/7 18 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/7 19 root 15 -5 0 0 0 S 0 0.0 0:00.20 events/0 20 root 15 -5 0 0 0 S 0 0.0 0:15.18 events/1 22 root 15 -5 0 0 0 S 0 0.0 0:00.08 events/3 23 root 15 -5 0 0 0 S 0 0.0 0:00.02 events/4 24 root 15 -5 0 0 0 S 0 0.0 0:00.02 events/5 25 root 15 -5 0 0 0 S 0 0.0 0:00.04 events/6 26 root 15 -5 0 0 0 S 0 0.0 0:00.10 events/7 27 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper <snip> time cp /home/backup/server/17GB_file /tmp/ real 36m43.241s user 0m0.120s sys 0m33.954s Here is the output of vmstat; procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 3 52 8189540 120 24411432 0 0 1 277 29 36 0 0 87 12 0 3 52 8189276 120 24411512 0 0 0 6436 108 270 0 0 76 24 0 4 52 8185060 120 24415604 0 0 0 6128 103 170 0 0 51 49 0 6 52 8176620 120 24423796 0 0 0 6208 110 327 0 0 83 16 0 9 52 8163972 120 24436084 0 0 0 6296 113 386 0 0 68 32 0 9 52 8159756 120 24440180 0 0 0 6088 102 387 0 0 68 32 0 9 52 8159764 120 24440180 0 0 0 6120 107 336 0 0 40 60 0 9 52 8147240 120 24452468 0 0 0 6504 105 420 0 0 72 27 0 9 52 8138808 120 24460656 0 0 0 5835 95 359 0 0 55 44 0 1 52 8138808 120 24460656 0 0 0 4100 128 331 0 0 69 31 0 5 52 8126340 120 24472948 0 0 0 8219 95 130 0 0 76 23 0 5 52 8122072 120 24476868 0 0 0 6064 90 134 0 0 80 20 0 4 52 8113892 120 24485324 0 0 0 6245 130 331 0 0 79 21 0 6 52 8109428 120 24489328 0 0 0 6856 113 285 0 0 75 24 0 8 52 8101252 120 24497644 0 0 0 6088 103 276 0 0 77 23 0 8 52 8092820 120 24505836 0 0 0 6096 103 375 0 0 79 21 0 8 52 8088476 120 24509668 0 0 0 6088 115 369 0 0 69 31 0 8 52 8080132 120 24517936 0 0 0 6088 109 359 0 0 77 23 0 8 52 8075832 120 24522148 0 0 4 8136 112 413 0 0 35 64 0 9 52 8067400 120 24530344 0 0 0 5975 110 418 0 0 73 27 <snip> and from iostat; avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.19 31.95 0.00 67.86 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 677.25 0.00 98.50 0.00 12496.00 126.86 142.08 1441.98 10.15 100.00 <snip> Something is causing a significant await times (1441.98 compared to 127.16 on Machine 2). Any thoughts on discovering what it might be? Thank you, James -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
James D. Parra wrote:
Hello,
Any gurus out there who might be able to help me with this strange I/O behavior?
You can only check the differences between the machines. It should be something on the machine itself. On possible culprit is the driver module that is controlling the disks. I had a lot of problem with Adaptec controllers some time ago. Check the versions of the mainboard bios, controller bios, driver modules and their versions and even the firmware versions of the drives themselves. If that doesn't turn up a possible culprit I am out of ideas. -- Sandy List replies only please! Please address PMs to: news-reply2 (@) japantest (.) homelinux (.) com -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
participants (2)
-
James D. Parra
-
Sandy Drobic