Hello, Any gurus out there who might be able to help me with this strange I/O behavior? Many thanks in advance, James -----Original Message----- From: James D. Parra [mailto:Jamesp@MusicReports.com] Sent: Wednesday, November 19, 2008 2:15 PM To: OS-en Subject: RE: [opensuse] wait times excessive high {Suse 10.3 & 11} On Wednesday, 2008-11-19 at 10:43 -0800, Randall R Schulz wrote:
Why not, have you benchmarked it?
Me, no; he did. On one computer it took 50 minutes, 10 on another identical system. That difference is not normal.
But if he has a slow disk, it is worth investigating why, because the machine seems to be a good one. ~~~~~~~~~~~~~~~~~~~~ Hello, Thank you for all of your feed back. I have three machines that I will compare. Machines 1 & 2 have no problems. Machine 3, however, is slow whether it is writing to a single disk or a RAID50 with 24 drives. File System on all machines is XFS. Machine 3 has a fresh OS install on new equipment. ***Machine 1*** -- 8 cpu's -- test done on single drive. -- No problems here. model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz stepping : 11 cpu MHz : 1600.049 cache size : 4096 KB Using 'cp' to copy a 17GB file over the network. top - 11:52:47 up 20 days, 3:53, 2 users, load average: 1.42, 0.89, 0.38 Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 1.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 1.0%sy, 0.0%ni, 87.7%id, 0.0%wa, 0.3%hi, 11.0%si, 0.0%st Cpu3 : 0.0%us, 8.0%sy, 0.0%ni, 13.7%id, 78.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.3%us, 0.7%sy, 0.0%ni, 98.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st Cpu6 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.7%sy, 0.0%ni, 98.0%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st Mem: 16460944k total, 16377168k used, 83776k free, 52k buffers Swap: 4200988k total, 100k used, 4200888k free, 15902292k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11682 root 18 0 9132 748 596 D 8 0.0 0:28.50 cp 11367 root 15 0 0 0 0 S 2 0.0 0:02.98 pdflush 255 root 10 -5 0 0 0 S 1 0.0 2:08.64 kswapd0 4256 root 10 -5 0 0 0 S 1 0.0 0:13.39 rpciod/2 661 root 10 -5 0 0 0 S 0 0.0 0:00.98 xfsdatad/5 6424 postgres 16 0 18532 2392 416 S 0 0.0 10:53.34 postgres 11724 root 15 0 8632 1180 828 R 0 0.0 0:00.04 top 1 root 15 0 812 304 244 S 0 0.0 0:02.80 init 2 root 11 -5 0 0 0 S 0 0.0 0:00.25 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/2 8 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/3 10 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:00.30 migration/4 12 root 34 19 0 0 0 S 0 0.0 0:00.06 ksoftirqd/4 <snip> time cp /home/backup/server/17GB_file /tmp/ real 7m41.027s user 0m0.160s sys 0m41.119s <snip> Trying 'dd'; dd if=/tmp/17GB_file of=/tmp/17GB_file2 bs=1M top - 12:27:11 up 20 days, 4:27, 2 users, load average: 0.66, 0.19, 0.22 Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 1.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 1.7%sy, 0.0%ni, 36.3%id, 60.3%wa, 0.0%hi, 1.7%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 3.0%sy, 0.0%ni, 65.7%id, 29.0%wa, 1.3%hi, 1.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16460944k total, 16374520k used, 86424k free, 52k buffers Swap: 4200988k total, 100k used, 4200888k free, 15903112k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11884 root 18 0 10244 1792 644 D 4 0.0 0:02.81 dd 255 root 10 -5 0 0 0 S 1 0.0 2:18.29 kswapd0 11367 root 15 0 0 0 0 S 1 0.0 0:05.36 pdflush 660 root 10 -5 0 0 0 S 0 0.0 1:03.30 xfsdatad/4 662 root 10 -5 0 0 0 S 0 0.0 0:59.59 xfsdatad/6 11724 root 15 0 8632 1184 828 R 0 0.0 0:04.79 top 1 root 15 0 812 304 244 S 0 0.0 0:02.80 init 2 root 11 -5 0 0 0 S 0 0.0 0:00.25 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/2 8 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/3 10 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:00.30 migration/4 12 root 34 19 0 0 0 S 0 0.0 0:00.06 ksoftirqd/4 13 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/5 14 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5 <snip> Notice that 'cp' and 'dd' are on the top of the list. ***Machine 2*** -- 8 cpu's -- test done on mirrored scsi drive -- No problems here. model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping : 6 cpu MHz : 2000.137 Using 'cp' to copy a 17GB file over the network. (this is also in production) top - 12:37:15 up 22 days, 5:03, 2 users, load average: 1.05, 0.32, 0.16 Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie Cpu0 : 99.3%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 99.0%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 3.6%sy, 0.0%ni, 68.5%id, 26.5%wa, 0.7%hi, 0.7%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 5.3%sy, 0.0%ni, 0.0%id, 94.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 2.7%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.7%hi, 15.6%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 33012372k total, 32851424k used, 160948k free, 12k buffers Swap: 10490436k total, 364k used, 10490072k free, 32062816k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10731 postgres 25 0 657m 548m 227m R 100 1.7 4:33.18 postgres 18818 root 18 0 9136 752 596 D 6 0.0 0:03.76 cp 4634 root 10 -5 0 0 0 S 3 0.0 4:57.23 rpciod/6 257 root 10 -5 0 0 0 S 2 0.0 12:18.33 kswapd0 30402 root 15 0 0 0 0 S 1 0.0 0:00.23 pdflush 18875 root 15 0 8628 1184 828 R 0 0.0 0:00.04 top 1 root 18 0 808 304 244 S 0 0.0 0:02.99 init 2 root 12 -5 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/0 4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.10 migration/2 8 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.04 migration/3 10 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:04.15 migration/4 12 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/4 13 root RT -5 0 0 0 S 0 0.0 0:03.98 migration/5 14 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5 15 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/6 16 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/6 17 root RT -5 0 0 0 S 0 0.0 0:00.14 migration/7 18 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/7 19 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0 20 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1 21 root 10 -5 0 0 0 S 0 0.0 0:00.06 events/2 22 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/3 23 root 10 -5 0 0 0 S 0 0.0 0:00.34 events/4 <snip> time cp /home/backup/server/17GB_file /tmp/ real 7m19.508s user 0m0.252s sys 0m29.618s <snip> Output from iostat; avg-cpu: %user %nice %system %iowait %steal %idle 0.03 0.00 0.62 11.51 0.00 87.84 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.25 0.00 1011.00 0.00 127730.75 126.34 128.49 127.16 0.99 100.20 <snip> As you can see, this is pretty fast. ***Machine 3*** -- 8 cpu's -- test done on mirrored scsi drive -- I/O falls apart on this machine. model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping : 6 cpu MHz : 2000.076 Using 'cp' to copy a 17GB file over the network. top - 13:01:11 up 2:08, 3 users, load average: 4.77, 1.27, 0.58 Tasks: 162 total, 1 running, 161 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 1.1%sy, 0.0%ni, 53.2%id, 45.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.6%sy, 0.0%ni, 0.0%id, 99.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 33020488k total, 7822156k used, 25198332k free, 316k buffers Swap: 8393952k total, 0k used, 8393952k free, 7587420k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21 root 15 -5 0 0 0 S 1 0.0 0:15.42 events/2 4203 root 20 0 8296 724 584 D 1 0.0 0:03.62 cp 1 root 20 0 864 328 272 S 0 0.0 0:01.28 init 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1 6 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1 7 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/2 8 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 9 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/3 10 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3 11 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/4 12 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/4 13 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/5 14 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5 15 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/6 16 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/6 17 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/7 18 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksoftirqd/7 19 root 15 -5 0 0 0 S 0 0.0 0:00.20 events/0 20 root 15 -5 0 0 0 S 0 0.0 0:15.18 events/1 22 root 15 -5 0 0 0 S 0 0.0 0:00.08 events/3 23 root 15 -5 0 0 0 S 0 0.0 0:00.02 events/4 24 root 15 -5 0 0 0 S 0 0.0 0:00.02 events/5 25 root 15 -5 0 0 0 S 0 0.0 0:00.04 events/6 26 root 15 -5 0 0 0 S 0 0.0 0:00.10 events/7 27 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper <snip> time cp /home/backup/server/17GB_file /tmp/ real 36m43.241s user 0m0.120s sys 0m33.954s Here is the output of vmstat; procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 3 52 8189540 120 24411432 0 0 1 277 29 36 0 0 87 12 0 3 52 8189276 120 24411512 0 0 0 6436 108 270 0 0 76 24 0 4 52 8185060 120 24415604 0 0 0 6128 103 170 0 0 51 49 0 6 52 8176620 120 24423796 0 0 0 6208 110 327 0 0 83 16 0 9 52 8163972 120 24436084 0 0 0 6296 113 386 0 0 68 32 0 9 52 8159756 120 24440180 0 0 0 6088 102 387 0 0 68 32 0 9 52 8159764 120 24440180 0 0 0 6120 107 336 0 0 40 60 0 9 52 8147240 120 24452468 0 0 0 6504 105 420 0 0 72 27 0 9 52 8138808 120 24460656 0 0 0 5835 95 359 0 0 55 44 0 1 52 8138808 120 24460656 0 0 0 4100 128 331 0 0 69 31 0 5 52 8126340 120 24472948 0 0 0 8219 95 130 0 0 76 23 0 5 52 8122072 120 24476868 0 0 0 6064 90 134 0 0 80 20 0 4 52 8113892 120 24485324 0 0 0 6245 130 331 0 0 79 21 0 6 52 8109428 120 24489328 0 0 0 6856 113 285 0 0 75 24 0 8 52 8101252 120 24497644 0 0 0 6088 103 276 0 0 77 23 0 8 52 8092820 120 24505836 0 0 0 6096 103 375 0 0 79 21 0 8 52 8088476 120 24509668 0 0 0 6088 115 369 0 0 69 31 0 8 52 8080132 120 24517936 0 0 0 6088 109 359 0 0 77 23 0 8 52 8075832 120 24522148 0 0 4 8136 112 413 0 0 35 64 0 9 52 8067400 120 24530344 0 0 0 5975 110 418 0 0 73 27 <snip> and from iostat; avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.19 31.95 0.00 67.86 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 677.25 0.00 98.50 0.00 12496.00 126.86 142.08 1441.98 10.15 100.00 <snip> Something is causing a significant await times (1441.98 compared to 127.16 on Machine 2). Any thoughts on discovering what it might be? Thank you, James -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org