[Bug 879071] New: Stalls during dd on an external USB disk
https://bugzilla.novell.com/show_bug.cgi?id=879071
https://bugzilla.novell.com/show_bug.cgi?id=879071#c0

Summary: Stalls during dd on an external USB disk
Classification: openSUSE
Product: openSUSE Factory
Version: 13.2 Milestone 0
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: mhocko@suse.com
ReportedBy: jslaby@suse.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=591354)
 --> (http://bugzilla.novell.com/attachment.cgi?id=591354)
/proc/meminfo and /proc/vmstat every-second snapshots

I am using Tumbleweed (i.e. 13.1 + some newer packages). The kernel is
3.14.2-25.g1474ea5-desktop. When I run:

  $ dd if=/dev/zero of=bubak bs=$((4*1024*1024))

on a mounted USB partition over USB 2.0, the stalls begin. The disk is this
(smartctl says):

  Model Family:     Hitachi/HGST Travelstar Z7K500
  Device Model:     HITACHI HTS725050A7E630
  Serial Number:    TF1500Y9H2N95B
  LU WWN Device Id: 5 000cca 662cf4c6f
  Firmware Version: GH2ZB390
  User Capacity:    500 107 862 016 bytes [500 GB]
  Sector Sizes:     512 bytes logical, 4096 bytes physical
  Rotation Rate:    7200 rpm

Mounted as: type ext4 (rw,nosuid,nodev,relatime,data=ordered,uhelper=udisks2)

Both the noop and cfq schedulers show this behavior.
So I did (in /dev/shm/vm):

  while :; do
          TIME=`date +%s.%N`
          echo $TIME
          cat /proc/meminfo >meminfo-$TIME
          cat /proc/vmstat >vmstat-$TIME
          sleep 1
  done

and see:

  1400672643.810787699
  1400672644.817812969
  1400672645.822746359
  1400672646.834695436
  1400672647.843612029
  1400672648.853972001
  1400672649.862598851
  1400672650.871373444
  1400672652.182838629 <-- stalled
  1400672653.192253464
  1400672654.200850674
  1400672655.212085244
  1400672656.221593476
  1400672657.230324099
  1400672659.141308093 <-- stalled
  1400672660.149899029
  1400672662.259573479 <-- stalled
  1400672663.269444609
  1400672664.279327742
  1400672665.287736541
  1400672666.295795513
  1400672667.305839304
  1400672668.314697122
  1400672669.325072842
  1400672670.334228258
  1400672671.344690614
  1400672672.353744075
  1400672673.361565807
  1400672674.371823780
  1400672675.683192247
  1400672676.693375982
  1400672677.702441380
  1400672678.712842427
  1400672679.721433233
  1400672680.729992233
  1400672681.739432910
  1400672682.745694320
  1400672683.753713095

Then I interrupted dd and this loop.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
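[Editor's note] The `<-- stalled` markers above can be derived mechanically from the timestamp list; a minimal sketch (the 1.2 s threshold is an assumption chosen to tolerate scheduling jitter, not part of the original loop):

```shell
# Sketch: flag gaps larger than 1.2 s between successive epoch timestamps,
# as printed by the sampling loop above (which sleeps ~1 s per iteration).
# Reads one timestamp per line from stdin.
awk 'NR > 1 && $1 - prev > 1.2 {
         printf "%s <-- stalled (%.2f s gap)\n", $1, $1 - prev
     }
     { prev = $1 }'
```

Piping the timestamp list above through this reproduces the stall annotations.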
https://bugzilla.novell.com/show_bug.cgi?id=879071#c1
--- Comment #1 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c2
--- Comment #2 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=879071#c3
--- Comment #3 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c4
--- Comment #4 from Jiri Slaby
> So this looks like the old and known problem that dirty_{ratio,bytes} is too
> high for the slow device :/
Oh, thanks. Since the stalls were unbearable, I set the limits a long time ago and it helped significantly:

  # cat /proc/sys/vm/dirty_bytes
  209715200
  # cat /proc/sys/vm/dirty_ratio
  0

My sysctl.conf says:

  vm.dirty_writeback_centisecs = 3000
  vm.laptop_mode = 5
  vm.dirty_bytes = 209715200

But as you can see, I can still see the stalls under some circumstances :(.
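[Editor's note] For readers wanting to reproduce this setup without editing sysctl.conf, the limits quoted above can be applied at runtime; a sketch (values are the ones from this comment):

```shell
# Apply the writeback limits quoted in this comment (requires root).
# Note: writing vm.dirty_bytes implicitly resets vm.dirty_ratio to 0,
# which is why dirty_ratio reads back as 0 above.
sysctl -w vm.dirty_bytes=209715200            # 200 MB hard dirty limit
sysctl -w vm.dirty_writeback_centisecs=3000   # flusher wakeup interval: 30 s
sysctl -w vm.laptop_mode=5
```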
https://bugzilla.novell.com/show_bug.cgi?id=879071#c5
--- Comment #5 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c6
--- Comment #6 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c7
--- Comment #7 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=879071#c8
--- Comment #8 from Michal Hocko
Ok, the output is now like this:
  time 1401181527: 65 (kswapd0) Stalled: 1192 ms: shrink_zone [...]
  time 1401181531: 65 (kswapd0) Stalled: 1108 ms: shrink_zone [...]
These two are in the background reclaim, so there are no stalls for other allocators. But the last one is a userspace path which ends up in direct reclaim and stalls on the congestion wait. So this is what I expected in comment 6:
  time 1401181531: 1558 (psi-plus) Stalled: 1400 ms: shrink_zone
  Guessing: IO_WritebackInProgress
  -nr_dirty 21512
  -nr_writeback 3955
  -nr_vmscan_write 245878
  - /sys/block/sda/stat 1729415 57441 58854989 2074861 2228007 550710 79380137 42479068 0 1452328 44568637
  - /sys/block/sdb/stat 3768 1729 369338 150446 120793 767 28520472 46288447 128 389560 46467531
  +nr_dirty 18482 -3030
  +nr_writeback 2707 -1248
  +nr_vmscan_write 245878 0
  + /sys/block/sda/stat 1729483 57444 58859549 2074899 2228035 550712 79381705 42479239 0 1452357 44568845
  + /sys/block/sdb/stat 3768 1729 369338 150446 120932 772 28553480 46344916 91 389959 46520862
  [<ffffffff8116100e>] congestion_wait+0x6e/0x110
  [<ffffffff81155bca>] shrink_inactive_list+0x4aa/0x4f0
  [<ffffffff81156261>] shrink_lruvec+0x2f1/0x610
  [<ffffffff811565e6>] shrink_zone+0x66/0x190
  [<ffffffff8160aea5>] kretprobe_trampoline+0x0/0x4b
  [<ffff88011e5e7b08>] 0xffff88011e5e7b08
  [<ffffffff8115726a>] try_to_free_pages+0xda/0x1c0
  [<ffffffff8160aea5>] kretprobe_trampoline+0x0/0x4b
  [<ffffffff81055920>] copy_process.part.24+0x130/0x1ba0
  [<ffffffff81194497>] kmem_cache_alloc+0x207/0x460
  [<ffffffff811b10ae>] alloc_file+0x1e/0xc0
  [<ffffffff81057551>] do_fork+0xd1/0x300
  [<ffffffff81610b09>] stub_clone+0x69/0x90
  [<ffffffff816107ad>] system_call_fastpath+0x1a/0x1f
  [<ffffffffffffffff>] 0xffffffffffffffff
Mel, you have said that you had a patch which should reduce congestion waits from direct reclaim, right?

That being said, the primary problem is still that we shouldn't get into direct reclaim in the first place. The writer should get throttled before it dirties more memory than we can write back in a reasonable time. But a secondary problem is that direct reclaim might be throttled even when not necessary.
https://bugzilla.novell.com/show_bug.cgi?id=879071#c9
--- Comment #9 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c10
--- Comment #10 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c11
--- Comment #11 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=879071#c12
--- Comment #12 from Michal Hocko
> Looking at the memory stats, the machine seems relatively memory constrained
> - it has 4 GB of RAM. Out of that, 3 GB are allocated for anonymous memory,
> another 500 MB is in shmem pages, and 140 MB is in slab & page tables. That
> leaves relatively tight (~350 MB) maneuvering space for the page cache, free
> space reserves, etc. So I'm not surprised reclaim sees the whole reclaim
> batch of dirty pages in the LRU. I'm not enough of a reclaim expert to judge
> where exactly the problem is.
This is definitely a good observation, Jan! I was so focused on the reclaim statistics that I completely missed the whole picture.

  meminfo-1400672652.182838629:
  SwapCached:       22384 kB
  Active(anon):   2578936 kB
  Inactive(anon):  769128 kB
  Active(file):     77848 kB
  Inactive(file):  137296 kB
  SwapTotal:      1952764 kB
  SwapFree:        594984 kB
  Dirty:            83996 kB

So the swap is 70% full. The inactive anonymous LRU is quite low, but not low enough to trigger active anon aging (with ~4G the ratio is 5-6). A big chunk of the file LRU is dirty, so it seems that you are really right and the stalls are a result of memory pressure. Now the question is: does it happen only in such situations? Another obvious question: can we do better, and are such stalls really so unexpected considering the load?
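[Editor's note] The 70% figure can be checked directly from the snapshot; a one-liner sketch (shown against the live /proc/meminfo, but it works the same on the meminfo-* snapshot files attached to this bug):

```shell
# Sketch: compute swap utilisation from a meminfo dump, as in the ~70%
# figure above: (SwapTotal - SwapFree) / SwapTotal.
awk '/^SwapTotal:/ { t = $2 }
     /^SwapFree:/  { f = $2 }
     END { printf "swap used: %.0f%%\n", 100 * (t - f) / t }' /proc/meminfo
```

On the snapshot quoted above (SwapTotal 1952764 kB, SwapFree 594984 kB) this yields 70%.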
https://bugzilla.novell.com/show_bug.cgi?id=879071#c13
--- Comment #13 from Michal Hocko
One more thing. Although there might be some throttling on the writer side, I guess we are not doing it aggressively enough.
Here is the nr_dirtied vs nr_written comparison (nr_dirtied and nr_written are per delay; diff = nr_dirtied - nr_written, and total_diff is the cumulative diff):

  delay    nr_dirtied  nr_written  diff    total_diff
  1.01195  34697       7000        27697   27701
  1.00892  2936        10145       -7209   20492
  1.01036  13107       10057       3050    23542
  1.00863  14373       9495        4878    28420
  1.00877  6894        10284       -3390   25030
  1.31147  11541       13539       -1998   23032
  1.00941  14969       9916        5053    28085
  1.0086   4763        10430       -5667   22418
  1.01123  15491       10169       5322    27740
  1.00951  11298       10215       1083    28823
  1.00873  3703        10168       -6465   22358
  1.91098  12459       19335       -6876   15482
  1.00859  19852       10229       9623    25105
  2.10967  18650       21341       -2691   22414
  1.00987  7760        10243       -2483   19931
We can clearly see that the dirtier is ~24k pages ahead of writeback on average, which is ~90M, so around 2s worth of writeback.
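[Editor's note] The diff column above can be derived from any two consecutive /proc/vmstat snapshots like those attached to this bug; a sketch (the vmstat-t0/vmstat-t1 file names are placeholders):

```shell
# Sketch: nr_dirtied/nr_written deltas between two /proc/vmstat snapshots.
# vmstat-t0 and vmstat-t1 are placeholder names for consecutive snapshots.
d0=$(awk '/^nr_dirtied /{print $2}' vmstat-t0)
d1=$(awk '/^nr_dirtied /{print $2}' vmstat-t1)
w0=$(awk '/^nr_written /{print $2}' vmstat-t0)
w1=$(awk '/^nr_written /{print $2}' vmstat-t1)
echo "nr_dirtied=$((d1 - d0)) nr_written=$((w1 - w0)) diff=$(( (d1 - d0) - (w1 - w0) ))"
```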
Btw. I do not see dirty throttling checking global_dirtyable_memory if dirty_bytes is used. Is this OK?
https://bugzilla.novell.com/show_bug.cgi?id=879071#c14
--- Comment #14 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=879071#c15
--- Comment #15 from Michal Hocko
(In reply to comment #13)
> > we can clearly see that dirtier is ~24k pages ahead of writeback on average
> > which is ~90M so around 2s worth of writeback.
>
> Correct. And that is really working as designed - we have dirty_bytes set to
> 200 MB, so writeback is free to keep up to 200 MB of dirty pages if it sees
> fit. We have to keep a non-trivial amount of dirty pages in order to be able
> to form large enough IOs and keep the hardware busy. The experiments have
> shown that you need at least a second or so worth of writeback cached to
> level out bumps in disk performance, IO completion bursts, etc.

Thanks for the clarification.
> > Btw. I do not see dirty throttling checking global_dirtyable_memory if
> > dirty_bytes is used. Is this OK?
>
> I am not sure. The idea has been that when dirty_bytes is set at X, then you
> know you can have X amount of dirty memory.
OK, I suspected something like this. But I am afraid that the more we encourage users to use dirty_bytes (because dirty_ratio sucks with a lot of memory), the more strange issues we will see.
> However when global_dirtyable_memory() is less than, say, 2*dirty_bytes, it
> gets really strange and maybe we should take that into account. I guess you
> can try proposing something like that upstream (or I can :).
I guess this can be double-checked by setting dirty_ratio to 4%, to get a similar setting except that global_dirtyable_memory is not ignored. Could you give it a try, Jiri?
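[Editor's note] The 4% suggestion roughly matches the existing 200 MB dirty_bytes setting on this ~4 GB machine; a quick sanity-check sketch (an approximation, since dirty_ratio is actually applied to global_dirtyable_memory(), which is smaller than MemTotal):

```shell
# Rough sanity check: 4% of this machine's memory versus the 200 MB
# dirty_bytes limit. Approximation only: dirty_ratio applies to
# global_dirtyable_memory(), not MemTotal.
memtotal_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
echo "4% of MemTotal ~ $((memtotal_kb * 4 / 100 / 1024)) MB (dirty_bytes is 200 MB)"
```

With roughly 4 GB of RAM (e.g. MemTotal around 4046248 kB), 4% comes out to about 158 MB, in the same ballpark as the 200 MB limit.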
https://bugzilla.novell.com/show_bug.cgi?id=879071#c16
--- Comment #16 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c17
--- Comment #17 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c18
--- Comment #18 from Jiri Slaby
> any news here?
Good news actually. With the kswapd patch above, I am running 3.15.5 under high memory pressure and there are no stalls on dd. Brilliant!
https://bugzilla.novell.com/show_bug.cgi?id=879071#c19
--- Comment #19 from Michal Hocko
(In reply to comment #17)
> > any news here?
>
> Good news actually. With the kswapd patch above,

Do you mean the patch from comment 10 or the one referenced from the upstream discussion in comment 16?

> I am running 3.15.5 under high memory pressure and no stalls on dd.
> Brilliant!

In any case, good to hear. We just have to find a way to route the fix into the code streams properly.
https://bugzilla.novell.com/show_bug.cgi?id=879071#c20
--- Comment #20 from Jiri Slaby
(In reply to comment #18)
> (In reply to comment #17)
> > > any news here?
> >
> > Good news actually. With the kswapd patch above,
>
> Do you mean the patch from comment 10 or the one referenced from the upstream
> discussion in comment 16?

For quite some time, I have been using only the latter.
https://bugzilla.novell.com/show_bug.cgi?id=879071#c21
--- Comment #21 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c22
--- Comment #22 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c23
--- Comment #23 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=879071#c24
--- Comment #24 from Jiri Slaby
As it turned out, SLE12 already has this patch. I have pushed it to openSUSE-13.1; other branches are not affected, as they are older than 3.11, where this particular problem was introduced (by e2be15f6c3ee "mm: vmscan: stall page reclaim and writeback pages based on dirty/writepage pages encountered").
I have pushed it also to the stable branch: fdb2dde88465..edc5ddf28550 stable -> stable

(In reply to comment #23)
> Jiri, I think this is worth backporting to the 3.12 stable tree. Should I
> post it to the stable ML?

Yes, please post it.
https://bugzilla.novell.com/show_bug.cgi?id=879071#c25
--- Comment #25 from Michal Hocko
(In reply to comment #23)
> > Jiri, I think this is worth backporting to the 3.12 stable tree. Should I
> > post it to the stable ML?
>
> Yes, post it, please.

Done. I guess we are done here.
https://bugzilla.novell.com/show_bug.cgi?id=879071#c26
--- Comment #26 from Swamp Workflow Management