[Bug 1109908] New: regression: extreme page allocation stalls since upgrading from SLES12 / 42.3 kernel
http://bugzilla.suse.com/show_bug.cgi?id=1109908

Bug ID: 1109908
Summary: regression: extreme page allocation stalls since upgrading from SLES12 / 42.3 kernel
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.0
Hardware: All
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: s.priebe@profihost.ag
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Hello,

since upgrading from kernel 4.4 to 4.12 I'm seeing memory stalls every few days. None of the kvm machines respond during that timeframe. Example output:

[ 5662.847520] kvm: page allocation stalls for 142040ms, order:9, mode:0x4340ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null)
[ 5662.849196] kvm cpuset=/ mems_allowed=0-1
[ 5662.850004] CPU: 41 PID: 37414 Comm: kvm Tainted: G W 4.12.0+113-ph #1 SLE15 (unreleased)
[ 5662.850857] Hardware name: Supermicro SYS-1028U-TRT+/X10DRU-i+, BIOS 2.0b 08/09/2016
[ 5662.851734] Call Trace:
[ 5662.852598]  dump_stack+0x5c/0x84
[ 5662.853448]  warn_alloc+0xe0/0x180
[ 5662.854289]  __alloc_pages_slowpath+0x820/0xc90
[ 5662.855074]  __alloc_pages_nodemask+0x1cc/0x210
[ 5662.855794]  alloc_pages_vma+0x1ff/0x2a0
[ 5662.856590]  do_huge_pmd_wp_page+0x83f/0xf00
[ 5662.857484]  ? set_huge_zero_page.isra.52.part.53+0x9b/0xb0
[ 5662.858253]  ? do_huge_pmd_anonymous_page+0x62e/0x6c0
[ 5662.859015]  __handle_mm_fault+0x93d/0x1060
[ 5662.859785]  handle_mm_fault+0xc6/0x1b0
[ 5662.860522]  __do_page_fault+0x230/0x430
[ 5662.861246]  ? get_vtime_delta+0x13/0xb0
[ 5662.861975]  do_page_fault+0x2a/0x70
[ 5662.862698]  ? page_fault+0x65/0x80
[ 5662.863412]  page_fault+0x7b/0x80
[ 5662.864151] RIP: 0033:0x7f8bdd404055
[ 5662.864832] RSP: 002b:00007f8bbb9ffbf8 EFLAGS: 00010202
[ 5662.865532] RAX: 00007f8ba3268000 RBX: 0000000000001000 RCX: 00007f89b6680370
[ 5662.866390] RDX: 0000000000000f80 RSI: 00007f89b667f410 RDI: 00007f8ba3268000
[ 5662.867228] RBP: 00007f89b6679000 R08: 0000000000000000 R09: 00007f8ba3268fe0
[ 5662.868086] R10: 0000000000000000 R11: 00007f8ba3268fe0 R12: 00007f8ba3268000
[ 5662.868983] R13: 0000000000000000 R14: 0000000000001000 R15: 00007f8bbb9ffc00
[ 5662.869924] Mem-Info:
[ 5662.870670] active_anon:109732569 inactive_anon:1261809 isolated_anon:1 active_file:25038 inactive_file:19183 isolated_file:0 unevictable:18773 dirty:104 writeback:0 unstable:0 slab_reclaimable:41558 slab_unreclaimable:1911380 mapped:49590 shmem:31049 pagetables:314269 bounce:0 free:16200105 free_pcp:19530 free_cma:0
[ 5662.875450] Node 0 active_anon:202941420kB inactive_anon:4491768kB active_file:83824kB inactive_file:42828kB unevictable:74824kB isolated(anon):4kB isolated(file):0kB mapped:183328kB dirty:380kB writeback:0kB shmem:115728kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 71424000kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 5662.877710] Node 1 active_anon:235990312kB inactive_anon:553956kB active_file:16344kB inactive_file:33888kB unevictable:268kB isolated(anon):0kB isolated(file):0kB mapped:15032kB dirty:36kB writeback:0kB shmem:8468kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 101519360kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 5662.880361] Node 0 DMA free:15884kB min:316kB low:392kB high:468kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 5662.883053] lowmem_reserve[]: 0 1854 257824 257824 257824
[ 5662.883989] Node 0 DMA32 free:1063204kB min:37980kB low:47472kB high:56964kB active_anon:791596kB inactive_anon:104kB active_file:8kB inactive_file:16kB unevictable:0kB writepending:0kB present:1964744kB managed:1899176kB mlocked:0kB slab_reclaimable:136kB slab_unreclaimable:42912kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB
[ 5662.886944] lowmem_reserve[]: 0 0 255969 255969 255969
[ 5662.887910] Node 0 Normal free:41799504kB min:5242356kB low:6552944kB high:7863532kB active_anon:202151200kB inactive_anon:4490656kB active_file:83816kB inactive_file:42812kB unevictable:74824kB writepending:380kB present:266338304kB managed:262118320kB mlocked:74824kB slab_reclaimable:107644kB slab_unreclaimable:5826192kB kernel_stack:21688kB pagetables:597768kB bounce:0kB free_pcp:43860kB local_pcp:248kB free_cma:0kB
[ 5662.892100] lowmem_reserve[]: 0 0 0 0 0
[ 5662.893120] Node 1 Normal free:21922356kB min:5284684kB low:6605852kB high:7927020kB active_anon:235990312kB inactive_anon:553956kB active_file:16344kB inactive_file:33888kB unevictable:268kB writepending:36kB present:268435456kB managed:264234864kB mlocked:268kB slab_reclaimable:58452kB slab_unreclaimable:1776416kB kernel_stack:14520kB pagetables:659232kB bounce:0kB free_pcp:34012kB local_pcp:112kB free_cma:0kB
[ 5662.897486] lowmem_reserve[]: 0 0 0 0 0
[ 5662.898617] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[ 5662.901304] Node 0 DMA32: 4891*4kB (UME) 1897*8kB (UME) 1099*16kB (UME) 538*32kB (UME) 294*64kB (UME) 160*128kB (UME) 102*256kB (UME) 63*512kB (UM) 23*1024kB (UM) 2*2048kB (UE) 212*4096kB (M) = 1063204kB
[ 5662.903861] Node 0 Normal: 1843614*4kB (UME) 880239*8kB (UME) 363518*16kB (UME) 130399*32kB (UME) 53231*64kB (UME) 27491*128kB (UME) 14049*256kB (ME) 7030*512kB (ME) 3197*1024kB (M) 0*2048kB 0*4096kB = 41800688kB
[ 5662.906398] Node 1 Normal: 1256943*4kB (UME) 564510*8kB (UME) 167474*16kB (UME) 47927*32kB (UME) 21662*64kB (UME) 10473*128kB (UME) 6132*256kB (UM) 3562*512kB (UME) 1996*1024kB (M) 0*2048kB 0*4096kB = 21921452kB
[ 5662.909068] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 5662.910422] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 5662.911984] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 5662.913335] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 5662.914682] 476262 total pagecache pages
[ 5662.916015] 398237 pages in swap cache
[ 5662.917343] Swap cache stats: add 959071, delete 560839, find 176681/202436
[ 5662.918745] Free swap = 4536060kB
[ 5662.920138] Total swap = 7812092kB
[ 5662.921665] 134188618 pages RAM
[ 5662.923214] 0 pages HighMem/MovableOnly
[ 5662.924729] 2121557 pages reserved
[ 5662.926420] 0 pages hwpoisoned

Greets,
Stefan
http://bugzilla.suse.com/show_bug.cgi?id=1109908
Stefan Priebe
http://bugzilla.suse.com/show_bug.cgi?id=1109908
Stefan Priebe
http://bugzilla.suse.com/show_bug.cgi?id=1109908
Stefan Priebe
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c4
--- Comment #4 from Vlastimil Babka
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c6
--- Comment #6 from Michal Hocko
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c7
--- Comment #7 from Vlastimil Babka
Vlastimil, could you suggest compaction-related tracepoints to enable to see where we spend the time? There is a lot of free memory here, so it should not be reclaim; I suspect it is compaction scanning or something like that.
mm_compaction_begin mm_compaction_end mm_compaction_finished mm_compaction_suitable mm_compaction_try_to_compact_pages
at the least, and if it's possible also
mm_compaction_isolate_migratepages mm_compaction_isolate_freepages
http://bugzilla.suse.com/show_bug.cgi?id=1109908
Daniel Aberger
http://bugzilla.suse.com/show_bug.cgi?id=1109908
Nils Fahldieck
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c8
--- Comment #8 from Stefan Priebe
(In reply to Michal Hocko from comment #6)
Vlastimil, could you suggest compaction-related tracepoints to enable to see where we spend the time? There is a lot of free memory here, so it should not be reclaim; I suspect it is compaction scanning or something like that.
mm_compaction_begin mm_compaction_end mm_compaction_finished mm_compaction_suitable mm_compaction_try_to_compact_pages
at the least, and if it's possible also
mm_compaction_isolate_migratepages mm_compaction_isolate_freepages
How can I enable those? And what do you need then? dmesg output after the stall?
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c9
--- Comment #9 from Michal Hocko
(In reply to Vlastimil Babka from comment #7)
(In reply to Michal Hocko from comment #6)
Vlastimil, could you suggest compaction-related tracepoints to enable to see where we spend the time? There is a lot of free memory here, so it should not be reclaim; I suspect it is compaction scanning or something like that.
mm_compaction_begin mm_compaction_end mm_compaction_finished mm_compaction_suitable mm_compaction_try_to_compact_pages
at the least, and if it's possible also
mm_compaction_isolate_migratepages mm_compaction_isolate_freepages
How can I enable those? And what do you need then? dmesg output after the stall?
One way to do that is to mount tracefs
  mount -t tracefs none $MOUNT_POINT
enable those tracepoints (for all the above tracepoints) via
  echo 1 > $MOUNT_POINT/events/compaction/$event/enable
and capture the trace log
  cat $MOUNT_POINT/trace_pipe > log
There is also trace-cmd, which can achieve the same thing, but I am not familiar with it (and lazy to learn it as well ;)
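For convenience, the same steps collected into a small shell sketch; the mount point /sys/kernel/tracing and the log file name are only assumptions, adjust as needed:

  # tracefs is often already mounted here (or under /sys/kernel/debug/tracing)
  MOUNT_POINT=/sys/kernel/tracing
  mountpoint -q "$MOUNT_POINT" || mount -t tracefs none "$MOUNT_POINT"
  # enable the compaction tracepoints listed above
  for event in mm_compaction_begin mm_compaction_end mm_compaction_finished \
               mm_compaction_suitable mm_compaction_try_to_compact_pages \
               mm_compaction_isolate_migratepages mm_compaction_isolate_freepages; do
      echo 1 > "$MOUNT_POINT/events/compaction/$event/enable"
  done
  # capture the trace log until interrupted (Ctrl-C)
  cat "$MOUNT_POINT/trace_pipe" > /var/tmp/compaction-trace.log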
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c10
--- Comment #10 from Stefan Priebe
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c11
--- Comment #11 from Michal Hocko
@michal OK, thanks. Currently we switched back to your original patch and we don't get any stalls.
Might I ask you again to recheck whether your original patch and the latest patch to the SLES15 kernel code really are the "same" (in functionality)?
Your report doesn't have __GFP_THISNODE, so both patches would have exactly the _same_ effect. There must be something more going on in your deployment.
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c12
--- Comment #12 from Stefan Priebe
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c13
--- Comment #13 from Stefan Priebe
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c14
--- Comment #14 from Vlastimil Babka
a 2nd note - I'm able to trigger stalls of kvm processes via:
How long are the stalls in that case?
echo 1 > /proc/sys/vm/compact_memory
is this expected?
Some disruption would be expected, as manually invoked compaction is much more heavyweight than the usual one. But it depends on how long the echo takes to complete and how long the associated stalls are.
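If it would help to quantify that, one hedged suggestion (not something asked for in the thread) is to time the write itself, since it returns only after the manual compaction run completes, and then check dmesg for stall warnings around it:

  # measure how long the manual compaction run takes
  time sh -c 'echo 1 > /proc/sys/vm/compact_memory'
  # look for any stall warnings triggered while it ran
  dmesg | grep -i 'page allocation stalls' | tail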
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c15
--- Comment #15 from Stefan Priebe
(In reply to Stefan Priebe from comment #13)
a 2nd note - I'm able to trigger stalls of kvm processes via:
How long are the stalls in that case?
1-5 minutes
echo 1 > /proc/sys/vm/compact_memory
is this expected?
Some disruption would be expected, as manually invoked compaction is much more heavyweight than the usual one. But it depends on how long the echo takes to complete and how long the associated stalls are.
Did not measure and don't want to take the VMs down again. Regarding the tracepoints: as it might take some days to happen again, can I run cat ..trace_pipe for days? Stefan
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c16
--- Comment #16 from Vlastimil Babka
Regarding the tracepoints: as it might take some days to happen again, can I run cat ..trace_pipe for days?
Compress the output before saving it to a file, and see how much data it produces and whether it can fit. Rotate the logs (or just delete those that don't contain the interesting data).
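A minimal sketch of that suggestion, assuming GNU coreutils; the chunk size and file names are only examples:

  # compress on the fly and check after a while how fast the file grows
  cat "$MOUNT_POINT/trace_pipe" | gzip > /var/tmp/compaction-trace.$(date +%F).gz
  # or rotate into fixed-size compressed chunks so uninteresting ones can be deleted
  cat "$MOUNT_POINT/trace_pipe" | split -d -b 512M --filter='gzip > $FILE.gz' - /var/tmp/compaction-trace.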
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c17
--- Comment #17 from Vlastimil Babka
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c18
--- Comment #18 from Stefan Priebe
(In reply to Stefan Priebe from comment #15)
Regarding the tracepoints: as it might take some days to happen again, can I run cat ..trace_pipe for days?
Compress the output before saving it to a file, and see how much data it produces and whether it can fit. Rotate the logs (or just delete those that don't contain the interesting data).
OK, will try to do so. Do you need the output in general or only after the stall has happened? (In reply to Vlastimil Babka from comment #17)
BTW, are you using xfs?
Yes - why?
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c19
--- Comment #19 from Vlastimil Babka
OK, will try to do so. Do you need the output in general or only after the stall has happened?
Only after the stall has happened should be enough.
(In reply to Vlastimil Babka from comment #17)
BTW, are you using xfs?
Yes - why?
There are some reports related to long runs of xfs shrinkers, which manifest differently, but I can imagine they could cause these stalls as well. So maybe also enable events/vmscan/mm_shrink_slab_start and events/vmscan/mm_shrink_slab_end.
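In the same tracefs layout as in comment #9, that would be, as a sketch:

  for event in mm_shrink_slab_start mm_shrink_slab_end; do
      echo 1 > "$MOUNT_POINT/events/vmscan/$event/enable"
  done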
http://bugzilla.suse.com/show_bug.cgi?id=1109908
http://bugzilla.suse.com/show_bug.cgi?id=1109908#c20
--- Comment #20 from Borislav Petkov