[Bug 1039737] Kernel BUG at ../mm/huge_memory.c / split_huge_page
http://bugzilla.suse.com/show_bug.cgi?id=1039737
http://bugzilla.suse.com/show_bug.cgi?id=1039737#c10

--- Comment #10 from Jochen Hansper <hansper@t-online.de> ---
(In reply to Vlastimil Babka from comment #9)
> I've looked at the report in more detail, and it's triggering here in
> __split_huge_page_map():
>
>     BUG_ON(!pte_none(*pte));
>
> where pte points to a deposited page table that the huge page keeps for when
> it needs to be split. Nobody should be accessing it while it is deposited, but
> here it was clearly written to. This definitely doesn't look like one of the
> THP-vs-something races that get fixed upstream semi-regularly.
>
> Unfortunately, we can't see from the oops what the unexpected value in the
> page table was; RDX points there, but we don't see the contents. One
> possibility is to set up kdump and produce a crash dump to inspect. Or we add
> some debug printing. We could also make the deposited page read-only, which
> would trigger on any writes, unless it's a HW problem.
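
For illustration only, the "debug printing" option mentioned above could look roughly like the following user-space sketch. It is not kernel code and not the patch anyone in this bug proposed: the names sketch_pte_t, sketch_pte_none() and sketch_dump_nonempty_ptes() are hypothetical stand-ins, the table size of 512 entries assumes x86-64, and "empty" is simplified to "all bits zero". The idea is to print the contents of every unexpected entry instead of stopping at the first one without showing its value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 512 entries per page table, as on x86-64. */
#define SKETCH_PTRS_PER_PTE 512

/* Hypothetical stand-in for the kernel's pte_t. */
typedef struct { uint64_t val; } sketch_pte_t;

/* Simplified "empty entry" check: here an entry is empty iff it is zero. */
static int sketch_pte_none(sketch_pte_t pte)
{
    return pte.val == 0;
}

/*
 * Walk a table that is expected to be completely empty and print the
 * index and raw value of every entry that is not. Returns the number
 * of unexpected entries found.
 */
static size_t sketch_dump_nonempty_ptes(const sketch_pte_t *table, FILE *out)
{
    size_t bad = 0;

    assert(table != NULL);
    for (size_t i = 0; i < SKETCH_PTRS_PER_PTE; i++) {
        if (!sketch_pte_none(table[i])) {
            fprintf(out, "deposited pte[%zu] = 0x%016llx\n",
                    i, (unsigned long long)table[i].val);
            bad++;
        }
    }
    return bad;
}
```

Calling sketch_dump_nonempty_ptes() on a table just before the point where the check would fire shows which entries were overwritten and with what value, which is the information missing from the oops.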

I've rebuilt the debug kernel with the following .config settings:

CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VIRTUAL=y

Will this help with debugging?

I haven't managed to get X running yet (debug kernel + nvidia or nouveau). I'll look into that if this approach can be useful. Unfortunately, I really need the machine where the bug happened on a day-to-day basis, so I can't go without X...

Before opening this bug report, I ran 24h+ of memtest and 24h+ of prime95 (on Ubuntu 16.10, kernel 4.8) without issues.