Comment # 10 on bug 1039737 from Jochen Hansper

(In reply to Vlastimil Babka from comment #9)
> I've looked at the report in more detail and it's triggering here in
> __split_huge_page_map():
> 
> BUG_ON(!pte_none(*pte));
> 
> where pte points to a deposited page table that the huge page keeps for when
> it needs to be split. Nobody should be accessing it while deposited, but
> here it was clearly written to. This definitely doesn't look like a THP vs
> something race that's being fixed upstream semi-regularly.
> 
> Unfortunately we can't see from the oops what was the unexpected value in
> the page table, RDX points there but we don't see the contents. One
> possibility is to setup kdump and produce a crash dump to inspect. Or we add
> some debug printing. We could also make the deposited page read-only which
> would trigger on any writes, unless it's a HW problem.

I've rebuilt the debug kernel with the following .config settings:

CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VIRTUAL=y

Will this help with debugging?

I haven't managed to get X running yet on the debug kernel (with either nvidia
or nouveau). I'll look into that if this approach turns out to be useful.
Unfortunately, I need the machine where the bug occurred for day-to-day work,
so I can't go without X.

Before opening this bug report, I ran 24h+ memtest and 24h+ prime95 (on Ubuntu
16.10, kernel 4.8) without issues.