https://bugzilla.suse.com/show_bug.cgi?id=1180917

https://bugzilla.suse.com/show_bug.cgi?id=1180917#c9

--- Comment #9 from LTC BugProxy <bugproxy@us.ibm.com> ---
------- Comment From geraldsc@de.ibm.com 2021-01-19 09:45 EDT-------
(In reply to comment #13)
[...]
Is there any other means of kernel source access for openSUSE Tumbleweed, ideally a git repo like for SLES? Seems hard to believe that "open"SUSE kernel source is harder to find / access than SLES code...
It is not hard to find at all. All you need to know is that the different kernel flavors all depend on a central package called kernel-source, which has its own mechanism for integrating patches depending on a variety of conditions. The source can be found in the package http://download.opensuse.org/ports/zsystems/tumbleweed/repo/oss/noarch/kernel-source-5.10.5-1.1.noarch.rpm
Hmm, almost right. You also need to know that kernel-source.noarch.rpm does not contain the full source, and that you need kernel-devel.noarch.rpm on top (e.g. for arch code). This is the same for SLES, so fortunately I knew it. For SLES it is much less annoying, though, because there is also a proper public git for both the kernel source tree and the kernel-source src.rpm content, so you don't really need to know which rpms contain what...

Anyway, back to the bug. From the kernel source for 5.10.5-1-default I see that it happens on the BUG_ON(!pte_none(*pte)) in __split_huge_pmd_locked(). This is very strange / interesting, because those are the ptes from the pre-allocated and deposited pagetable, which was withdrawn just shortly before that BUG_ON, with pgtable_trans_huge_withdraw(). The pre-allocated pagetables are initialized with empty (invalid) ptes before deposit, so they should of course all (still) be pte_none() after withdrawal.

If a pte is !pte_none, then either the pre-allocated pagetable got corrupted while it was deposited, or pgtable_trans_huge_withdraw() returned something that is not really a pagetable at all. E.g. in theory it could return NULL if there were more withdrawals than deposits, if I understand the list handling code there correctly. Of course, such a thing should never happen (i.e. it would be a bug), but I am a bit confused why the common code does not also check for this with a BUG_ON.

Having a system dump could help to see more of what was going on. Any chance that kdump generated a dump after the BUG_ON?

From the backtrace and register output, and a kernel disassembly, one can at least see that %r1 holds the pte value that failed the pte_none() check: 00000000b2b40215.
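For reference, here is a minimal userspace sketch of how the flag bits of such a pte value can be read off. The bit values are assumptions, recalled from arch/s390/include/asm/pgtable.h, and should be checked against the actual 5.10.5 tree. The model_* names and PTE_* macros are hypothetical and not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: s390 pte bit values as recalled from
 * arch/s390/include/asm/pgtable.h; verify against the real tree.
 */
#define PTE_PRESENT UINT64_C(0x001) /* SW present bit */
#define PTE_YOUNG   UINT64_C(0x004) /* SW young bit */
#define PTE_DIRTY   UINT64_C(0x008) /* SW dirty bit */
#define PTE_READ    UINT64_C(0x010) /* SW read bit */
#define PTE_WRITE   UINT64_C(0x020) /* SW write bit */
#define PTE_PROTECT UINT64_C(0x200) /* HW read-only bit */
#define PTE_INVALID UINT64_C(0x400) /* HW invalid bit */

/* Model of pte_none(): assumed to mean "only the invalid bit is set". */
static int model_pte_none(uint64_t pte)
{
    return pte == PTE_INVALID;
}

/* Return the flag bits, i.e. the low 12 bits below the page frame address. */
static uint64_t model_pte_flags(uint64_t pte)
{
    return pte & UINT64_C(0xfff);
}
```

Calling model_pte_flags(0xb2b40215) and comparing the result against the PTE_* masks shows which bits are set; model_pte_none() on the same value shows why the BUG_ON condition was true.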
This actually looks like a valid pte, with present / young / read / write-protect set, so one could assume that this is not the "NULL returned" case, but rather really a pre-allocated pagetable, which somehow got corrupted by someone having it in active access and filling it with valid ptes. Of course, such a thing should also never happen; the pre-allocated and deposited pagetables cannot be used until they are withdrawn. Very strange.

We do actually have our own implementation of the deposit/withdraw functions, because we cannot use the generic versions. On s390, we have 2K pagetables, and two of them within one 4K page, so we cannot use the generic logic that operates on struct pages for (4K) pagetable pages. The pgtable_t is therefore also not a struct page on s390, but rather a direct pointer to the pagetable.

For maintaining the list of pre-allocated pagetables, we put a list_head directly into the pagetables, at the beginning, instead of using page->lru of the struct page associated with the pagetable like it is done in the generic case. Then, on withdraw, and after list_del, the first two ptes are cleared so that the list_head gets overwritten and the whole pagetable should be empty again.

That is at least suspicious, and it could explain why you only see this on s390 (do you really?). However, I do not really see how our implementation would change anything that allows the deposited pagetables to change before withdrawing them. It is really the same logic as in generic code, only that we put the list_head somewhere else. I still suspect some race in common code, e.g. some concurrent withdrawals w/o proper locking, but I could not yet find anything suspicious in the code...
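To illustrate the scheme described above, here is a simplified userspace model of deposit/withdraw with the list node embedded at the start of a 2K pagetable. This is only a sketch and not the kernel code: all model_* names are hypothetical, the empty-pte pattern 0x400 is an assumption, locking is left out entirely, and the explicit NULL return on an empty list is a choice made for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTRS_PER_PTE 256               /* 2K table / 8-byte entries */
#define MODEL_PAGE_INVALID UINT64_C(0x400)   /* ASSUMED empty-pte pattern */

/* Stand-in for the kernel's struct list_head. */
struct model_list_head { struct model_list_head *next, *prev; };

/* While deposited, the list node overlays the first pte slots. */
union model_pgtable {
    uint64_t pte[MODEL_PTRS_PER_PTE];
    struct model_list_head lh;
};

/* Initialize a table with empty (invalid) ptes, as done before deposit. */
static void model_init(union model_pgtable *pt)
{
    for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++)
        pt->pte[i] = MODEL_PAGE_INVALID;
}

/* Return 1 if every entry is empty, i.e. what the BUG_ON expects. */
static int model_all_none(const union model_pgtable *pt)
{
    for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++)
        if (pt->pte[i] != MODEL_PAGE_INVALID)
            return 0;
    return 1;
}

/* Deposit: link the table into the list via its embedded node. */
static void model_deposit(union model_pgtable **head, union model_pgtable *pt)
{
    struct model_list_head *lh = &pt->lh;

    if (*head == NULL) {
        lh->next = lh->prev = lh;            /* first entry: self-linked */
    } else {
        struct model_list_head *h = &(*head)->lh;
        lh->next = h->next;                  /* insert right after h */
        lh->prev = h;
        h->next->prev = lh;
        h->next = lh;
    }
    *head = pt;
}

/* Withdraw: unlink the table, then clear the slots the list node used. */
static union model_pgtable *model_withdraw(union model_pgtable **head)
{
    union model_pgtable *pt = *head;
    struct model_list_head *lh;

    if (pt == NULL)
        return NULL;                         /* more withdrawals than deposits */
    lh = &pt->lh;
    if (lh->next == lh) {
        *head = NULL;                        /* that was the last one */
    } else {
        *head = (union model_pgtable *)lh->next;
        lh->prev->next = lh->next;
        lh->next->prev = lh->prev;
    }
    pt->pte[0] = MODEL_PAGE_INVALID;         /* overwrite the list node */
    pt->pte[1] = MODEL_PAGE_INVALID;
    return pt;
}
```

In this model a withdrawn table is always empty again, as long as nothing writes to it while it is deposited and deposits and withdrawals are serialized. A table found with valid-looking ptes after withdrawal would therefore point to one of those two conditions being violated.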