[Bug 416251] New: kernel oops during PV guest install
https://bugzilla.novell.com/show_bug.cgi?id=416251

User plc@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c1

           Summary: kernel oops during PV guest install
           Product: openSUSE 11.1
           Version: Alpha 1
          Platform: x86-64
        OS/Version: openSUSE 11.0
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Xen
        AssignedTo: jbeulich@novell.com
        ReportedBy: plc@novell.com
         QAContact: qa@suse.de
                CC: dcollingridge@novell.com, jdouglas@novell.com,
                    jfehlig@novell.com, lbendixs@novell.com
          Found By: Development

Attempted to install openSUSE 11.1a using the latest kernel build from the Xen
team, kernel-xen-2.6.26-10.x86_64.rpm, and received the following kernel oops.

BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
EDD information not available.
squashfs: version 3.3-CVS (2008/04/04) Phillip Lougher
BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
EDD information not available.
BUG: unable to handle kernel paging request at ffff88001691e000
IP: [<ffffffff8027d5a5>] handle_mm_fault+0x54c/0xc28
PGD 3cd5067 PUD 3cd6067 PMD 3d8b067 PTE 801000001691e065
Oops: 0003 [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0
Modules linked in: dm_snapshot dm_mod multipath raid456 async_xor async_memcpy async_tx xor raid1 raid0 parport_pc parport squashfs nls_utf8 arc4 ecb crypto_blkcipher loop nfs nfs_acl lockd sunrpc nls_iso8859_1 nls_cp437 ipv6 af_packet sg st sd_mod sr_mod scsi_mod ide_disk ide_cd_mod cdrom ide_core xennet xenblk rtc_core rtc_lib [last unloaded: parport]
Pid: 3370, comm: parted Tainted: G 2.6.26-10-xen #1
RIP: e030:[<ffffffff8027d5a5>] [<ffffffff8027d5a5>] handle_mm_fault+0x54c/0xc28
RSP: e02b:ffff880017377d68 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: ffff880015d90780 RCX: ffff880003bf42a0
RDX: 0000000000000000 RSI: 800000001c5bc0e7 RDI: 00007f01dc000010
RBP: ffff880017377dd8 R08: 0000000080615503 R09: ffffffff80615583
R10: 00000000000015d9 R11: 0000000000000001 R12: ffff88001691e000
R13: 800000001c5bc0e7 R14: ffff880004340780 R15: ffff880017377f58
FS: 00007f01e54ee740(0000) GS:ffffffff80641000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process parted (pid: 3370, threadinfo ffff880017376000, task ffff8800169ae140)
Stack: ffff880017c16180 0000000116b023d8 00007f01dc000010 ffff880016b02c60
 ffff880015d90780 ffff880000000000 ffff880016857700 0000000000000001
 ffff880017377dc8 ffff880016b02c60 00007f01dc000010 0000000000000006
Call Trace:
 [<ffffffff80459b54>] do_page_fault+0xc0f/0x1075
 [<ffffffff80457487>] error_exit+0x0/0x69
 [<00007f01e490fb51>]
Code: 00 00 00 48 8b 5d b0 48 3b 98 28 02 00 00 74 09 48 81 fb 40 91 60 80 75 12 48 8b 7d a0 4c 89 ee 31 d2 e8 1f 9c f8 ff 85 c0 74 04 <4d> 89 2c 24 49 8d 7e 10 e9 9a 06 00 00 48 89 df e8 66 5b 01 00
RIP [<ffffffff8027d5a5>] handle_mm_fault+0x54c/0xc28
RSP <ffff880017377d68>
CR2: ffff88001691e000
---[ end trace 955c0329ce3d9490 ]---

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=416251

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c1

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |plc@novell.com

--- Comment #1 from Jan Beulich <jbeulich@novell.com> 2008-08-11 09:07:39 MDT ---
There almost certainly are hypervisor messages associated with this crash,
which I'll need to see along with the kernel ones you already provided.
https://bugzilla.novell.com/show_bug.cgi?id=416251

User plc@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c2

--- Comment #2 from Pat Campbell <plc@novell.com> 2008-08-11 10:14:40 MDT ---
Created an attachment (id=232758)
 --> (https://bugzilla.novell.com/attachment.cgi?id=232758)
Xen hypervisor output

Attached is the entire hypervisor output.

Last few lines:
(XEN) mm.c:645:d1 Non-privileged (1) attempt to map I/O space 0000009f
(XEN) mm.c:645:d1 Non-privileged (1) attempt to map I/O space 000000e0
(XEN) mm.c:645:d1 Non-privileged (1) attempt to map I/O space 00000000
(XEN) mm.c:645:d1 Non-privileged (1) attempt to map I/O space 000000c0
(XEN) mm.c:645:d1 Non-privileged (1) attempt to map I/O space 0000009f
(XEN) mm.c:645:d1 Non-privileged (1) attempt to map I/O space 000000e0
(XEN) mm.c:1464:d1 Bad L1 flags 80
(XEN) mm.c:631:d1 Bad L1 flags 80
(XEN) mm.c:3618:d1 ptwr_emulate: could not get_page_from_l1e()
https://bugzilla.novell.com/show_bug.cgi?id=416251

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c3

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|plc@novell.com              |
           Priority|P5 - None                   |P2 - High
          QAContact|qa@suse.de                  |jdouglas@novell.com

--- Comment #3 from Jan Beulich <jbeulich@novell.com> 2008-08-15 04:36:56 MDT ---
From what I can see this is due to a conflict between _PAGE_PROTNONE
(=_PAGE_PSE) and _PAGE_CHG_MASK (including _PAGE_PAT), since _PAGE_PSE and
_PAGE_PAT are really the same bit. This isn't an immediate problem for Dom0 (as
there _PAGE_PAT is permitted to be set in a page table entry), but it is for
DomU-s without physical devices assigned (none of _PAGE_PCD, _PAGE_PWT, and
_PAGE_PAT may be set there).

There are three possible ways to address this:
- use one of the _PAGE_UNUSED{1,2,3} bits for _PAGE_PROTNONE
- use _PAGE_ACCESSED alone for _PAGE_PROTNONE (and hence PAGE_NONE)
- special case _PAGE_PROTNONE in pgprot_modify() and pte_modify().

Since mainline will need to deal with that anyway I made an attempt at finding
out what route they'd go so we won't have to re-adjust this code later.

I'd personally favor the second option, but am uncertain of possible
consequences of using _PAGE_ACCESSED here. The safest route would obviously be
using _PAGE_UNUSED*, but that may bear problems going forward since Xen already
uses one of the three bits, and 2.6.27 introduces use of one of them, too. So
there's going to be no bit left here. An alternative may be using one of the
high unused bits (52-62) for _PAGE_IO as well as _PAGE_PROTNONE, but with Xen
using bit 52 (on x86-64) and eventual future hardware enhancements likely going
to make use of some of them this doesn't look too compelling either.
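The conflict described in comment #3 can be illustrated with a minimal user-space sketch in C. It is not the kernel's code: the `sketch_*` functions and the `CHG_MASK_*` macros are hypothetical names, the frame mask (bits 12..51) is a simplification, and the native mask is assumed to be frame bits plus PCD, PWT, ACCESSED and DIRTY. The flag bit positions are the architectural x86 ones. The hypervisor's "Bad L1 flags 80" message and the low byte of R13 in the oops (0xe7) both appear consistent with bit 7 surviving into a present PTE in this way.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 PTE flag bits. */
#define PAGE_PRESENT  0x001ULL
#define PAGE_RW       0x002ULL
#define PAGE_USER     0x004ULL
#define PAGE_PWT      0x008ULL
#define PAGE_PCD      0x010ULL
#define PAGE_ACCESSED 0x020ULL
#define PAGE_DIRTY    0x040ULL
#define PAGE_PSE      0x080ULL  /* bit 7 in a large-page entry */
#define PAGE_PAT      0x080ULL  /* bit 7 in a 4K PTE: the same bit */
#define PAGE_PROTNONE PAGE_PSE  /* only meaningful while PRESENT is clear */

/* Assumption: the frame number occupies bits 12..51. */
#define FRAME_MASK 0x000ffffffffff000ULL

/* Assumed native mask, and the Xen variant that adds the PAT bit. */
#define CHG_MASK_NATIVE \
    (FRAME_MASK | PAGE_PCD | PAGE_PWT | PAGE_ACCESSED | PAGE_DIRTY)
#define CHG_MASK_XEN (CHG_MASK_NATIVE | PAGE_PAT)

/* Hypothetical stand-in for pte_modify(): keep the bits in chg_mask,
 * take everything else from the new protection. */
static uint64_t sketch_pte_modify(uint64_t pte, uint64_t newprot,
                                  uint64_t chg_mask)
{
    return (pte & chg_mask) | newprot;
}

/* Per comment #3: a DomU without physical devices may not set any of
 * PCD, PWT, or PAT in a present PTE. */
static int sketch_domu_rejects(uint64_t pte)
{
    return (pte & PAGE_PRESENT) &&
           (pte & (PAGE_PCD | PAGE_PWT | PAGE_PAT)) != 0;
}

/* Walk a PROT_NONE mapping through a protection change to read/write. */
static void sketch_protnone_demo(void)
{
    const uint64_t frame = 0x1c5bc000ULL;  /* arbitrary example frame */
    const uint64_t none  = frame | PAGE_PROTNONE | PAGE_ACCESSED;
    const uint64_t rw    = PAGE_PRESENT | PAGE_RW | PAGE_USER | PAGE_ACCESSED;

    /* With PAT in the mask, the old PROTNONE bit leaks through as PAT. */
    assert(sketch_domu_rejects(sketch_pte_modify(none, rw, CHG_MASK_XEN)));

    /* Without PAT in the mask, the stale bit 7 is dropped. */
    assert(!sketch_domu_rejects(sketch_pte_modify(none, rw, CHG_MASK_NATIVE)));
}
```

Calling `sketch_protnone_demo()` from a small `main()` exercises both cases; the first assertion models the rejected page table write, the second the behaviour when bit 7 is not preserved.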
https://bugzilla.novell.com/show_bug.cgi?id=416251

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
https://bugzilla.novell.com/show_bug.cgi?id=416251

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c4

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |npiggin@novell.com

--- Comment #4 from Jan Beulich <jbeulich@novell.com> 2008-08-18 01:06:56 MDT ---
Nick, as I didn't get any response from the x86 maintainers or the Intel guy
who contributed the PAT patches - would you happen to have an opinion here or
any additional insight on possible pitfalls with any of the considered
approaches?
https://bugzilla.novell.com/show_bug.cgi?id=416251

User npiggin@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c7

--- Comment #7 from Nick Piggin <npiggin@novell.com> 2008-08-18 10:12:59 MDT ---
Sorry I just can't exactly see how it fits together -- I'd not followed the
PAT stuff too closely. AFAIKS _PAGE_PAT/_PAGE_PSE (bit 7) is not set in
_PAGE_CHG_MASK.

But anyway hmm... I don't know exactly about using _PAGE_ACCESSED for the
_PAGE_PROTNONE bit... it might possibly be able to blow up in page reclaim
when we age the ptes and strip _PAGE_ACCESSED? I think those definitions just
contain _PAGE_ACCESSED so that when the fault handler sets up the new pte then
the CPU will not trap again to set the bit itself.

_PAGE_PROTNONE is only valid when _PAGE_PRESENT is clear, so really we can use
any bit for it, it's no problem to change it somewhere else... but I'd just
like to understand what I'm missing here.
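The page-reclaim concern raised in comment #7 can be sketched as follows. This is a hedged illustration in user-space C, not kernel code: the `sketch_*` names are hypothetical, and the model assumes that the kernel's presence test checks `_PAGE_PRESENT | _PAGE_PROTNONE` and that ageing a PTE simply clears `_PAGE_ACCESSED`.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_PRESENT  0x001ULL
#define PAGE_ACCESSED 0x020ULL
#define PAGE_PSE      0x080ULL

/* Assumed model of pte_present(): a PTE counts as present if either the
 * hardware PRESENT bit or the software PROTNONE bit is set. The PROTNONE
 * bit is a parameter so both layouts can be compared. */
static int sketch_pte_present(uint64_t pte, uint64_t protnone_bit)
{
    return (pte & (PAGE_PRESENT | protnone_bit)) != 0;
}

/* Assumed model of ageing: strip the ACCESSED bit. */
static uint64_t sketch_pte_mkold(uint64_t pte)
{
    return pte & ~PAGE_ACCESSED;
}

static void sketch_aging_demo(void)
{
    const uint64_t frame = 0x1c5bc000ULL;  /* arbitrary example frame */

    /* Layout with PROTNONE on bit 7: PAGE_NONE = PROTNONE | ACCESSED.
     * Ageing leaves the PROTNONE bit alone, so the PTE stays present. */
    uint64_t pte = frame | PAGE_PSE | PAGE_ACCESSED;
    assert(sketch_pte_present(sketch_pte_mkold(pte), PAGE_PSE));

    /* Layout with PROTNONE == ACCESSED: ageing clears the only bit that
     * marks the PTE as present, so it no longer looks like a mapping. */
    pte = frame | PAGE_ACCESSED;
    assert(sketch_pte_present(pte, PAGE_ACCESSED));
    assert(!sketch_pte_present(sketch_pte_mkold(pte), PAGE_ACCESSED));
}
```

Under these assumptions, the second layout loses the PROT_NONE mapping as soon as reclaim ages it, which is the failure mode the comment speculates about.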
https://bugzilla.novell.com/show_bug.cgi?id=416251 User jbeulich@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=416251#c8 --- Comment #8 from Jan Beulich <jbeulich@novell.com> 2008-08-19 06:06:39 MDT ---
> Sorry I just can't exactly see how it fits together -- I'd not followed the PAT stuff too closely. AFAIKS _PAGE_PAT/_PAGE_PSE (bit 7) is not set in _PAGE_CHG_MASK.
It's not for native, but it must be for Xen - native only uses the first 4 PAT entries, and hence doesn't really need the PAT bit. Xen (the hypervisor) doesn't permit updating of the PAT MSR, and hence we have to live with the fact that the WC type is accessible only with index 0b100 (i.e. 4), and thus in order to permit WC we have to make use of _PAGE_PAT (and consequently include it in _PAGE_CHG_MASK).
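The indexing argument above can be made concrete with a small sketch. On x86 the PAT entry used by a 4K PTE is selected by a 3-bit index built from the PAT, PCD and PWT bits; the helper names below are hypothetical, and the claim that WC sits at index 4 under Xen is taken from the comment, not verified here.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_PWT 0x008ULL
#define PAGE_PCD 0x010ULL
#define PAGE_PAT 0x080ULL

/* PAT index of a 4K PTE: PAT is bit 2, PCD bit 1, PWT bit 0. */
static unsigned sketch_pat_index(uint64_t pte)
{
    return ((pte & PAGE_PAT) ? 4u : 0u) |
           ((pte & PAGE_PCD) ? 2u : 0u) |
           ((pte & PAGE_PWT) ? 1u : 0u);
}

/* Inverse mapping: PTE flag bits needed to select PAT entry idx (0..7). */
static uint64_t sketch_flags_for_pat_index(unsigned idx)
{
    return ((idx & 4u) ? PAGE_PAT : 0ULL) |
           ((idx & 2u) ? PAGE_PCD : 0ULL) |
           ((idx & 1u) ? PAGE_PWT : 0ULL);
}

static void sketch_pat_demo(void)
{
    unsigned idx;

    /* Entries 0..3 are reachable with PCD/PWT alone. */
    for (idx = 0; idx < 4; idx++)
        assert((sketch_flags_for_pat_index(idx) & PAGE_PAT) == 0);

    /* Index 0b100 cannot be selected without the PAT bit. */
    assert(sketch_flags_for_pat_index(4) == PAGE_PAT);

    /* The two helpers are inverses over the whole index range. */
    for (idx = 0; idx < 8; idx++)
        assert(sketch_pat_index(sketch_flags_for_pat_index(idx)) == idx);
}
```

This is why a kernel that only ever uses the first four entries can leave bit 7 out of its change mask, while one that needs entry 4 cannot.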
> _PAGE_PROTNONE is only valid when _PAGE_PRESENT is clear, so really we can use any bit for it, it's no problem to change it somewhere else... but I'd just like to understand what I'm missing here.
The only valid bits _PAGE_PROTNONE could use are those not included in
_PAGE_CHG_MASK (unless we change pte_modify() and pgprot_modify() to
special-case _PAGE_PROTNONE), and it also cannot be _PAGE_FILE (i.e.
_PAGE_DIRTY). But since e.g. pte_young() only takes pte_present() (but not
_PAGE_PRESENT) as prerequisite, using _PAGE_ACCESSED here indeed wouldn't work.

On a second look it would seem, however, that _PAGE_USER could be used here
(there's no accessor testing/modifying this bit, and it's not part of
_PAGE_CHG_MASK). However, _PAGE_USER has special semantics on older 64-bit
hypervisors, so it's not a good candidate if backward compatibility matters.
Likewise, one could consider using _PAGE_GLOBAL (since the respective accessors
are stubs on Xen, as the 32-bit hypervisor doesn't permit its use, and the
64-bit hypervisor only conditionally permits it, so it doesn't have the meaning
the kernel would normally give it).

All of the other bits (apart from _PAGE_UNUSED2) aren't candidates:
- _PAGE_PRESENT for obvious reasons
- _PAGE_RW because of pte_write() etc
- _PAGE_PWT, _PAGE_PCD, _PAGE_PAT, _PAGE_UNUSED1 (_PAGE_SPECIAL) and
  _PAGE_UNUSED3 (_PAGE_IO) because they're part of _PAGE_CHG_MASK
- _PAGE_DIRTY because it equals _PAGE_FILE and is also part of _PAGE_CHG_MASK.
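The elimination in comment #8 can be reproduced mechanically. The following is a rough sketch under stated assumptions: the two exclusion masks are reconstructed from the wording of the comment rather than from the kernel headers, the `sketch_*` function and the `CHG_MASK_FLAGS` and `ACCESSOR_BITS` macros are hypothetical names, and only the twelve low flag bits are considered.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_PRESENT  0x001ULL
#define PAGE_RW       0x002ULL
#define PAGE_USER     0x004ULL
#define PAGE_PWT      0x008ULL
#define PAGE_PCD      0x010ULL
#define PAGE_ACCESSED 0x020ULL
#define PAGE_DIRTY    0x040ULL  /* doubles as _PAGE_FILE */
#define PAGE_PAT      0x080ULL
#define PAGE_GLOBAL   0x100ULL
#define PAGE_UNUSED1  0x200ULL  /* _PAGE_SPECIAL, per the comment */
#define PAGE_UNUSED2  0x400ULL
#define PAGE_UNUSED3  0x800ULL  /* _PAGE_IO, per the comment */

/* Assumption: flag part of the Xen kernel's _PAGE_CHG_MASK, as listed
 * in the comment. */
#define CHG_MASK_FLAGS \
    (PAGE_PWT | PAGE_PCD | PAGE_PAT | PAGE_DIRTY | \
     PAGE_UNUSED1 | PAGE_UNUSED3)

/* Bits ruled out because they are the present bit itself or because
 * accessors such as pte_write() or pte_young() look at them. */
#define ACCESSOR_BITS \
    (PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED | PAGE_DIRTY)

/* Low flag bits that survive both exclusion rules. */
static uint64_t sketch_protnone_candidates(void)
{
    const uint64_t low12 = 0xfffULL;
    return low12 & ~(CHG_MASK_FLAGS | ACCESSOR_BITS);
}

static void sketch_candidates_demo(void)
{
    /* Matches the comment: USER and GLOBAL (both with Xen-specific
     * caveats) and UNUSED2 are the only bits left. */
    assert(sketch_protnone_candidates() ==
           (PAGE_USER | PAGE_GLOBAL | PAGE_UNUSED2));
}
```

The sketch only encodes the two mechanical rules; the backward-compatibility caveat for _PAGE_USER and the conditional handling of _PAGE_GLOBAL remain judgment calls outside it.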
https://bugzilla.novell.com/show_bug.cgi?id=416251

User npiggin@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c9

--- Comment #9 from Nick Piggin <npiggin@novell.com> 2008-08-20 04:41:11 MDT ---
If nothing else, then _PAGE_PROTNONE could use _PAGE_UNUSED2. You wouldn't be
using up the last pte bit for present ptes.
https://bugzilla.novell.com/show_bug.cgi?id=416251

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=416251#c10

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|npiggin@novell.com          |
         Resolution|                            |FIXED

--- Comment #10 from Jan Beulich <jbeulich@novell.com> 2008-08-25 07:45:32 MDT ---
Should be fixed in all 2.6.27-based kernels to come.