[Bug 926594] New: Xen4.5+kernel-xen- 3.19.3-3.1 PANIC on every reboot, "Panic on CPU 0: FATAL PAGE FAULT"
http://bugzilla.suse.com/show_bug.cgi?id=926594

Bug ID: 926594
Summary: Xen4.5+kernel-xen- 3.19.3-3.1 PANIC on every reboot, "Panic on CPU 0: FATAL PAGE FAULT"
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: x86-64
OS: openSUSE 13.2
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Xen
Assignee: jdouglas@suse.com
Reporter: lyndat3@your-mail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

I'm running

kernel-xen-3.19.3-3.1.g0f5134a.x86_64
xen-4.5.0_03-364.3.x86_64
xen-libs-4.5.0_03-364.3.x86_64
xen-tools-4.5.0_03-364.3.x86_64

The system boots completely and functions. On reboot/shutdown -r, it crashes every time. It's 100% reproducible here.

Here's the console log.

shutdown -r now
...
[59407.439868] reboot: Restarting system
(XEN) [2015-04-09 13:46:45] Domain 0 shutdown: rebooting machine.
(XEN) [2015-04-09 13:46:45] ----[ Xen-4.5.0_03-363 x86_64 debug=n Tainted: C ]----
(XEN) [2015-04-09 13:46:45] CPU: 0
(XEN) [2015-04-09 13:46:45] RIP: e008:[<000000009e6c4000>] 000000009e6c4000
(XEN) [2015-04-09 13:46:45] RFLAGS: 0000000000010247 CONTEXT: hypervisor
(XEN) [2015-04-09 13:46:45] rax: 000000009e670340 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) [2015-04-09 13:46:45] rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) [2015-04-09 13:46:45] rbp: 0000000000000000 rsp: ffff82d080457dd0 r8: 0000000000000000
(XEN) [2015-04-09 13:46:45] r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000008
(XEN) [2015-04-09 13:46:45] r12: 0000000000000000 r13: 0000000000000061 r14: 00000000fee1dead
(XEN) [2015-04-09 13:46:45] r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000001526f0
(XEN) [2015-04-09 13:46:45] cr3: 00000008459a3000 cr2: 000000009e6c4000
(XEN) [2015-04-09 13:46:45] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) [2015-04-09 13:46:45] Xen stack trace from rsp=ffff82d080457dd0:
(XEN) [2015-04-09 13:46:45] 000000009efe42f6 00000000fee1dead ffff82d0802278a4 efff00000000000a
(XEN) [2015-04-09 13:46:45] ffff82d080262000 00000008459a3000 ffff82d080227a7a 000000069462e000
(XEN) [2015-04-09 13:46:45] 0000000000000000 0000000000152670 0000000000000700 0000000000000061
(XEN) [2015-04-09 13:46:45] 0000000000000000 00000000fffffffe ffff82d08018477c ffff82d080457e98
(XEN) [2015-04-09 13:46:45] 0000000080457e58 0000361aa81209b7 0000000000000000 0000000000000001
(XEN) [2015-04-09 13:46:45] 0000000000000001 ffff83084595b000 ffff83084595b138 00000000fee1dead
(XEN) [2015-04-09 13:46:45] ffff82d080129c99 0000000000000003 ffff82d080105721 0000000000000000
(XEN) [2015-04-09 13:46:45] 0000000000000000 0000000028121969 ffffffff80a1f100 0000000000000002
(XEN) [2015-04-09 13:46:45] ffff82d080128d3f 00000001ffffffff 0000000000000005 ffff83009e786000
(XEN) [2015-04-09 13:46:45] 00000000000bcbd0 0000000000000000 ffff83009e786000 0000000028121969
(XEN) [2015-04-09 13:46:46] ffff82d080224119 ffffc9000819e080 ffffc9000819e16c ffff8800b610e4a7
(XEN) [2015-04-09 13:46:46] ffffc9000819e168 0000000028121969 0000000000000000 0000000000000282
(XEN) [2015-04-09 13:46:46] 0000000000000064 0000000000000510 0000000000000000 000000000000001d
(XEN) [2015-04-09 13:46:46] ffffffff800113aa 0000000000000001 ffff8800306ebe1c 0000000000000002
(XEN) [2015-04-09 13:46:46] 0001010000000000 ffffffff800113aa 000000000000e033 0000000000000282
(XEN) [2015-04-09 13:46:46] ffff8800306ebdf8 000000000000e02b 0000000000000000 0000000000000000
(XEN) [2015-04-09 13:46:46] 0000000000000000 0000000000000000 0000000000000000 ffff83009e786000
(XEN) [2015-04-09 13:46:46] 0000000000000000 0000000000000000
(XEN) [2015-04-09 13:46:46] Xen call trace:
(XEN) [2015-04-09 13:46:46] [<000000009e6c4000>] 000000009e6c4000
(XEN) [2015-04-09 13:46:46] [<ffff82d0802278a4>] efi_rs_enter+0xf4/0x110
(XEN) [2015-04-09 13:46:46] [<ffff82d080227a7a>] efi_reset_system+0x3a/0x60
(XEN) [2015-04-09 13:46:46] [<ffff82d08018477c>] machine_restart+0xcc/0x220
(XEN) [2015-04-09 13:46:46] [<ffff82d080129c99>] hwdom_shutdown+0x89/0x90
(XEN) [2015-04-09 13:46:46] [<ffff82d080105721>] domain_shutdown+0xf1/0x100
(XEN) [2015-04-09 13:46:46] [<ffff82d080128d3f>] do_sched_op+0x1af/0x440
(XEN) [2015-04-09 13:46:46] [<ffff82d080224119>] syscall_enter+0xa9/0xae
(XEN) [2015-04-09 13:46:46]
(XEN) [2015-04-09 13:46:46] Pagetable walk from 000000009e6c4000:
(XEN) [2015-04-09 13:46:46] L4[0x000] = 00000008459a2063 ffffffffffffffff
(XEN) [2015-04-09 13:46:46] L3[0x002] = 000000008c674063 ffffffffffffffff
(XEN) [2015-04-09 13:46:46] L2[0x0f3] = 000000009e5ff063 ffffffffffffffff
(XEN) [2015-04-09 13:46:46] L1[0x0c4] = 0000000000000000 ffffffffffffffff
(XEN) [2015-04-09 13:46:46]
(XEN) [2015-04-09 13:46:46] ****************************************
(XEN) [2015-04-09 13:46:46] Panic on CPU 0:
(XEN) [2015-04-09 13:46:46] FATAL PAGE FAULT
(XEN) [2015-04-09 13:46:46] [error_code=0010]
(XEN) [2015-04-09 13:46:46] Faulting linear address: 000000009e6c4000
(XEN) [2015-04-09 13:46:46] ****************************************
(XEN) [2015-04-09 13:46:46]
(XEN) [2015-04-09 13:46:46] Manual reset required ('noreboot' specified)

--
You are receiving this mail because:
You are on the CC list for the bug.
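For readers following the panic output, the pagetable walk and the error code in the log can be cross-checked by hand. The sketch below is only an illustration, not Xen code: the helper names are made up, it assumes ordinary x86-64 4-level paging with 4 KiB pages, and it assumes the "error_code=0010" value is hexadecimal. It splits the faulting address into the four table indices and names the architectural page-fault error bits.

```python
# Illustrative helpers (hypothetical names, not part of Xen).

def pagetable_indices(addr):
    """Return the (L4, L3, L2, L1) indices of a linear address,
    assuming 4-level paging: 9 index bits per level above a 12-bit offset."""
    return tuple((addr >> shift) & 0x1FF for shift in (39, 30, 21, 12))

# x86 page-fault error code bits: (meaning when clear, meaning when set).
PF_BITS = {
    0: ("page not present", "protection violation"),
    1: ("not a write", "write"),
    2: ("supervisor mode", "user mode"),
    3: (None, "reserved bit set"),
    4: ("data access", "instruction fetch"),
}

def decode_pf_error(code):
    """Translate a page-fault error code into a list of descriptions."""
    out = []
    for bit, (when_clear, when_set) in PF_BITS.items():
        desc = when_set if code & (1 << bit) else when_clear
        if desc:
            out.append(desc)
    # An instruction fetch is not a data access, so drop nothing else;
    # bit 4 set simply replaces the "data access" label above.
    return out

fault = 0x9E6C4000  # "Faulting linear address" from the log
print([hex(i) for i in pagetable_indices(fault)])  # ['0x0', '0x2', '0xf3', '0xc4']
print(decode_pf_error(0x0010))
```

The indices agree with the L4[0x000], L3[0x002], L2[0x0f3] and L1[0x0c4] lines in the log, and the decoded error code (instruction fetch from a page that is not present) fits the all-zero L1 entry and the fact that RIP and cr2 both hold the faulting address.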
http://bugzilla.suse.com/show_bug.cgi?id=926594
Charles Arnold
http://bugzilla.suse.com/show_bug.cgi?id=926594
Jan Beulich
http://bugzilla.suse.com/show_bug.cgi?id=926594
lynda t
http://bugzilla.suse.com/show_bug.cgi?id=926594
Jan Beulich
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #4 from lynda t
Please note that I said "complete", not "from the moment 'shutdown -r now' is exec'd". And please attach logs instead of inlining them.
Can you please just say what you do want, not what you don't? Are you looking for a serial log FROM bootup TO failed shutdown?
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #5 from Jan Beulich
Are you looking for a serial log FROM bootup TO failed shutdown?
Exactly.
http://bugzilla.suse.com/show_bug.cgi?id=926594
lynda t
http://bugzilla.suse.com/show_bug.cgi?id=926594
Jan Beulich
http://bugzilla.suse.com/show_bug.cgi?id=926594
Jan Beulich
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #9 from lynda t
The workaround mentioned is already in the upstream trees for 4.5.1 and 4.4.3, and hence will become available eventually via maintenance update.
This sounds then like a definite Xen bug, and that a resolution requires integrating a known fix from upstream. Looking at http://wiki.xenproject.org/wiki/Xen_Project_Hypervisor_Roadmap/4.5, 4.5.1, not even mentioned there, seems like it's many months out still. But this ^^ has been marked 'wontfix'. Does that mean the known fix will not be integrated into the existing tree? Just confused a bit. What's meant by a 'maintenance update' here? Does that mean a patch will be backported to the existing 4.5.0 release in openSUSE's Xen sometime in the nearer future? If it does, is there any roadmap / timeline? Can't begin to use Xen in any sort of production until at least this gets resolved.
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #10 from Jan Beulich
This sounds then like a definite Xen bug, and that a resolution requires integrating a known fix from upstream.
Where do you see the Xen bug here? It is a bug of the firmware not to mark for runtime use any code/data that can be accessed by EFI Runtime Services calls.
Looking at http://wiki.xenproject.org/wiki/Xen_Project_Hypervisor_Roadmap/4.5, 4.5.1, not even mentioned there, seems like it's many months out still.
The roadmap only talks about major releases, not stable ones afaik. We're going to cut 4.5.1-rc1 within the next couple of weeks (preparations are already in progress).
But this ^^ has been marked 'wontfix'. Does that mean the known fix will not be integrated into existing tree? Just confused a bit.
For one there is no "known fix", only a workaround (hence the "wontfix"). And then ...
What's meant by a 'maintenance update' here? Does that mean a patch will be backported to the existing 4.5.0 release in openSUSE's Xen sometime in the nearer future? If it does, is there any roadmap / timeline? Can't begin to use Xen in any sort of production until at least this gets resolved.
... 13.2 will pick up 4.4.3 eventually (that'll take some time, as 4.4.2 got released pretty recently), and Factory will presumably pick up 4.5.1 once available, i.e. the workaround will become available.
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #11 from lynda t
(In reply to lynda t from comment #9)
This sounds then like a definite Xen bug, and that a resolution requires integrating a known fix from upstream.
Where do you see the Xen bug here? It is a bug of the firmware not to mark for runtime use any code/data that can be accessed by EFI Runtime Services calls.
That's part of the problem -- I don't "see" any bug. There's no reference to a specific commit. I posted this ^^ as a Xen bug, correctly or not. That hasn't been changed by anyone. You've asked for Xen logs, and referenced a line in the Xen output. Then you've said the solution "is already in the upstream trees for 4.5.1". 4.5.1 sounds like Xen.
The roadmap only talks about major releases, not stable ones afaik. We're going to cut 4.5.1-rc1 within the next couple of weeks (preparations are already in progress).
If that rc will have this ^^ solution in it, that's good to know. Leaves me in the lurch for a while, but that's how the dust settles. Bunch of other stuff that needs fixing will eat up time anyway ...
For one there is no "known fix", only a workaround (hence the "wontfix"). And then ...
Ok, I guess that means something to dev in detail. I read that differently -- As an end-user filing the bug, my goal is to know if/when a "fixed" version will be packaged & available somewhere we can dl and use it. ok, nm then.
... 13.2 will pick up 4.4.3 eventually (that'll take some time, as 4.4.2 got released pretty recently), and Factory will presumably pick up 4.5.1 once available, i.e. the workaround will become available.
I'm running a 13.2 core, with Xen 4.5.x and kernel-xen from Stable. "Not supported" I know, but it's the only config I find that's close to production ready for my hardware and needs. I'll keep an eye out for the 4.5.1 workaround -- that sounds like it'll solve this problem. Thanks.
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #12 from Jan Beulich
That's part of the problem -- I don't "see" any bug. There's no reference to a specific commit.
Not here, but I had mailed it to you in the list conversation: http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=c643fb110a51693e82a3...
I posted this ^^ as a Xen bug, correctly or not. That hasn't been changed by anyone. You've asked for Xen logs, and referenced a line in the Xen output. Then you've said the solution "is already in the upstream trees for 4.5.1".
I have no idea what you're trying to get at. This is a firmware bug. Period. And there's no solution in Xen, just a workaround (to use another reboot method).
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #13 from lynda t
I have no idea what you're trying to get at. This is a firmware bug. Period. And there's no solution in Xen, just a workaround (to use another reboot method).
I'm not trying to 'get at' anything. You asked a question, I answered.
http://bugzilla.suse.com/show_bug.cgi?id=926594
lynda t
http://bugzilla.suse.com/show_bug.cgi?id=926594
lynda t
As suspected the area is not marked for runtime mapping:
(XEN) 000009e6c4000-000009e6ddfff type=0 attr=000000000000000f
Hence this is a firmware bug.
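The reasoning in the quoted lines above can be checked mechanically. The following is a rough sketch only, with made-up helper names: it parses a memory-map line in the format quoted above, tests whether the faulting address from the panic falls inside the range, and tests the EFI_MEMORY_RUNTIME attribute (bit 63 in the UEFI specification). The regular expression is an assumption based on this one line, not on Xen's source.

```python
import re

# UEFI memory attribute: region must be mapped for runtime services.
EFI_MEMORY_RUNTIME = 0x8000000000000000

# Assumed line format, taken from the single example in this thread.
LINE_RE = re.compile(r"([0-9a-f]+)-([0-9a-f]+) type=(\d+) attr=([0-9a-f]+)")

def parse_efi_range(line):
    """Parse one memory-map line into a dict of integers."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("unrecognised memory map line: %r" % line)
    return {
        "start": int(m.group(1), 16),
        "end": int(m.group(2), 16),   # inclusive end address
        "type": int(m.group(3)),
        "attr": int(m.group(4), 16),
    }

def covers(rng, addr):
    """True if addr lies inside the (inclusive) range."""
    return rng["start"] <= addr <= rng["end"]

def is_runtime(rng):
    """True if the region carries the EFI_MEMORY_RUNTIME attribute."""
    return bool(rng["attr"] & EFI_MEMORY_RUNTIME)

entry = parse_efi_range(
    "(XEN) 000009e6c4000-000009e6ddfff type=0 attr=000000000000000f")
fault = 0x9E6C4000  # faulting linear address from the panic

print(covers(entry, fault), is_runtime(entry))
```

For the line quoted here the range contains the faulting address but the runtime bit is clear (attr 0xf only carries cacheability bits), which is the condition described as "not marked for runtime mapping".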
The workaround mentioned is already in the upstream trees for 4.5.1 and 4.4.3, and hence will become available eventually via maintenance update.
Not here, but I had mailed it to you in the list conversation: http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=c643fb110a51693e82a3...
Backport of referenced workaround applied: https://bugzilla.suse.com/show_bug.cgi?id=928783
No change in behavior: "still fails, requiring manual reset, per original report".
Redirected back to here: "You are correct that this bug is complete. Sorry, but you can't take this bug and evolve it into what is essentially the other bug. You may continue that discussion there and see what Jan has to say."
http://bugzilla.suse.com/show_bug.cgi?id=926594
Jan Beulich
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #16 from lynda t
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #17 from Jan Beulich
Could you please consider not constantly inflaming discussions with your "I have no idea why you ...".
I'm sorry if this came across as offensive. To me, opening a 2nd bug for the same issue looks like spamming bugzilla. (In fact, even if it wasn't you who opened the first one, we'd expect you to look for bugs matching your problem before opening a new one.) Plus you report the problem against a Xen version not in any openSUSE release, i.e. apart from this being a firmware bug (and hence you rather needing to push your hardware vendor for a firmware update) I don't think it is a valid expectation for us to have a workaround for you available immediately. For such expectations I think you'd need to pay for a SLE support contract.
I opened it to make the backport request separately from this issue.
First, it seemed more efficient. It proved to be, as Charles quickly and politely addressed it. Case closed.
And it was eminently clear that YOU weren't going to do a thing about it here.
This is clearly wrong: I _am_ doing things to help this, just perhaps in a way not visible to you. As the upstream stable tree maintainer, I am making sure that 4.5.1 will have the change (unless a regression is found with it), and that it will get released in as timely a manner as possible. Considering all of the above, this necessarily means that you either have to be a little patient, or apply the workaround on your own until an official update is available.
http://bugzilla.suse.com/show_bug.cgi?id=926594
--- Comment #18 from lynda t
http://bugzilla.suse.com/show_bug.cgi?id=926594
Mike Latimer
http://bugzilla.suse.com/show_bug.cgi?id=926594
Jan Beulich
http://bugzilla.suse.com/show_bug.cgi?id=926594
Mike Latimer
http://bugzilla.suse.com/show_bug.cgi?id=926594
Mike Latimer
http://bugzilla.suse.com/show_bug.cgi?id=926594
Martin Pluskal