[Bug 922235] New: xen guest - kernel BUG at ../arch/x86/include/mach-xen/asm/maddr.h:37!
http://bugzilla.opensuse.org/show_bug.cgi?id=922235

Bug ID: 922235
Summary: xen guest - kernel BUG at ../arch/x86/include/mach-xen/asm/maddr.h:37!
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Xen
Assignee: jdouglas@suse.com
Reporter: per@computer.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 626763 --> http://bugzilla.opensuse.org/attachment.cgi?id=626763&action=edit
console text from dupont5

I shut down a xen host with four guests in order to move it into a rack. On startup, the xen bridge interface was not connected. The four guests were auto-started normally. After connecting the cable, one guest died, see attachment from dupont5. The other three guests were initially responding, ssh sessions resumed. Then I started "yast lan" on dupont6, and it oops'ed too, see attachment.

--
You are receiving this mail because:
You are on the CC list for the bug.
--- Comment #4 from Per Jessen
Could we see the full kernel logs please of at least one of the two cases?
I'll be happy to try to reproduce - do you just need the dmesg output?
--- Comment #13 from Per Jessen
(In reply to Per Jessen from comment #11)
The DomU is back up, where do I look for the data you need?
The tail of /var/log/messages (ideally with the part relating to the current session [i.e. after the crash] zapped along with everything prior to the session leading to the crash).
Well, at least reproducing seems quite easy - I just rebooted the host and all four guests crashed. Capturing the log seems more difficult - nothing is written to either /var/log/messages or /var/log/boot.msg. I'm enabling service klog to see if that makes a difference.
--- Comment #16 from Per Jessen
Created attachment 632054 [details] debugging patch
Here's a debugging patch the log output of which will hopefully shed some light on what's going on.
This is for the DomU, right?
For non-debugging purposes you should, btw, be able to avoid the problem by passing "max_indirect_segments=0" to xenblk.ko (or, globally for all guests, passing the same to blkbk.ko).
Ah okay, thanks.
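For reference, the module-option workaround mentioned above could be made persistent with a modprobe configuration fragment along these lines. This is only a sketch: the file name is a hypothetical choice, and it assumes xenblk is built as a loadable module in the guest. The module and parameter names are the ones quoted in this thread.

```
# /etc/modprobe.d/50-xenblk.conf  (hypothetical file name, inside the DomU)
# Disable indirect segments in the block frontend, as suggested above.
options xenblk max_indirect_segments=0

# Alternative on the host, affecting all guests (per the same suggestion):
# options blkbk max_indirect_segments=0
```

If the driver is loaded from the initrd, the initrd would presumably have to be regenerated before the option takes effect at boot.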
--- Comment #18 from Per Jessen
(In reply to Per Jessen from comment #16)
This is for the DomU, right?
Being a frontend patch - yes, of course.
I've applied that patch to kernel 3.16.7-21-xen on all four guests - so far I cannot reproduce the problem. I'll leave it to run for a while and see if it changes. I did see one weird error, I'll attach some output in a minute.
--- Comment #21 from Per Jessen
(In reply to Per Jessen from comment #18)
I've applied that patch to kernel 3.16.7-21-xen on all four guests - so far I cannot reproduce the problem. I'll leave it to run for a while and see if it changes.
Just to be certain - this was with the patch, but without the command line option based workaround in place? (The patch indeed contains one change that might improve behavior.)
Yes, it was without the command line option.
(In reply to Per Jessen from comment #19)
Created attachment 633352 [details] BUG: soft lockup - CPU#0 stuck for 23s
On one of my reboots while attempting to reproduce, one of the guests kept getting this error. I have seen this on physical machines before, but never on a xen guest.
Sadly you cut off too much from the log - right before the soft lockup there was some kind of crash in xenblk.ko, and without seeing the call trace and register state (and perhaps also having the module binary in hand to disassemble it) I can't tell whether that's related to the issue here (including the debugging patch) or whether it's something unrelated.
Okay, I'll see if I can reproduce it.
Per Jessen
(In reply to Per Jessen from comment #26)
Created attachment 633688 [details] kernel BUG at drivers/xen/blkfront/blkfront.c:1214!
kernel BUG at drivers/xen/blkfront/blkfront.c:1214!
followed by repeating:
BUG: soft lockup - CPU#0 stuck for 22s
Please can you avoid adding seemingly redundant logs? Or if you think they're showing something different than earlier ones, please point out that difference.
Let me know if you have the info you need.
--- Comment #32 from Per Jessen
Ping?
Sorry, I've been a bit busy lately. I'll apply the patch and get back to you.
Per Jessen
I have rebooted this xenhost (with four guests) quite a few times since yesterday, so far no problems. When a guest starts up, I see stuff like this in the kernel buffer:
[48114.377064] blkfront: device/vbd/768: ring-pages=1 nr-ents=32 segs-per-req=32
[48114.866146] blkfront: device/vbd/832: ring-pages=1 nr-ents=32 segs-per-req=32
[48107.574760] blkfront: device/vbd/768: ring-pages=1 nr-ents=32 segs-per-req=32
[47856.710182] blkfront: device/vbd/768: ring-pages=1 nr-ents=32 segs-per-req=32
[47889.752094] blkfront: device/vbd/768: ring-pages=1 nr-ents=32 segs-per-req=32
I'll try a few more reboots this morning.
I've done plenty of reboots today, and I am unable to reproduce the problem.
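In case it helps anyone comparing the blkfront lines quoted above across several guests, here is a rough Python sketch that extracts the fields from such lines. The regular expression is based only on the log format seen in this thread, and the function name parse_blkfront is made up for illustration, so treat it as an assumption rather than anything the debugging patch guarantees.

```python
import re

# Pattern for the blkfront lines as quoted in this thread (assumed format).
BLKFRONT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+blkfront:\s+(?P<dev>\S+):\s+"
    r"ring-pages=(?P<ring_pages>\d+)\s+nr-ents=(?P<nr_ents>\d+)\s+"
    r"segs-per-req=(?P<segs>\d+)"
)


def parse_blkfront(text):
    """Return one dict per blkfront line found in the given log text."""
    rows = []
    for m in BLKFRONT_RE.finditer(text):
        rows.append({
            "timestamp": float(m.group("ts")),
            "device": m.group("dev"),
            "ring_pages": int(m.group("ring_pages")),
            "nr_ents": int(m.group("nr_ents")),
            "segs_per_req": int(m.group("segs")),
        })
    return rows


# Two of the lines reported above, used as sample input.
SAMPLE = """\
[48114.377064] blkfront: device/vbd/768: ring-pages=1 nr-ents=32 segs-per-req=32
[48114.866146] blkfront: device/vbd/832: ring-pages=1 nr-ents=32 segs-per-req=32
"""

if __name__ == "__main__":
    for row in parse_blkfront(SAMPLE):
        print(row["device"], row["ring_pages"], row["nr_ents"], row["segs_per_req"])
```

Feeding it the output of dmesg from each guest would make it easy to spot a guest whose ring-pages or segs-per-req values differ from the others.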