[Bug 487932] New: domU boot stalls and reboots
https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c1

Summary: domU boot stalls and reboots
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Xen
AssignedTo: cgriffin@novell.com
ReportedBy: snorp@novell.com
QAContact: qa@suse.de
Found By: ---

Created an attachment (id=281313) --> (https://bugzilla.novell.com/attachment.cgi?id=281313)
config file passed to 'xm'

I am having a difficult time trying to boot an 11.1 domU on an 11.1 dom0. When I try 'xm create -c <config>' I get the following:

Started domain snorp_test
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.27.19-3.2-xen (geeko@buildhost) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-02-25 15:40:44 +0100
Reserving virtual address space above 0xf5800000
Xen-provided physical RAM map:
Xen: 0000000000000000 - 0000000020800000 (usable)
last_pfn = 0x20800 max_arch_pfn = 0x10000000
NX (Execute Disable) protection: active
RAMDISK: 005cb000 - 02ef0000
ACPI in unprivileged domain disabled
0MB HIGHMEM available.
520MB LOWMEM available.
mapped low ram: 0 - 20800000
low ram: 00000000 - 20800000
bootmap 00000000 - 00004000
(4 early reservations) ==> bootmem [0000000000 - 0020000000]
#0 [0000100000 - 00005ca65c] TEXT DATA BSS ==> [0000100000 - 00005ca65c]
#1 [00005cb000 - 0002f8f000] Xen provided ==> [00005cb000 - 0002f8f000]
#2 [0002f8f000 - 000307c000] PGTABLE ==> [0002f8f000 - 000307c000]
#3 [0000000000 - 0000004000] BOOTMAP ==> [0000000000 - 0000004000]
Zone PFN ranges:
DMA 0x00000000 -> 0x00001000
Normal 0x00001000 -> 0x00020800
HighMem 0x00020800 -> 0x00020800
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0x00000000 -> 0x00020800
PERCPU: Allocating 35228 bytes of per cpu data
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 131950
Kernel command line: vga=0x314 splash=silent showopts
bootsplash: silent mode.
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Xen reported: 2327.498 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
console [xvc-1] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Software IO TLB disabled
Memory: 469756k/532480k available (2190k kernel code, 54172k reserved, 1882k data, 236k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xf5539000 - 0xf57ff000 (2840 kB)
pkmap : 0xf5000000 - 0xf5200000 (2048 kB)
vmalloc : 0xe1000000 - 0xf4ffe000 ( 319 MB)
lowmem : 0xc0000000 - 0xe0800000 ( 520 MB)
.init : 0xc0503000 - 0xc053e000 ( 236 kB)
.data : 0xc0323a41 - 0xc04fa300 (1882 kB)
.text : 0xc0100000 - 0xc0323a41 (2190 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
Calibrating delay using timer specific routine.. 4657.55 BogoMIPS (lpj=9315104)
Security Framework initialized
AppArmor: AppArmor initialized
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 17k freed
Brought up 1 CPUs
net_namespace: 1044 bytes
NET: Registered protocol family 16
Brought up 1 CPUs
PCI: Fatal: No config space access function found
PCI: setting up Xen PCI frontend stub
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
suspend: event channel 8
xen_mem: Initialising balloon driver.
PCI: System does not support PCI
PCI: System does not support PCI
AppArmor: AppArmor Filesystem Enabled
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Unpacking initramfs... done
Freeing initrd memory: 42132k freed
platform rtc_cmos: registered platform RTC device (no PNP device found)
audit: initializing netlink socket (disabled)
type=2000 audit(1237822004.682:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
msgmni has been set to 276
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Xen virtual console successfully installed as xvc0
Event-channel device installed.
PNP: No PS/2 controller found. Probing ports directly.
i8042.c: No controller found.
mice: PS/2 mouse device common for all mice
TCP cubic registered
Using IPI No-Shortcut mode
registered taskstats version 1
XENBUS: Device with no driver: device/vbd/2049
XENBUS: Device with no driver: device/vif/0
XENBUS: Device with no driver: device/console/0
Freeing unused kernel memory: 236k freed
Write protecting the kernel text: 2192k
Write protecting the kernel read-only data: 1544k
Uniform Multi-Platform E-IDE driver
SCSI subsystem initialized
st: Version 20080504, fixed bufsize 32768, s/g segs 256
md: stopping all md devices.
xen console-0: xenbus_dev_shutdown: device/console/0: Initialising != Connected, skipping
xen vif-0: xenbus_dev_shutdown: device/vif/0: Initialising != Connected, skipping
xen vbd-2049: xenbus_dev_shutdown: device/vbd/2049: Initialising != Connected, skipping
Restarting system.

There is a long pause between the "st: Version" line and the "md:" one -- a minute or two I'd guess. The XENBUS messages earlier look like they might be relevant, but I have no idea what causes this. I've attached the config I'm using, and will post the disk image as well. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
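
The number in the "device/vbd/2049" messages above can be turned back into a device name. The following is a minimal sketch, assuming the legacy Xen virtual-device encoding of (major << 8) | minor with 16 minors per disk; the helper name decode_vbd is hypothetical, and only the SCSI (8) and xvd (202) majors are handled.

```python
# Hypothetical helper: decode the numeric ID seen in "device/vbd/<id>".
# Assumption: legacy encoding (major << 8) | minor, 16 minors per disk.
MAJOR_PREFIX = {8: "sd", 202: "xvd"}
MINORS_PER_DISK = 16


def decode_vbd(vdev):
    """Return a name such as 'sda1', or None for majors not covered here."""
    major, minor = vdev >> 8, vdev & 0xFF
    prefix = MAJOR_PREFIX.get(major)
    if prefix is None:
        return None  # other majors are out of scope for this sketch
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = prefix + chr(ord("a") + disk)
    # Partition 0 means the whole disk, so no numeric suffix is added.
    return name + (str(part) if part else "")


print(decode_vbd(2049))
```

Under that assumption, 2049 corresponds to sda1 and 2048 to the whole disk sda.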
https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c1

--- Comment #1 from James Willcox <snorp@novell.com> 2009-03-23 15:28:16 MST ---
You can grab the disk image here: http://snorp.net/~snorp/disk.raw.gz

https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c2

--- Comment #2 from James Willcox <snorp@novell.com> 2009-03-23 15:49:25 MST ---
Oops, the disk line is really:

disk = [ 'tap:aio:/tmp/disk.raw,sda,w' ]

I accidentally pasted a change I was experimenting with.

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jdouglas@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c3

Jason Douglas <jdouglas@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |carnold@novell.com,
                   |                            |jdouglas@novell.com
      Info Provider|                            |snorp@novell.com
         AssignedTo|cgriffin@novell.com         |jbeulich@novell.com
          QAContact|qa@suse.de                  |jdouglas@novell.com

--- Comment #3 from Jason Douglas <jdouglas@novell.com> 2009-03-23 17:46:13 MST ---
Obviously that is not a complete config file as there is no kernel/ramdisk or bootloader specified (among other things). Perhaps you could start by providing a full config file?

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jdouglas@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c5

Jason Douglas <jdouglas@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|snorp@novell.com            |

--- Comment #5 from Jason Douglas <jdouglas@novell.com> 2009-03-23 18:36:28 MST ---
I have also duplicated this using the file: protocol (instead of tap:aio). I don't see any difference in behavior between the two protocols.

Unfortunately, I didn't have loglvl=all / guest_loglvl=all set in my grub menu, so I am rebooting in case we can capture some interesting details from xm dmesg.

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jdouglas@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c6

--- Comment #6 from Jason Douglas <jdouglas@novell.com> 2009-03-23 18:46:35 MST ---
(In reply to comment #5)
> Unfortunately, I didn't have loglvl=all / guest_loglvl=all set in my grub menu, so I am rebooting in case we can capture some interesting details from xm dmesg.

There was nothing of interest in the xm dmesg output; just the typical messages:

(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 00000000
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 000000c0
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 0000009f
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 00000000
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 000000c0
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 0000009f
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 00000000
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 000000c0
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 0000009f
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 00000000
(XEN) mm.c:676:d2 Non-privileged (2) attempt to map I/O space 000000c0

xend.log just indicated that the VM shut down because of a reboot within the guest:

[2009-03-23 18:41:19 8532] INFO (XendDomainInfo:1662) Domain has shutdown: name=snorp_test id=2 reason=reboot.

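
When watching xend.log for this kind of event, the shutdown entry can be picked out mechanically. This is only a sketch: the regular expression is modelled on the single xend.log line quoted above, the function name parse_shutdown is hypothetical, and other xend versions may word the message differently.

```python
import re

# Pattern modelled on the one xend.log line quoted in this thread
# (an assumption; it is not taken from the xend sources).
SHUTDOWN_RE = re.compile(
    r"Domain has shutdown: name=(?P<name>\S+) id=(?P<id>\d+) reason=(?P<reason>\w+)"
)


def parse_shutdown(line):
    """Return name, id and reason from a shutdown entry, or None."""
    match = SHUTDOWN_RE.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "id": int(match.group("id")),
        "reason": match.group("reason"),
    }


line = ("[2009-03-23 18:41:19 8532] INFO (XendDomainInfo:1662) "
        "Domain has shutdown: name=snorp_test id=2 reason=reboot.")
print(parse_shutdown(line))
```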
https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c8

--- Comment #8 from James Willcox <snorp@novell.com> 2009-03-24 09:02:22 MST ---
(In reply to comment #6)
Surely xen is not just super broken in 11.1/SLES11 -- any idea what I am doing that is so different from everyone else?

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jbeulich@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c9

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |snorp@novell.com

--- Comment #9 from Jan Beulich <jbeulich@novell.com> 2009-03-24 09:33:39 MST ---
Surely this isn't the case with SLE11? Therefore, did you try using SLE11's hypervisor, tools, and kernel? If that doesn't make a difference, it's very likely a problem somewhere in user mode (which unfortunately I probably can't help with).

Jason, apart from the decompression error, there is another call trace in xend-debug.log, and many in xend.log? Are any of these related to the problem in any way?

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jdouglas@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c10

--- Comment #10 from Jason Douglas <jdouglas@novell.com> 2009-03-24 09:51:43 MST ---
(In reply to comment #9)
> Jason, apart from the decompression error, there is another call trace in xend-debug.log, and many in xend.log? Are any of these related to the problem in any way?

It appears to me as though all of those call traces occurred before I began debugging this, so I don't think so. Also, I had windows open watching the various log files (tail -f) while I duplicated the issue, and I didn't see anything suspicious.

https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c11

--- Comment #11 from James Willcox <snorp@novell.com> 2009-03-24 12:36:33 MST ---
(In reply to comment #9)
> Surely this isn't the case with SLE11? Therefore, did you try using SLE11's hypervisor, tools, and kernel? If that doesn't make a difference, it's very likely a problem somewhere in user mode (which unfortunately I probably can't help with).

I tried with a SLES11 domU, and got the same result. I don't have a SLES11 dom0 here to test with, though. I am able to run other random xen appliances downloaded from the web, however, so it feels like a guest kernel issue to me. For instance, the following appliance runs fine:

http://virtualappliances.net/download/archive/VirtualAppliancesMySQL-xen-1.0...

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jbeulich@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c12

--- Comment #12 from Jan Beulich <jbeulich@novell.com> 2009-03-25 02:37:44 MST ---
> I don't have a SLES11 dom0 here to test with, though.

But surely you could get ahold of the kernel and hypervisor packages of SLE11, and install just these.

Jason - have you seen this in a pure SLE11 environment?

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jdouglas@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c13

--- Comment #13 from Jason Douglas <jdouglas@novell.com> 2009-03-25 17:58:17 MST ---
Created an attachment (id=282099) --> (https://bugzilla.novell.com/attachment.cgi?id=282099)
boot messages from failed boot with SLES 11 kernel/initrd

(In reply to comment #12)
> Jason - have you seen this in a pure SLE11 environment?

Let me start by saying that I have never seen this in any of my testing with VMs that I create using vm-install. That includes SLES 11 (32 & 64-bit), as well as openSUSE 11.1 guests.

However, I have dropped the vmlinuz-xen and initrd-xen from a fresh install of a SLES 11 32-bit guest into this image, and while I don't get the same problem of the VM pausing and then rebooting, the VM does not finish booting. The attached file is the output from that boot (the snorp_test image with 32-bit SLES 11 kernel/initrd). Additionally, I don't see anything suspicious in xend.log, xm dmesg, or xend-debug.log with the SLES 11 kernel/initrd.

Since this VM was obviously not created using vm-install (my guess is that it came from the SUSE Studio -- can you confirm that?), it might be helpful to know what is unique about this guest so that we can isolate what is causing the problem.

https://bugzilla.novell.com/show_bug.cgi?id=487932

User jdouglas@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c14

Jason Douglas <jdouglas@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #282099|0                           |1
        is obsolete|                            |

--- Comment #14 from Jason Douglas <jdouglas@novell.com> 2009-03-25 18:28:15 MST ---
Created an attachment (id=282101) --> (https://bugzilla.novell.com/attachment.cgi?id=282101)
boot messages from failed boot with SLES 11 kernel/initrd

I just figured out what the problem was with the SLES 11 kernel/initrd. Because output is set by default to go to the framebuffer, I wasn't getting all of the kernel messages, so I added xencons=tty as a kernel parameter, and I now get the attached output when booting to the SLES 11 kernel. The key section is this:

<snip>
Boot logging started on /dev/tty1(/dev/console) at Thu Mar 26 00:08:34 2009
Waiting for device /dev/xvda2 to appear: ..............................Could not find /dev/xvda2.
Want me to fall back to /dev/xvda2? (Y/n)
</snip>

The initrd that I copied from SLES 11 was still expecting root to be located on xvda, but this config file specifies the disk as sda.

When I did the same thing with the original openSUSE 11.1 kernel, I got a few additional lines of output. Here they are in context:

<snip>
Write protecting the kernel text: 2192k
Write protecting the kernel read-only data: 1544k
Loading KIWI VMX Boot-System...
-------------------------------
Creating device nodes with udev
Boot logging started on /dev/tty1(/dev/console) at Thu Mar 26 00:20:00 2009
===> Boot-Logging enabled on /dev/tty3
===> Kernel logging enabled on: /dev/tty4
===> Including required kernel modules...
Uniform Multi-Platform E-IDE driver
SCSI subsystem initialized
st: Version 20080504, fixed bufsize 32768, s/g segs 256
===> Searching for boot device...
md: stopping all md devices.
xen console-0: xenbus_dev_shutdown: device/console/0: Initialising != Connected, skipping
xen vif-0: xenbus_dev_shutdown: device/vif/0: Initialising != Connected, skipping
xen vbd-2048: xenbus_dev_shutdown: device/vbd/2048: Initialising != Connected, skipping
Restarting system.
</snip>

The long pause occurs while the system is "===> Searching for boot device...".

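
The xvda/sda mismatch described in this comment is the kind of thing a small check can catch before booting. Below is a minimal sketch, assuming the comma-separated 'backend,frontend,mode' layout of the disk entry quoted in comment #2; the function names are hypothetical, and the prefix comparison is deliberately crude.

```python
def frontend_device(disk_spec):
    """Return the guest-side device name from an xm-style disk entry."""
    parts = disk_spec.split(",")
    if len(parts) < 2:
        raise ValueError("expected 'backend,frontend,mode': %r" % disk_spec)
    return parts[1].strip()


def root_on_disk(root_device, disk_spec):
    """True if root_device (e.g. /dev/xvda2) is named after the configured disk.

    Crude prefix test, good enough for sda versus xvda; it does not
    distinguish sda from a hypothetical sdaa.
    """
    name = root_device.rsplit("/", 1)[-1]
    return name.startswith(frontend_device(disk_spec))


spec = "tap:aio:/tmp/disk.raw,sda,w"  # disk entry from comment #2
print(frontend_device(spec))
print(root_on_disk("/dev/xvda2", spec))
```

With the disk entry from comment #2, the root device the SLES 11 initrd waited for (/dev/xvda2) does not match the configured frontend name.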
https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c15

James Willcox <snorp@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Info Provider|snorp@novell.com            |jdouglas@novell.com

--- Comment #15 from James Willcox <snorp@novell.com> 2009-03-25 20:07:42 MST ---
Aha! That "Searching for boot device" is from the kiwi-generated initrd. So something is wrong in there. I wonder why I can't see that? I was using 'xencons=tty' and the last message I saw before the pause was the 'st: Version' one. I would have instantly realized what was going on if I had. Trying again just now, I still don't see it -- very odd. Any idea what could be happening there?

https://bugzilla.novell.com/show_bug.cgi?id=487932

User snorp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=487932#c16

James Willcox <snorp@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|jdouglas@novell.com         |
         Resolution|                            |INVALID

--- Comment #16 from James Willcox <snorp@novell.com> 2009-03-25 20:14:39 MST ---
Sigh. Never mind. I was expecting the 'extra' config value to work, but of course it doesn't when you are using pygrub. Putting 'xencons=tty' into the grub conf shows the kernel messages now.

Closing this as INVALID, thanks for all the help!

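
Since, per this comment, the kernel parameter has to go into the guest's grub configuration rather than the 'extra' setting when pygrub is in use, the edit can be sketched as a small text transformation. This is an illustration only: the function name add_kernel_param is hypothetical, the sample menu text is made up rather than taken from the reporter's image, and it assumes GRUB legacy style "kernel ..." lines.

```python
def add_kernel_param(menu_text, param):
    """Append param to every 'kernel' line that does not already carry it."""
    out = []
    for line in menu_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out)


# Made-up sample entry; real menu.lst contents will differ.
sample = (
    "title guest\n"
    "    kernel /boot/vmlinuz-xen root=/dev/sda1\n"
    "    initrd /boot/initrd-xen"
)
print(add_kernel_param(sample, "xencons=tty"))
```

Running the function twice leaves the text unchanged the second time, so it is safe to apply to a file that may already contain the parameter.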