[Bug 825510] New: "shutdown -r now" fails @ Xen 4.3.0 Dom0 -- systems halts, but does not restart
https://bugzilla.novell.com/show_bug.cgi?id=825510
https://bugzilla.novell.com/show_bug.cgi?id=825510#c0

           Summary: "shutdown -r now" fails @ Xen 4.3.0 Dom0 -- systems halts, but does not restart
    Classification: openSUSE
           Product: openSUSE 12.3
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 12.3
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Xen
        AssignedTo: jdouglas@suse.com
        ReportedBy: ar16@imapmail.org
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0

per advice @ https://bugzilla.novell.com/show_bug.cgi?id=825501#c1, splitting this out here.

I'm running Xen 4.3.0 on openSUSE 12.3,

    rpm -qa | grep -i ^xen
        xen-devel-4.3.0_03-251.1.x86_64
        xen-4.3.0_03-251.1.x86_64
        xen-tools-4.3.0_03-251.1.x86_64
        xen-libs-4.3.0_03-251.1.x86_64

    lsb_release -a
        LSB Version:    n/a
        Distributor ID: openSUSE project
        Description:    openSUSE 12.3 (x86_64)
        Release:        12.3
        Codename:       Dartmouth

    uname -rm
        3.7.10-1.11-xen x86_64

if booted @ kernel-xen Dom0, exec'ing a `shutdown -r now` fails -- the system halts, but does not restart.

if booted @ kernel-default, it seems to work as expected.

I'd tested this system, and it was working without such problems, ~2 weeks ago. I suspect, but have not yet tracked down, an update within that timeframe.

I've attached systemd status & dmesg after boot @ the referenced bug,

    https://bugzilla.novell.com/attachment.cgi?id=544545

Not clear if that diagnostic is sufficient. Can provide add'l info @ request.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=825510#c2

Jan Beulich <jbeulich@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P4 - Low
             Status|NEW                         |NEEDINFO
           Found By|---                         |Community User
       InfoProvider|                            |ar16@imapmail.org

--- Comment #2 from Jan Beulich <jbeulich@suse.com> 2013-06-19 01:27:41 UTC ---
First of all we'd need a serial log taken of the shutdown operation, so we can see whether kernel or hypervisor crash in any way, or how far shutdown proceeds.

Does normal shutdown work, or does it also halt the machine without turning it off?

And then, with 12.3 not shipping with Xen 4.3, we'd want you to test with the shipped version of Xen (and, in case you updated that too, kernel).

Further, with you apparently knowing that it worked before a recent update, narrowing down which update this was would also help.

And finally, with the native kernel working, attaching the boot log of the native kernel (to see eventual log messages regarding applied workarounds) would be as helpful as providing exact hardware details (namely DMI information).
https://bugzilla.novell.com/show_bug.cgi?id=825510#c3

A R <ar16@imapmail.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
       InfoProvider|ar16@imapmail.org           |

--- Comment #3 from A R <ar16@imapmail.org> 2013-06-19 09:10:06 UTC ---
I can provide all that information; before I do ...

I'd moved long ago to Xen 4.3.0, based on discussions/support @ Xen team. Xen < v4.3 is no longer an option here; I understand the ramifications of that.

If you'd still like me to work here on helping to test in this env, I can. Otherwise, it might be prudent to x-fer this over to Xen category, for now.
https://bugzilla.novell.com/show_bug.cgi?id=825510#c4

Jan Beulich <jbeulich@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |ar16@imapmail.org

--- Comment #4 from Jan Beulich <jbeulich@suse.com> 2013-06-20 01:47:42 UTC ---
If we can, we'd certainly like to address this. Priorities might end up being different when using a custom mix vs what is being shipped (as you may already have noticed by it having got set to P4 for now).
https://bugzilla.novell.com/show_bug.cgi?id=825510#c5

A R <ar16@imapmail.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
       InfoProvider|ar16@imapmail.org           |

--- Comment #5 from A R <ar16@imapmail.org> 2013-06-27 12:59:23 UTC ---
OK, stepwise ...

For this env,

    uname -a
        Linux xen03.loc 3.7.10-1.16-xen #1 SMP Fri May 31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux

    lsb_release -a
        LSB Version:    n/a
        Distributor ID: openSUSE project
        Description:    openSUSE 12.3 (x86_64)
        Release:        12.3
        Codename:       Dartmouth

    rpm -qa | grep -i xen
        kernel-xen-3.7.10-1.16.1.x86_64
        xen-devel-4.3.0_05-255.1.x86_64
        kernel-xen-devel-3.7.10-1.16.1.x86_64
        xen-libs-4.3.0_05-255.1.x86_64
        xen-4.3.0_05-255.1.x86_64
        xen-tools-4.3.0_05-255.1.x86_64

booting with the following Grub config (used for "ages"):

    title Xen
        root (hd0,0)
        kernel /xen.gz vga=gfx-1280x1024x16 conring_size=64 systemd.log_level=debug systemd.log_target=syslog-or-kmsg loglvl=debug guest_loglvl=debug com1=57600,8n1,pci console=vga,com1 console_timestamps dom0_mem=1024M,max:1024M dom0_vcpus_pin=true dom0_max_vcpus=4 sched=credit apic_verbosity=verbose iommu=verbose cpuidle=1 cpufreq=xen clocksource=acpi numa=on cpuf
        module /vmlinuz-xen log_buf_len=4M console=tty0 console=xvc0,57600n8 xencons=tty earlyprintk=xen vga=0x31a nomodeset=0 root=/dev/VG0/ROOT rootfstype=ext4 rootflags=journal_checksum noresume showopts selinux=0 SELINUX_INIT=NO apparmor=0 elevator=cfq clocksource=xen mce=off noquiet
        module /initrd-xen

of course, let me know if a different set of parameters is more useful/helpful.
> we'd need a serial log taken of the shutdown operation, so we can see whether kernel or hypervisor crash in any way, or how far shutdown proceeds.
@ `shutdown -r now`, tail of what I *think* is needed/relevant in the serial console output:

---------------------------
...
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
Stopping Remount Root and Kernel File Systems...
[ OK ] Stopped Remount Root and Kernel File Systems.
Starting Save Random Seed...
Starting Update UTMP about System Shutdown...
Stopping Replay Read-Ahead Data...
[ OK ] Stopped Replay Read-Ahead Data.
Stopping Collect Read-Ahead Data...
[ OK ] Stopped Collect Read-Ahead Data.
Stopping LSB: Start LVM2...
[ OK ] Started Save Random Seed.
[ OK ] Started Update UTMP about System Shutdown.
[ OK ] Stopped LSB: Start LVM2.
Stopping LSB: Multiple Device RAID...
[ OK ] Stopped LSB: Multiple Device RAID.
[ OK ] Reached target Shutdown.
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:39] mm.c:618:d0 Could not get page ref for pfn fec00
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
(XEN) [2013-06-27 18:04:43] mm.c:618:d0 Could not get page ref for pfn fec00
(XEN) [2013-06-27 18:04:43] mm.c:618:d0 Could not get page ref for pfn fec00
Hardware watchdog 'SP5100 TCO timer', version 0
(XEN) [2013-06-27 18:04:43] mm.c:618:d0 Could not get page ref for pfn fec00
Unmounting file systems.
Unmounting /var/lib/dhcp/proc.
Unmounting /var/run.
Unmounting /dev/mqueue.
All filesystems unmounted.
Deactivating swaps.
All swaps deactivated.
Detaching loop devices.
All loop devices detached.
Detaching DM devices.
Detaching DM 253:7.
Detaching DM 253:6.
Detaching DM 253:5.
Detaching DM 253:4.
Detaching DM 253:3.
Detaching DM 253:2.
Detaching DM 253:0.
Not all DM devices detached, 1 left.
(XEN) [2013-06-27 18:04:43] mm.c:618:d0 Could not get page ref for pfn fec00
Detaching DM devices.
Not all DM devices detached, 1 left.
Cannot finalize remaining file systems and devices, giving up.
(XEN) [2013-06-27 18:04:45] mm.c:618:d0 Could not get page ref for pfn fec00
[ 1399.988852] Restarting system.
---------------------------

At this point it just sits, and goes no further. The system does NOT power off.
> Does normal shutdown work, or does it also halt the machine without turning it off?
After a manual/cold reboot, then @ `shutdown -h now`, it *DOES* successfully power off. Here's the similar serial console tail:

---------------------------
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
Stopping Remount Root and Kernel File Systems...
[ OK ] Stopped Remount Root and Kernel File Systems.
Starting Save Random Seed...
Starting Update UTMP about System Shutdown...
Stopping Replay Read-Ahead Data...
[ OK ] Stopped Replay Read-Ahead Data.
Stopping Collect Read-Ahead Data...
[ OK ] Stopped Collect Read-Ahead Data.
Stopping LSB: Start LVM2...
[ OK ] Started Save Random Seed.
[ OK ] Started Update UTMP about System Shutdown.
[ OK ] Stopped LSB: Start LVM2.
Stopping LSB: Multiple Device RAID...
[ OK ] Stopped LSB: Multiple Device RAID.
[ OK ] Reached target Shutdown.
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
Unmounting file systems.
Unmounting /var/lib/dhcp/proc.
Unmounting /var/lib/nfs/rpc_pipefs.
Unmounting /var/run.
Unmounting /dev/mqueue.
All filesystems unmounted.
Deactivating swaps.
All swaps deactivated.
Detaching loop devices.
All loop devices detached.
Detaching DM devices.
Detaching DM 253:7.
Detaching DM 253:6.
Detaching DM 253:5.
Detaching DM 253:4.
Detaching DM 253:3.
Detaching DM 253:2.
Detaching DM 253:0.
Not all DM devices detached, 1 left.
Detaching DM devices.
Not all DM devices detached, 1 left.
Cannot finalize remaining file systems and devices, giving up.
(XEN) [2013-06-27 18:16:05] mm.c:618:d0 Could not get page ref for pfn fec00
[ 256.208915] Power down.
(XEN) [2013-06-27 18:16:07] Preparing system for ACPI S5 state.
(XEN) [2013-06-27 18:16:07] Disabling non-boot CPUs ...
(XEN) [2013-06-27 18:16:07] Breaking affinity for d0v1
(XEN) [2013-06-27 18:16:07] Breaking affinity for d0v2
(XEN) [2013-06-27 18:16:08] Breaking affinity for d0v3
(XEN) [2013-06-27 18:16:08] Entering ACPI S5 state.
---------------------------

and, at this point, it's successfully powered off.
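For comparing the two serial tails above, here is a minimal sketch (not part of the original report) that separates hypervisor lines from Dom0 kernel lines and reports the last message of each. The function name `summarize_serial_tail` and both regular expressions are assumptions, derived only from the line shapes visible in the logs quoted in this comment; the sample strings are short excerpts of those logs.

```python
import re

# Hypervisor lines in the quoted logs look like:
#   (XEN) [2013-06-27 18:04:45] mm.c:618:d0 Could not get page ref for pfn fec00
XEN_LINE = re.compile(r"^\(XEN\) \[(?P<ts>[^\]]+)\] (?P<msg>.*)$")

# Dom0 kernel printk lines in the quoted logs look like:
#   [ 1399.988852] Restarting system.
# (systemd's "[ OK ]" status lines do not match, since they carry no timestamp)
KERNEL_LINE = re.compile(r"^\[\s*(?P<secs>\d+\.\d+)\] (?P<msg>.*)$")


def summarize_serial_tail(log_text):
    """Return the last Xen message, the last kernel message, and a count of
    'Could not get page ref' warnings found in a captured serial log."""
    xen_msgs = []
    kernel_msgs = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = XEN_LINE.match(line)
        if match:
            xen_msgs.append(match.group("msg"))
            continue
        match = KERNEL_LINE.match(line)
        if match:
            kernel_msgs.append(match.group("msg"))
    return {
        "page_ref_warnings": sum("Could not get page ref" in m for m in xen_msgs),
        "last_xen": xen_msgs[-1] if xen_msgs else None,
        "last_kernel": kernel_msgs[-1] if kernel_msgs else None,
    }


# Excerpt of the `shutdown -r now` tail quoted above.
REBOOT_TAIL = """\
Cannot finalize remaining file systems and devices, giving up.
(XEN) [2013-06-27 18:04:45] mm.c:618:d0 Could not get page ref for pfn fec00
[ 1399.988852] Restarting system.
"""

# Excerpt of the `shutdown -h now` tail quoted above.
HALT_TAIL = """\
(XEN) [2013-06-27 18:16:05] mm.c:618:d0 Could not get page ref for pfn fec00
[ 256.208915] Power down.
(XEN) [2013-06-27 18:16:07] Preparing system for ACPI S5 state.
(XEN) [2013-06-27 18:16:08] Entering ACPI S5 state.
"""

if __name__ == "__main__":
    print(summarize_serial_tail(REBOOT_TAIL))
    print(summarize_serial_tail(HALT_TAIL))
```

In the tails quoted above, the reboot attempt ends at the kernel's "Restarting system." with no hypervisor message after it, while the halt case continues into Xen's ACPI S5 messages.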
> with 12.3 not shipping with Xen 4.3, we'd want you to test with the shipped version of Xen (and, in case you updated that too, kernel).
Pending
> with you apparently knowing that it worked before a recent update, narrowing down which update this was would also help.
Pending
> with the native kernel working, attaching the boot log of the native kernel (to see eventual log messages regarding applied workarounds) would be as helpful as providing exact hardware details (namely DMI information).
Not entirely sure what 'boot log' is being asked for in a systemd world, since boot.*msg no longer appears.

Here's `journalctl -b | grep -i kernel`:

    http://pastebin.com/raw.php?i=khT1Da6T
https://bugzilla.novell.com/show_bug.cgi?id=825510#c6

Jan Beulich <jbeulich@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |ar16@imapmail.org

--- Comment #6 from Jan Beulich <jbeulich@suse.com> 2013-06-28 04:18:42 UTC ---
So with it not crashing, but just hanging, issuing the 'd' debug key from the serial console ought to still work, and should give us insight into what's going on. To simplify the analysis, it would be desirable if you tried this with disabled secondary CPUs ("nosmp" on the Xen command line).

As to the pending bits of information - please don't clear needinfo unless you provided all information that you were asked for.

Finally, please attach logs here rather than pointing to external locations (where e.g. it is not clear whether or when they would vanish). Irrespective of that, the provided native kernel log doesn't immediately hint at any particular workaround the kernel would apply.

That said - did you try the various "reboot=" hypervisor command line options, and _none_ of them worked?
https://bugzilla.novell.com/show_bug.cgi?id=825510#c7

--- Comment #7 from A R <ar16@imapmail.org> 2013-06-28 10:56:52 UTC ---
> To simplify the analysis, it would be desirable if you tried this with disabled secondary CPUs ("nosmp" on the Xen command line).
I've found different advice on how best to achieve this.

Reading, http://osdir.com/ml/xen-users/2007-11/msg00697.html

    > Is it possible to disable the SMP function on the dom0 and assign each single domU to have direct access on the multicore processor like Intel Quadcore?

    "Yes, Edit /etc/xend/xend-config.sxp and use (dom0-cpus x) to assign a single cpu to dom0"

and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308#22

    > it’s also recommended that the dom0 is restricted to one
    > CPU only, for example by booting with the kernel parameter nosmp."

    Ian Campbell ==> "Ignoring whether or not this is good advice (I expect it very much depends on your workload) this can be better achieved by adding dom0_max_vcpus=1 to your hypervisor command line or by hot-unplugging vCPUS once the system has booted ..."

mod'ing in my grub

    - ... dom0_max_vcpus=4 ...
    + ... dom0_max_vcpus=1 ...

iiuc, appears to do the trick; after reboot, just one CPU appears active

    cat /proc/interrupts | head -n 5
                CPU0
        1:         4  Phys-fasteoi   i8042
        6:         3  Phys-fasteoi   floppy
        7:         0  Phys-fasteoi   parport0
        8:         0  Phys-fasteoi   rtc0

Is this sufficient for your request?
> So with it not crashing, but just hanging, issuing the 'd' debug key from the serial console ought to still work, and should give us insight into what's going on.
Atm, when booting to Xen, I'm *unable* to get the server to recognize any commands issued from my serial terminal (no cmd keys, not even seeing a login prompt, etc). This *used* to work pre-systemd. Something's possibly off in my serial config. I'm trying to get *that* straightened out @ http://lists.opensuse.org/opensuse-virtual/2013-06/msg00015.html
> That said - did you try the various "reboot=" hypervisor command line options, and _none_ of them worked?
I hadn't tried any of them; tbh, I wasn't even aware of them.

Reading http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

The options appear to be:

    reboot
        = b[ios] | t[riple] | k[bd] | n[o] [, [w]arm | [c]old]
        Default: 0

        Specify the host reboot method.

        warm instructs Xen to not set the cold reboot flag.
        cold instructs Xen to set the cold reboot flag.
        bios instructs Xen to reboot the host by jumping to BIOS. This is only available on 32-bit x86 platforms.
        triple instructs Xen to reboot the host by causing a triple fault.
        kbd instructs Xen to reboot the host via the keyboard controller.
        acpi instructs Xen to reboot the host using RESET_REG in the ACPI FADT.

with no further explanation(s).

Reading http://lists.xen.org/archives/html/xen-devel/2011-09/msg00942.html suggests that @ 'Default', a *sequence* of the reboot options is attempted

    "Summing up, both Linux 3.1 and Xen 4.1 both do the following sequence by default:
    ACPI, KBD, ACPI, KBD, TRIPLE, KBD, TRIPLE, KBD, ..."

Setting, instead, individual reboot= grub options, and testing `shutdown -r now` in each case

    reboot=cold
    reboot=warm
    reboot=triple
    reboot=kbd
    reboot=acpi
    (reboot=bios, not applicable. this is x86_64.)

in every case, exec of `shutdown -r now` hangs, as reported above, @

    (XEN) [2013-06-28 16:32:06] Domain 0 shutdown: rebooting machine.
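The one-option-per-boot test described above can be sketched as a small helper that rewrites a hypervisor command line for each `reboot=` value in turn. This is only an illustration: `with_reboot_option` is a hypothetical name, the base command line is a shortened stand-in for the Grub `kernel /xen.gz ...` line from comment #5, and the value list is simply the set of options reported as tested here.

```python
# reboot= values the reporter lists as tested (bios is skipped: x86_64).
REBOOT_VALUES = ["cold", "warm", "triple", "kbd", "acpi"]


def with_reboot_option(xen_cmdline, value):
    """Return xen_cmdline with exactly one reboot=<value> option.

    Any reboot= option already present is dropped first, so repeated calls
    never stack several reboot= settings on the same line.
    """
    tokens = [tok for tok in xen_cmdline.split() if not tok.startswith("reboot=")]
    tokens.append("reboot=" + value)
    return " ".join(tokens)


if __name__ == "__main__":
    # Shortened, illustrative hypervisor line; not the full one from the report.
    base = "kernel /xen.gz loglvl=debug guest_loglvl=debug com1=57600,8n1,pci console=vga,com1"
    for value in REBOOT_VALUES:
        print(with_reboot_option(base, value))
```

Each printed line would replace the `kernel /xen.gz ...` line for one test boot; the Dom0 `module` lines stay unchanged.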
https://bugzilla.novell.com/show_bug.cgi?id=825510#c8

--- Comment #8 from Jan Beulich <jbeulich@suse.com> 2013-07-01 08:44:16 UTC ---
(In reply to comment #7)
> + ... dom0_max_vcpus=1 ...
> Is this sufficient for your request?
No, it's not. This - as the name says - restricts Dom0's number of vCPU-s, not the number of pCPU-s that Xen uses. Yet the latter is what I'd like to be restricted. Also - as you mention it, and regardless of how much I dislike you scattering all sorts of information here - restricting Dom0's number of vCPU-s to 1 is (at the very least with using xend) _not_ recommended, regardless of what you may have found elsewhere.
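To make the distinction concrete, a minimal sketch that inspects a Xen command line and reports the two restrictions separately. The function name `cpu_restrictions` and the result keys are hypothetical; it only looks for the two options named in this thread, `nosmp` (hypervisor-wide) and `dom0_max_vcpus=` (Dom0 only), and does not model any other CPU-related option.

```python
def cpu_restrictions(xen_cmdline):
    """Report which CPU restrictions a Xen command line requests.

    'xen_nosmp'  : True if "nosmp" is present, i.e. the hypervisor itself is
                   asked not to bring up secondary physical CPUs.
    'dom0_vcpus' : the raw value of dom0_max_vcpus=, which only limits the
                   number of virtual CPUs given to Dom0 (None if absent).
    """
    result = {"xen_nosmp": False, "dom0_vcpus": None}
    for tok in xen_cmdline.split():
        if tok == "nosmp":
            result["xen_nosmp"] = True
        elif tok.startswith("dom0_max_vcpus="):
            result["dom0_vcpus"] = tok.split("=", 1)[1]
    return result


if __name__ == "__main__":
    # What was tried in comment #7: Dom0 limited, hypervisor still SMP.
    print(cpu_restrictions("/xen.gz loglvl=debug dom0_max_vcpus=1"))
    # What was asked for in comment #6: hypervisor restricted to one pCPU.
    print(cpu_restrictions("/xen.gz loglvl=debug nosmp dom0_max_vcpus=4"))
```

With the change from comment #7, only `dom0_vcpus` changes; `xen_nosmp` stays false, which is why that test does not satisfy the request.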
> Atm, when booting to Xen, I'm *unable* to get the server to recognize any commands issued from my serial terminal (no cmd keys, not even seeing a login prompt, etc). This *used* to work pre-systemd.
Sorry, but the expectation is that you have this working. And I very much doubt that systemd has any effect on Xen receiving input (it may very well have an effect on Dom0 receiving input, but that's two different modes to run the serial console in).

> "Summing up, both Linux 3.1 and Xen 4.1 both do the following sequence by default:
> ACPI, KBD, ACPI, KBD, TRIPLE, KBD, TRIPLE, KBD, ..."
> setting, instead, individual reboot= grub options, testing `shutdown -r now` in each case
> reboot=cold reboot=warm reboot=triple reboot=kbd reboot=acpi (reboot=bios, not applicable. this is x86_64.)
> exec of `shutdown -r now` hangs, as reported above, @
Very interesting (and odd). And you added these to the hypervisor command line, not the kernel one? If so, you might want to try "reboot=pci", if you're able to rebuild the hypervisor for yourself with the patch at http://lists.xenproject.org/archives/html/xen-devel/2013-06/msg02128.html applied.

Failing that, we will need to see the result of the 'd' debug key as per #6.
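Whether an option sits on the hypervisor command line or on the Dom0 kernel one can be read off the Grub stanza: in a GRUB legacy Xen entry like the one in comment #5, the `kernel` line loads xen.gz, so its arguments go to the hypervisor, and the first `module` line is the Dom0 kernel with its own arguments. A minimal sketch of that check follows; `locate_option` is a hypothetical helper and the stanza is a shortened stand-in, not the reporter's actual config.

```python
def locate_option(stanza, prefix):
    """Return where options starting with `prefix` appear in a GRUB legacy
    Xen stanza: 'hypervisor' (the kernel line), 'dom0 kernel' (first module
    line) or 'other module' (any later module line, e.g. the initrd)."""
    found = []
    module_index = 0
    for raw in stanza.splitlines():
        parts = raw.split()
        if not parts:
            continue
        directive = parts[0]
        args = parts[2:]  # parts[1] is the image path, e.g. /xen.gz
        if directive == "kernel":
            role = "hypervisor"
        elif directive == "module":
            role = "dom0 kernel" if module_index == 0 else "other module"
            module_index += 1
        else:
            continue  # title, root, and anything else carry no boot options
        if any(arg.startswith(prefix) for arg in args):
            found.append(role)
    return found


# Shortened, illustrative stanza in the same layout as the one in comment #5.
STANZA = """\
title Xen
    root (hd0,0)
    kernel /xen.gz loglvl=debug console=vga,com1 reboot=acpi
    module /vmlinuz-xen console=tty0 root=/dev/VG0/ROOT
    module /initrd-xen
"""

if __name__ == "__main__":
    print(locate_option(STANZA, "reboot="))   # expected on the hypervisor line
    print(locate_option(STANZA, "console="))  # present on both lines here
```

If `reboot=` shows up under 'dom0 kernel' rather than 'hypervisor', it was passed to Linux and Xen never saw it.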
https://bugzilla.novell.com/show_bug.cgi?id=825510#c9

A R <ar16@imapmail.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |CLOSED
       InfoProvider|ar16@imapmail.org           |
         Resolution|                            |NORESPONSE

--- Comment #9 from A R <ar16@imapmail.org> 2013-07-10 22:00:40 UTC ---
.
https://bugzilla.novell.com/show_bug.cgi?id=825510#c10

--- Comment #10 from Jan Beulich <jbeulich@suse.com> 2013-07-11 08:24:19 UTC ---
I'm confused: Are you not interested in getting this fixed anymore?