[Bug 623680] New: xen kernel freezes during boot ("usbcore: registering new driver usb")
http://bugzilla.novell.com/show_bug.cgi?id=623680
http://bugzilla.novell.com/show_bug.cgi?id=623680#c0

Summary: xen kernel freezes during boot ("usbcore: registering new driver usb")
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: i686
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@novell.com
ReportedBy: martin.wilck@ts.fujitsu.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100626 SUSE/3.6.6-1.2 Firefox/3.6.6

The xen kernel freezes during boot. The last message is "usbcore: registering new driver usb". After that, the system doesn't react to any keystroke including NumLock. In particular, no AltSysrq is possible.

Reproducible: Always

Steps to Reproduce:
1. Install on Samsung XS50 laptop
2. Boot xen + xen kernel
3.

Actual Results: see description

Expected Results: No hang

I am 90% positive that I was able to boot XEN with beta1 snap7. But I don't have that installation any more, so I can't check any more.

The default kernel shows no problems at the stage where the xen kernel halts. The dmesg lines of the default kernel around this point are:

[ 2.473025] ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0x18e0 irq 14
[ 2.473028] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x18e8 irq 15
[ 2.475625] rtc_cmos 00:07: rtc core: registered rtc_cmos as rtc0
[ 2.475661] rtc0: alarms up to one month, y3k, 242 bytes nvram, hpet irqs
[ 2.476179] usbcore: registered new device driver usb
[ 2.493139] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 2.493178] alloc irq_desc for 23 on node -1
[ 2.493181] alloc kstat_irqs on node -1
[ 2.493189] ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[ 2.493206] ehci_hcd 0000:00:1d.7: setting latency timer to 64
[ 2.493210] ehci_hcd 0000:00:1d.7: EHCI Host Controller

So ehci, ata or rtc are potential suspects.
-- Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=623680#c1
--- Comment #1 from Martin Wilck
http://bugzilla.novell.com/show_bug.cgi?id=623680#c2
--- Comment #2 from Martin Wilck
http://bugzilla.novell.com/show_bug.cgi?id=623680#c
Daniel Rahn
http://bugzilla.novell.com/show_bug.cgi?id=623680#c
Charles Arnold
http://bugzilla.novell.com/show_bug.cgi?id=623680#c3
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c4
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c5
--- Comment #5 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c6
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c7
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c8
--- Comment #8 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c9
--- Comment #9 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c10
--- Comment #10 from Jan Beulich
With "watchdog", the system automatically reboots.
Even with "noreboot"?
The system comes up with "processor.max_cstate=1".
Does the same also hold when passing "max_cstate=1" to Xen?
https://bugzilla.novell.com/show_bug.cgi?id=623680#c11
--- Comment #11 from Martin Wilck
With "watchdog", the system automatically reboots.
Even with "noreboot"?
No, but I haven't succeeded in getting any XEN messages on the console when the system freezes. "vga=mode-0x314,keep" should be correct for me, shouldn't it? Btw the SUSE menu.lst default XEN option "vgamode=0x314" is wrong, it should be "vga=mode-0x314" instead.
Does the same also hold when passing "max_cstate=1" to Xen?
Yes.
https://bugzilla.novell.com/show_bug.cgi?id=623680#c12
--- Comment #12 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c13
--- Comment #13 from Jan Beulich
"vga=mode-0x314,keep" should be correct for me, shouldn't it?
Yes, that's how it is supposed to be.
Btw the SUSE menu.lst default XEN option "vgamode=0x314" is wrong, it should be "vga=mode-0x314" instead.
That would need to be a separate (YaST) bug report.

(In reply to comment #12)
This screenshot was made with "vga=mode-0x31A,keep watchdog noreboot" for XEN, and "debug sysrq=9" for the kernel. I also activated ACPI debugging for the processor module.
I'll look at this in more detail later. Could you attach the corresponding native kernel boot's boot.msg for reference?
Is there a way to force udev to load drivers serially at boot time? I tried udev.log_priority=debug udev.children_max=16 on the kernel command line, but it had no effect.
What does udev have to do with this problem? Supposedly you have processor.ko in the list of modules to be loaded explicitly from initrd (i.e. its loading isn't controlled by udev).
https://bugzilla.novell.com/show_bug.cgi?id=623680#c14
--- Comment #14 from Martin Wilck
(In reply to comment #11) What does udev have to do with this problem? Supposedly you have processor.ko in the list of modules to be loaded explicitly from initrd (i.e. its loading isn't controlled by udev).
I had removed processor from INITRD_MODULES. That made it possible for me to boot e.g. with "init=/bin/bash". If I do a normal boot with this initrd, the system freezes later when udev loads drivers. Now that I know I can boot with max_cstate=1, I can put processor.ko back into INITRD_MODULES if you prefer.
https://bugzilla.novell.com/show_bug.cgi?id=623680#c15
--- Comment #15 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c16
--- Comment #16 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c17
--- Comment #17 from Jan Beulich
Now that I know I can boot with max_cstate=1, I can put processor.ko back into INITRD_MODULES if you prefer.
Depends on whether keeping it the way you have it now makes debugging easier. Not sure whether adding the module to MODULES_LOADED_ON_BOOT would make the loading of the module (indirectly) controllable - boot.loadmodules runs after boot.udev, but a comment there says its purpose is to control load order. If that works, simply turning on interactive startup mode (PROMPT_FOR_CONFIRM="yes" in /etc/sysconfig/boot) would allow you to separate the individual steps.
https://bugzilla.novell.com/show_bug.cgi?id=623680#c18
--- Comment #18 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c
Ihno Krumreich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c19
--- Comment #19 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c20
--- Comment #20 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c21
--- Comment #21 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c22
--- Comment #22 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c23
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c24
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c25
--- Comment #25 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c26
--- Comment #26 from Jan Beulich
I was hoping to get some debugging hints here.
I would really like to give you some, but I can't offer cpuidle-specific ones (I don't know the code well enough), and anything generic would require serial console access. This is why reproducing against xen-unstable and, if reproducible, reporting on xen-devel is likely the only realistic chance.
https://bugzilla.novell.com/show_bug.cgi?id=623680#c27
--- Comment #27 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c28
--- Comment #28 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c29
--- Comment #29 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c30
--- Comment #30 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c31
--- Comment #31 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c32
--- Comment #32 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c33
--- Comment #33 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c34
--- Comment #34 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c35
--- Comment #35 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c36
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c37
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c38
--- Comment #38 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c39
--- Comment #39 from Charles Arnold
Charles, could you do a 32-bit 11.4 one-off debugging build with the above patch included for Martin?
32bit RPMs are available for download at ftp://ftp.novell.com/forge/XenTechnicalPreview/openSUSE/11.4/debug/623680/ Please download and install all three RPMs (xen, xen-libs, and xen-tools).
https://bugzilla.novell.com/show_bug.cgi?id=623680#c
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c40
--- Comment #40 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c41
--- Comment #41 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c42
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c43
--- Comment #43 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c44
--- Comment #44 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c45
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c46
Charles Arnold
Charles, could you do another 32-bit 11.4 one-off test build with the above patch included (but for the moment without the earlier debugging patch, as I'm not certain the two would both apply when used together) for Martin?
The second version of the 32bit RPMs is available for download at ftp://ftp.novell.com/forge/XenTechnicalPreview/openSUSE/11.4/debug/623680/ Please download and install all three RPMs (xen, xen-libs, and xen-tools).
https://bugzilla.novell.com/show_bug.cgi?id=623680#c47
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c48
--- Comment #48 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c49
--- Comment #49 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c50
--- Comment #50 from Jan Beulich
With this fix, the XEN hypervisor stops after printing
(XEN) TSC: 0:0:0
(XEN) TSC: 000b48cf9339:000048cf9394:000048cf937c
(XEN) TSC only partially writable
Very bad, and completely unexpected. On my dual PentiumIII system, this works just fine (just that it doesn't have any C states, and hence the actual intended effect can't be verified), so for the moment it escapes me why it would hang for you. Will need to add some more debugging code to see whether we can spot where it actually dies (I expect this is not immediately after the printed message).

(In reply to comment #49)
Jan, if I'm reading your patch correctly, it would simply disable C2 and higher on my system (if it worked). But I could do the same easier with the max_cstate=1 parameter, as we discovered early on.
Yes, but the goal is to have Xen notice this by itself instead of needing a command line option, all the more so since we're dealing with fully specified behavior here.
OTOH, the Linux kernel runs just fine on this system with C2 and C3. What is the deeper reason behind Xen's need to write the TSC when C2/C3 are enabled?
The reason for this lies in how Xen manages time, which is completely different from Linux. Making C2/C3 usable under Xen on this system of yours is certainly theoretically doable, but I see no point in spending time here - you'll have to acknowledge your system is not the youngest anymore ;-) .
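As an illustration of the "TSC only partially writable" message quoted in comment #50, the following Python sketch splits the three printed values into their upper and lower 32-bit halves. It uses only the numbers from that log line. What each field represents inside the debugging patch is not stated in this thread, so the helper names (`split_counter`, `parse_fields`, `upper_halves`) are hypothetical and no meaning is assigned to the individual fields.

```python
# Counter values copied from the "(XEN) TSC:" line quoted in comment #50.
# The thread does not say what each field means, so none is assumed here.
LOG_FIELDS = "000b48cf9339:000048cf9394:000048cf937c"


def split_counter(value):
    """Return (upper 32 bits, lower 32 bits) of a 64-bit counter value."""
    return value >> 32, value & 0xFFFFFFFF


def parse_fields(text):
    """Parse colon-separated hexadecimal fields into integers."""
    return [int(field, 16) for field in text.split(":")]


def upper_halves(text):
    """Return the upper 32 bits of every field."""
    return [split_counter(value)[0] for value in parse_fields(text)]


if __name__ == "__main__":
    for index, value in enumerate(parse_fields(LOG_FIELDS), start=1):
        upper, lower = split_counter(value)
        print(f"field {index}: upper=0x{upper:08x} lower=0x{lower:08x}")
```

Only the first field has a non-zero upper half, which is consistent with the observation made later in this thread that writing the TSC on this CPU zeroes the upper 32 bits.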
https://bugzilla.novell.com/show_bug.cgi?id=623680#c51
--- Comment #51 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c52
--- Comment #52 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c53
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c54
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c55
Charles Arnold
Charles, could you do yet another 32-bit 11.4 one-off test build with the above patch included? I'm sorry for the hiccup with the previous one.
The next version of the 32bit RPMs is available for download at ftp://ftp.novell.com/forge/XenTechnicalPreview/openSUSE/11.4/debug/623680/ Please download and install all three RPMs (xen, xen-libs, and xen-tools).
https://bugzilla.novell.com/show_bug.cgi?id=623680#c56
Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c57
--- Comment #57 from Martin Wilck
The reason for this lies in how Xen manages time, which is completely different from Linux. Making C2/C3 usable under Xen on this system of yours is certainly theoretically doable, but I see no point in spending time here - you'll have to acknowledge your system is not the youngest anymore ;-) .
Hm, the whole point of this bug report (for me) was to enable C2/C3 on this machine :-( If I am reading the code correctly, the main difference between Xen and Linux is that Linux completely avoids using the TSC where it is unstable or otherwise unreliable. Xen, OTOH, always uses TSC, and just uses the "platform timer" to calibrate and adjust TSC when it isn't stable.
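The idea of calibrating a fast counter such as the TSC against a slower but trusted "platform timer" can be sketched in Python as follows. This is not Xen's code; it is a minimal illustration of the principle only. The function names, the 1 MHz platform timer frequency and the sample values are all assumptions made for the example.

```python
NS_PER_SECOND = 1_000_000_000


def estimate_tsc_hz(tsc_start, tsc_end, plat_start, plat_end, plat_hz):
    """Estimate the fast counter's frequency from two paired samples.

    plat_hz is the known frequency of the reference (platform) timer.
    """
    plat_ticks = plat_end - plat_start
    if plat_ticks <= 0:
        raise ValueError("platform timer did not advance between samples")
    # Integer arithmetic avoids rounding drift from floating point.
    return (tsc_end - tsc_start) * plat_hz // plat_ticks


def tsc_delta_to_ns(tsc_delta, tsc_hz):
    """Convert a fast-counter delta to nanoseconds using the estimate."""
    return tsc_delta * NS_PER_SECOND // tsc_hz


if __name__ == "__main__":
    # Hypothetical samples: 100000 ticks of an assumed 1 MHz platform timer
    # (0.1 s), during which the fast counter advanced by 160000000.
    tsc_hz = estimate_tsc_hz(0, 160_000_000, 0, 100_000, 1_000_000)
    print("estimated frequency:", tsc_hz, "Hz")
    print("1600000 ticks =", tsc_delta_to_ns(1_600_000, tsc_hz), "ns")
```

Repeating the estimate periodically lets the conversion factor follow a counter whose rate changes, which is the sense in which the platform timer is described above as being used to "calibrate and adjust" the TSC.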
https://bugzilla.novell.com/show_bug.cgi?id=623680#c58
--- Comment #58 from Martin Wilck
https://bugzilla.novell.com/show_bug.cgi?id=623680#c59
--- Comment #59 from Jan Beulich
Hm, the whole point of this bug report (for me) was to enable C2/C3 on this machine :-(
Sort of contrary to the bug title...
If I am reading the code correctly, the main difference between Xen and Linux is that Linux completely avoids using the TSC where it is unstable or otherwise unreliable. Xen, OTOH, always uses TSC, and just uses the "platform timer" to calibrate and adjust TSC when it isn't stable.
Correct.

(In reply to comment #58)
Just a very crude idea: On CPUs such as mine, where writing the TSC zeroes out the upper 32bits, wouldn't it be possible to treat the TSC as a 32bit timer only? Similar code is already in place for the 32bit and 24bit platform timers.
Here you sort of contradict your own observations noted in the previous comment: The platform timer is used for calibration, and an overflow timer is used as a helper to deal with it being narrow. The TSC, otoh, is the main timer, and hence treating it as a 32-bit counter would involve quite some more changes (but it's certainly doable afaict). Since the (64-bit) TSC is part of the hypervisor/kernel interface, running *all* guests (including Dom0) with emulated rdtsc would be a direct consequence. In any case, your machine not dying anymore is where we'll have to end this. Enabling C2/C3 on such systems would (as far as we're concerned) presumably be an enhancement request, not a bug report, and hence would be expected to get addressed upstream first. So if you can talk anyone into doing this work, or if you want to give this a try yourself... One thing you might try though is whether at least C2 can be used on your system ("lapic_timer_c2_ok" on the Xen command line if the APIC timer doesn't stop in C2, similar to the identical native Linux option).
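The general technique of coping with a narrow (for example 24-bit or 32-bit) counter by sampling it regularly, which is the role of the overflow timer mentioned above, can be sketched in Python like this. It is an illustration of the principle, not Xen's implementation; the class name `WidenedCounter` and its interface are hypothetical.

```python
class WidenedCounter:
    """Accumulate a wrapping hardware counter into an unbounded total.

    update() must be called at least once per wrap period of the raw
    counter; a periodic overflow timer would guarantee that.
    """

    def __init__(self, bits, initial_raw=0):
        self.mask = (1 << bits) - 1
        self.last_raw = initial_raw & self.mask
        self.total = 0

    def update(self, raw):
        """Feed a new raw reading and return the accumulated total."""
        raw &= self.mask
        # Modular subtraction yields the right delta across one wrap.
        delta = (raw - self.last_raw) & self.mask
        self.total += delta
        self.last_raw = raw
        return self.total


if __name__ == "__main__":
    counter = WidenedCounter(bits=32, initial_raw=0xFFFFFFF0)
    print(hex(counter.update(0x00000010)))  # reading taken after a wrap
    print(hex(counter.update(0x00000030)))
```

If more than one full wrap happens between two calls to `update()`, the missed wraps cannot be detected, which is why the sampling interval has to stay below the wrap period of the raw counter.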
https://bugzilla.novell.com/show_bug.cgi?id=623680#c60
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=623680#c61
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=623680#c62
Swamp Workflow Management