https://bugzilla.novell.com/show_bug.cgi?id=370872
Summary: "pccardctl eject" hangs in state D
Product: openSUSE 11.0
Version: Alpha 2plus
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: seife@novell.com
QAContact: qa@suse.de
CC: fseidel@novell.com
Found By: Development
During suspend, one of the things we do is eject PCMCIA cards with "pccardctl eject".
I quite frequently see that command hanging in state D, which makes suspend fail. Attempting to shut down the machine cleanly then leads to a hang when the sound modules are unloaded; a clean shutdown only happens if I do a sysrq-e to kill the hanging processes and let init proceed.
I have seen this with various 3G/UMTS CardBus cards, both USB-serial based and nozomi based, so I believe this to be a generic PCMCIA issue.
Will attach sysrq-T
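For anyone who wants to spot the hanging processes without a full sysrq-T dump, here is a minimal Python sketch that scans /proc for tasks in state D. It is not part of pccardctl or pm-utils; the names parse_state and find_d_state are made up for illustration, and the only assumption is the usual "pid (comm) state ..." layout of /proc/<pid>/stat.

```python
import os


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so look at what follows the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if parse_state(line) == "D":
            comm = line[line.index("(") + 1:line.rindex(")")]
            hung.append((int(entry), comm))
    return hung


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

On an affected machine one would expect pccardctl (and later modprobe or rmmod) to show up in that list, but the sketch only reports what /proc says at the moment it runs.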
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c1
--- Comment #1 from Stefan Seyfried seife@novell.com 2008-03-14 03:55:09 MST --- Created an attachment (id=201699) --> (https://bugzilla.novell.com/attachment.cgi?id=201699) sysrq-t
sysrq-t, taken while it was hanging during shutdown at the removal of the alsasound modules.
You'll find a hanging "modprobe -r uhci_hcd" from the suspend attempt (after I killed the script surrounding pccardctl eject) and a hanging "rmmod", as well as the hanging pccardctl.
User fseidel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c2
Frank Seidel fseidel@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEW                         |ASSIGNED
--- Comment #2 from Frank Seidel fseidel@novell.com 2008-03-14 06:29:02 MST --- It doesn't seem to happen with all PC cards (with my nozomi, for example, it doesn't), and the machine doesn't need to be suspended for this to happen. A simple pccardctl eject (or even just the write to /sys/class/pcmcia_socket/pcmcia_socket?/card_eject) also triggers this.
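The bare sysfs write can be sketched in a few lines of Python. This is only an illustration of the "echo 1 > .../card_eject" step; the helper names card_eject_path and eject_card are hypothetical, and the sysfs_root parameter exists only so the code can be pointed at a scratch directory instead of the real /sys tree.

```python
import os


def card_eject_path(socket, sysfs_root="/sys/class/pcmcia_socket"):
    """Build the card_eject attribute path for socket number `socket`."""
    return os.path.join(sysfs_root, "pcmcia_socket%d" % socket, "card_eject")


def eject_card(socket, sysfs_root="/sys/class/pcmcia_socket"):
    """Rough equivalent of: echo 1 > .../pcmcia_socketN/card_eject"""
    with open(card_eject_path(socket, sysfs_root), "w") as f:
        f.write("1\n")
```

Calling eject_card(0) as root on an affected system should be expected to block in state D in the same way the shell redirect does, so it is best tried from a terminal you can afford to lose.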
Frank Seidel fseidel@novell.com changed:
What          |Removed                                     |Added
----------------------------------------------------------------------------
AssignedTo    |kernel-maintainers@forge.provo.novell.com   |fseidel@novell.com
User fseidel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c3
--- Comment #3 from Frank Seidel fseidel@novell.com 2008-03-14 11:48:56 MST --- Just a very small update: this is definitely a USB issue, so it will only occur on pccardctl eject of PC cards that contain a USB host controller.
User fseidel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c4
--- Comment #4 from Frank Seidel fseidel@novell.com 2008-03-19 09:30:41 MST --- Created an attachment (id=202950) --> (https://bugzilla.novell.com/attachment.cgi?id=202950) lspci of usb cardbus card
With this hardware I could reproduce this every time I did "pccardctl eject" or an echo 1 >/sys/.../card_eject
User fseidel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c5
--- Comment #5 from Frank Seidel fseidel@novell.com 2008-03-19 09:31:20 MST --- Created an attachment (id=202952) --> (https://bugzilla.novell.com/attachment.cgi?id=202952) lsusb of this card
User fseidel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c6
--- Comment #6 from Frank Seidel fseidel@novell.com 2008-03-19 09:31:49 MST --- Created an attachment (id=202953) --> (https://bugzilla.novell.com/attachment.cgi?id=202953) lsusb -v of this card
User fseidel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c8
--- Comment #8 from Frank Seidel fseidel@novell.com 2008-03-19 10:02:26 MST --- I'll try to describe here how far I have tracked this down.
In short: on removal of the first child device (usb_disconnect in drivers/usb/core/hub.c), the code runs through usb_disable_endpoint(dev,i+USB_DIR_IN) for the second endpoint (i==1) into usb_kill_urb(), where it reaches wait_event(usb_kill_urb_queue..) and blocks there forever. At the time "pccardctl eject" is issued, no application is accessing the device anymore.
How I got there (line numbers will differ because of my additional debug code, but the order of function names and calls should make it clear):
# pccardctl eject
->echo 1 > /sys/class/pcmcia_socket/pcmcia_socket0/card_eject
->drivers/pcmcia/socket_sysfs.c:143: pcmcia_eject_card()
->drivers/pcmcia/cs.c:895: socket_remove()
  -> :609: socket_shutdown()
  -> :415: cb_free()
->drivers/pcmcia/cardbus.c:256: pci_remove_behind_bridge()
->drivers/pci/remove.c:135: pci_remove_bus_device() in first run of for_each loop
  -> :111: pci_destroy_dev()
  -> :40: pci_stop_dev()
  -> :29: device_unregister()
->drivers/base/core.c:1002: device_del()
  -> :955: bus_remove_device()
->drivers/base/bus.c:543: device_release_driver()
->drivers/base/dd.c:337: __device_release_driver()
  -> :307: dev->bus->remove(dev)
->drivers/usb/core/hcd-pci.c:169: usb_hcd_pci_remove()
  -> :179: usb_remove_hcd()
->drivers/usb/core/hcd.c:1970: usb_disconnect()
  -> :1194: usb_disconnect() 1st loop run for children (devnum==2)
  -> :1204: usb_disable_device()
->drivers/usb/core/message.c:
     (1076:dev_dbg() => "usb 8-1: usb_disable_device nuking all URBs")
     1082: usb_disable_endpoint(dev,i+USB_DIR_IN) with i==1 (so its 2nd loop run)
->drivers/usb/core/message.c:1037: usb_hcd_flush_endpoint()
->drivers/usb/core/hcd.c:1514: usb_kill_urb(urb) in 1st run of urb_list loop
->drivers/usb/core/urb.c:561: wait_event(usb_kill_urb_queue, atomic_read(&urb->use_count) == 0);
on this last call it hangs and never returns.
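To make the blocking condition concrete, here is a small userspace analogy in Python. It is not kernel code: the Urb class, giveback() and kill() are invented for illustration, a threading.Condition stands in for usb_kill_urb_queue, and the timeout exists only so the demo terminates (the wait_event() in the trace has none).

```python
import threading


class Urb:
    """Toy stand-in for a URB that the host controller still owns."""

    def __init__(self):
        self.use_count = 1
        self._queue = threading.Condition()  # plays the role of the wait queue

    def giveback(self):
        # Normally triggered by the controller's completion interrupt.
        with self._queue:
            self.use_count -= 1
            self._queue.notify_all()

    def kill(self, timeout):
        # Analogue of wait_event(queue, use_count == 0), plus a timeout.
        with self._queue:
            return self._queue.wait_for(lambda: self.use_count == 0,
                                        timeout=timeout)


def demo(completion_arrives):
    urb = Urb()
    if completion_arrives:
        threading.Timer(0.05, urb.giveback).start()
    return urb.kill(timeout=0.3)
```

demo(True) returns True because the delayed giveback wakes the waiter; demo(False) returns False after the timeout, which corresponds to the case where nothing ever drops use_count and the real wait would sleep forever.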
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEW                         |ASSIGNED
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c9
Stefan Seyfried seife@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
CC            |                            |zoz@novell.com, gregkh@novell.com
Priority      |P5 - None                   |P3 - Medium
Version       |Alpha 2plus                 |Final
--- Comment #9 from Stefan Seyfried seife@novell.com 2008-07-18 03:33:01 MDT --- So what is needed to fix this issue? This bug is still present in 11.1 and it breaks our default suspend setup.
(adding Greg since he also knows a tiny bit of USB in case Oliver is on vacation)
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Severity      |Normal                      |Major
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c10
--- Comment #10 from Oliver Neukum oneukum@novell.com 2008-07-23 10:48:52 MDT --- usb_hcd_check_unlink_urb() returns -EBUSY, which means that urb->unlinked != 0 && urb->use_count != 0
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c11
--- Comment #11 from Oliver Neukum oneukum@novell.com 2008-07-24 05:18:23 MDT --- The same URB is processed in unlink1() and usb_kill_urb(), looks like an interrupt is lost.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c12
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |seife@novell.com
--- Comment #12 from Oliver Neukum oneukum@novell.com 2008-07-24 13:27:30 MDT --- acpi_pci_irq_disable() is called for the device before the devices on the bus are disconnected. usb_kill_urb() calls down into start_ed_unlink() which requests an interrupt that is not delivered, so the use_count never goes to zero.
Is the call to acpi_pci_irq_disable() specific to CardBus? How can it be avoided?
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c14
Stefan Seyfried seife@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Info Provider |seife@novell.com            |gregkh@novell.com
--- Comment #14 from Stefan Seyfried seife@novell.com 2008-07-24 15:19:15 MDT --- (In reply to comment #12 from Oliver Neukum)
> acpi_pci_irq_disable() is called for the device before the devices on the bus are disconnected. usb_kill_urb() calls down into start_ed_unlink() which requests an interrupt that is not delivered, so the use_count never goes to zero.
That indeed seems wrong, even for me ;-)
> Is the call to acpi_pci_irq_disable() specific to CardBus? How can it be avoided?
I have no idea. Let's ask someone who knows at least PCI and USB...
User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c15
Thomas Renninger trenn@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
CC            |                            |rjwysocki@sisk.pl, pavel@novell.com
--- Comment #15 from Thomas Renninger trenn@novell.com 2008-08-08 05:27:48 MDT --- Rafael should know the PCI+suspend parts best. As this one has already been debugged to the ground, it's worth adding him...
User rjwysocki@sisk.pl added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c16
--- Comment #16 from Rafael Wysocki rjwysocki@sisk.pl 2008-08-09 14:30:20 MDT --- Well, in fact I'm not very familiar with that particular aspect of PCI+ACPI, but I'll do my best to have a look at this shortly.
User rjwysocki@sisk.pl added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c17
--- Comment #17 from Rafael Wysocki rjwysocki@sisk.pl 2008-08-10 13:06:57 MDT --- Hm, which kernel should I be looking at?
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c18
--- Comment #18 from Oliver Neukum oneukum@novell.com 2008-08-12 00:34:01 MDT --- Please look at SL110_BRANCH
User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c19
Thomas Renninger trenn@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |gregkh@novell.com           |
--- Comment #19 from Thomas Renninger trenn@novell.com 2008-08-12 05:26:16 MDT --- I'll give it a try. Don't waste time trying out SUSE kernels. It would be great if you could back me up if I get stuck at some place.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c20
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |trenn@novell.com
--- Comment #20 from Oliver Neukum oneukum@novell.com 2008-08-18 01:07:42 MDT --- Any ideas? Where is this call in the sequence? Can we try to simply remove it?
User novell@mamy.to added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c21
Marek Wodzinski novell@mamy.to changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
CC            |                            |novell@mamy.to
--- Comment #21 from Marek Wodzinski novell@mamy.to 2008-08-23 15:59:18 MDT --- I'm also affected by this bug (Option GX0201 3G card, stock 2.6.24.5-smp Slackware kernel). I worked around it (thanks for the comments!) by simply removing the ohci_hcd module before doing pccardctl eject. It works for me. Thanks!
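A minimal Python sketch of that workaround, assuming the card's controller is driven by ohci_hcd as in this report. The function name eject_with_workaround and the injectable run parameter are illustrative choices (the latter only so the sequence can be exercised without root); the two commands themselves are the ones named in this thread.

```python
import subprocess

# Order matters: unload the host controller driver first, then eject.
WORKAROUND_COMMANDS = [
    ["modprobe", "-r", "ohci_hcd"],
    ["pccardctl", "eject"],
]


def eject_with_workaround(run=subprocess.run):
    """Run the workaround sequence, stopping at the first failing command."""
    for cmd in WORKAROUND_COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so pccardctl eject is never attempted with ohci_hcd still loaded.
        run(cmd, check=True)
```

Unloading ohci_hcd removes the driver for every OHCI controller in the system, not just the one on the card, so this is a stopgap for machines whose built-in USB ports do not depend on that module.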
User rjw@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c22
Rafael Wysocki rjw@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
CC            |                            |rjw@novell.com
--- Comment #22 from Rafael Wysocki rjw@novell.com 2008-08-24 15:48:20 MDT --- (In reply to comment #20 from Oliver Neukum)
> Any ideas? Where is this call in the sequence? Can we try to simply remove it?
Is this acpi_pci_irq_disable() called from pci_disable_device(), through pcibios_disable_device() and the pcibios_disable_irq() pointer? That's the only way it can be called in the mainline kernel AFAICS.
However, in the mainline kernel on x86 acpi_pci_irq_disable() doesn't seem to do anything to actually disable the IRQ.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c23
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |trenn@novell.com            |
--- Comment #23 from Oliver Neukum oneukum@novell.com 2008-08-25 01:56:08 MDT --- Thanks, we have to rule out something SUSE-specific.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c24
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |novell@mamy.to
--- Comment #24 from Oliver Neukum oneukum@novell.com 2008-08-25 01:57:38 MDT --- @Marek,
unfortunately this would also affect ohci controllers on the motherboard. Can you test whether the deadlock affects you if you run kernel-vanilla?
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c25
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Info Provider |novell@mamy.to              |astarikovskiy@novell.com
--- Comment #25 from Oliver Neukum oneukum@novell.com 2008-09-10 10:07:30 MDT --- Alexey,
any idea where this call could come from?
User novell@mamy.to added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c26
--- Comment #26 from Marek Wodzinski novell@mamy.to 2008-10-08 00:04:17 MDT --- @Oliver,
Slackware uses the vanilla kernel by default. I'm lucky because none of the USB interfaces on my laptop use the ohci module :-) AFAIK, my colleagues using the same card on Debian are also affected (they made a similar workaround, but the idea is the same: remove the modules used by the card before ejecting).
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c27
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Info Provider |astarikovskiy@novell.com    |novell@mamy.to
--- Comment #27 from Oliver Neukum oneukum@novell.com 2008-10-09 05:26:11 MDT --- Do you have evidence this is limited to ohci?
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c28
--- Comment #28 from Oliver Neukum oneukum@novell.com 2008-10-09 05:43:07 MDT --- pci_acpi_init() sets pcibios_disable_irq = acpi_pci_irq_disable
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c29
--- Comment #29 from Oliver Neukum oneukum@novell.com 2008-10-09 06:06:05 MDT --- called from pci_disable_device()
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c30
--- Comment #30 from Oliver Neukum oneukum@novell.com 2008-10-09 10:34:46 MDT --- Created an attachment (id=244687) --> (https://bugzilla.novell.com/attachment.cgi?id=244687) first attempt at a fix by taking an extra pci reference in ohci
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c33
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |novell@mamy.to              |
--- Comment #33 from Oliver Neukum oneukum@novell.com 2008-10-15 09:59:59 MDT --- As far as I can tell, usb_hcd_pci_remove() does the right thing by calling usb_remove_hcd() before pci_disable_device().
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c34
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |seife@novell.com
--- Comment #34 from Oliver Neukum oneukum@novell.com 2008-10-15 14:23:54 MDT --- I can no longer replicate it on the current kernel. Can you?
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c35
Stefan Seyfried seife@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |seife@novell.com            |
--- Comment #35 from Stefan Seyfried seife@novell.com 2008-10-16 12:14:41 MDT --- Yes.
Linux stoetzler 2.6.27-19-default #1 SMP 2008-10-14 16:02:55 +0200 x86_64 x86_64 x86_64 GNU/Linux
root@stoetzler:~# rpm -q --changelog kernel-default|head -3
* Di Okt 14 2008 mmarek@suse.cz
- rpm/postun.sh, rpm/post.sh: temporarily ignore errors from weak-modules2 --{add,remove}-kernel-modules until
I did the following:
- put in novatel xu870 (option driver) express card with pcmcia adapter, opened /dev/ttyUSB3 (the first device on the card), kept it open.
- pccardctl eject.
- BANG.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c36
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |seife@novell.com
--- Comment #36 from Oliver Neukum oneukum@novell.com 2008-10-16 12:30:48 MDT --- Can you test whether it is necessary to keep the device open?
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c37
Stefan Seyfried seife@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |seife@novell.com            |
--- Comment #37 from Stefan Seyfried seife@novell.com 2008-10-16 12:56:31 MDT --- Reluctantly. :-)
Yes, I can. No, it is not necessary.
Plug in the device, look how it's blinking. "pccardctl eject". Device stops blinking. BOOM.
Famous last words:
[ 2258.904045] pccard: CardBus card inserted into slot 0
[ 2258.904592] PCI: 0000:03:00.0 reg 10 32bit mmio: [0, fff]
[ 2258.904705] pci 0000:03:00.0: supports D1
[ 2258.904709] pci 0000:03:00.0: supports D2
[ 2258.904713] pci 0000:03:00.0: PME# supported from D0 D1 D2 D3hot
[ 2258.904724] pci 0000:03:00.0: PME# disabled
[ 2258.905140] PCI: 0000:03:00.1 reg 10 32bit mmio: [0, fff]
[ 2258.905247] pci 0000:03:00.1: supports D1
[ 2258.905251] pci 0000:03:00.1: supports D2
[ 2258.905255] pci 0000:03:00.1: PME# supported from D0 D1 D2 D3hot
[ 2258.905266] pci 0000:03:00.1: PME# disabled
[ 2258.905838] ohci_hcd 0000:03:00.0: enabling device (0000 -> 0002)
[ 2258.905861] vendor=8086 device=2448
[ 2258.905870] ohci_hcd 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 2258.907996] ohci_hcd 0000:03:00.0: setting latency timer to 64
[ 2258.908048] ohci_hcd 0000:03:00.0: OHCI Host Controller
[ 2258.908436] ohci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 8
[ 2258.908596] ohci_hcd 0000:03:00.0: irq 18, io mem 0x84000000
[ 2258.994906] usb usb8: configuration #1 chosen from 1 choice
[ 2258.995196] hub 8-0:1.0: USB hub found
[ 2258.995296] hub 8-0:1.0: 3 ports detected
[ 2259.096311] usb usb8: New USB device found, idVendor=1d6b, idProduct=0001
[ 2259.096320] usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2259.096324] usb usb8: Product: OHCI Host Controller
[ 2259.096326] usb usb8: Manufacturer: Linux 2.6.27-19-default ohci_hcd
[ 2259.096329] usb usb8: SerialNumber: 0000:03:00.0
[ 2259.096767] ohci_hcd 0000:03:00.1: enabling device (0000 -> 0002)
[ 2259.096783] vendor=8086 device=2448
[ 2259.096789] ohci_hcd 0000:03:00.1: PCI INT B -> GSI 18 (level, low) -> IRQ 18
[ 2259.096935] ohci_hcd 0000:03:00.1: setting latency timer to 64
[ 2259.096946] ohci_hcd 0000:03:00.1: OHCI Host Controller
[ 2259.097086] ohci_hcd 0000:03:00.1: new USB bus registered, assigned bus number 9
[ 2259.097183] ohci_hcd 0000:03:00.1: irq 18, io mem 0x84001000
[ 2259.182159] usb usb9: configuration #1 chosen from 1 choice
[ 2259.182441] hub 9-0:1.0: USB hub found
[ 2259.182533] hub 9-0:1.0: 2 ports detected
[ 2259.284815] usb usb9: New USB device found, idVendor=1d6b, idProduct=0001
[ 2259.284825] usb usb9: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2259.284828] usb usb9: Product: OHCI Host Controller
[ 2259.284831] usb usb9: Manufacturer: Linux 2.6.27-19-default ohci_hcd
[ 2259.284834] usb usb9: SerialNumber: 0000:03:00.1
[ 2264.637062] usb 8-1: new full speed USB device using ohci_hcd and address 2
[ 2264.870625] usb 8-1: configuration #1 chosen from 1 choice
[ 2264.884344] usb 8-1: New USB device found, idVendor=1410, idProduct=1430
[ 2264.884354] usb 8-1: New USB device strings: Mfr=1, Product=2, SerialNumber=4
[ 2264.884358] usb 8-1: Product: Novatel Wireless HSDPA Modem
[ 2264.884360] usb 8-1: Manufacturer: Novatel Wireless
[ 2264.884364] usb 8-1: SerialNumber: 011057002034551
[ 2264.979158] usbserial: USB Serial support registered for GSM modem (1-port)
[ 2264.981488] option 8-1:1.0: GSM modem (1-port) converter detected
[ 2264.982122] usb 8-1: GSM modem (1-port) converter now attached to ttyUSB3
[ 2264.982346] option 8-1:1.1: GSM modem (1-port) converter detected
[ 2264.982586] usb 8-1: GSM modem (1-port) converter now attached to ttyUSB4
[ 2264.982787] usbcore: registered new interface driver option
[ 2264.982795] option: USB Driver for GSM modems: v0.7.2
[ 2316.928573] pccard: card ejected from slot 0
[ 2316.960231] ohci_hcd 0000:03:00.0: remove, state 1
[ 2316.960247] usb usb8: USB disconnect, address 1
[ 2316.960250] usb 8-1: USB disconnect, address 2
root@stoetzler:~# reboot
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c38
--- Comment #38 from Stefan Seyfried seife@novell.com 2008-10-16 12:58:08 MDT --- Pulling the card out additionally gave
[ 2458.757979] ohci_hcd 0000:03:00.0: HC died; cleaning up
[ 2458.757994] ohci_hcd 0000:03:00.1: HC died; cleaning up
but did not release the pccardctl.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c39
--- Comment #39 from Oliver Neukum oneukum@novell.com 2008-10-17 04:21:21 MDT --- Yes, by then it is too late. In that hang we really need an interrupt.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c40
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |seife@novell.com
--- Comment #40 from Oliver Neukum oneukum@novell.com 2008-10-17 05:20:33 MDT --- I still cannot replicate it. Very well, please run the debug patch in #41 and provide dmesg with logging level at 9
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c41
--- Comment #41 from Oliver Neukum oneukum@novell.com 2008-10-17 05:21:31 MDT --- Created an attachment (id=246244) --> (https://bugzilla.novell.com/attachment.cgi?id=246244) debug patch for pci_disable_device()
User gregkh@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c44
Greg Kroah-Hartman gregkh@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |gregkh@novell.com           |
--- Comment #44 from Greg Kroah-Hartman gregkh@novell.com 2008-10-17 09:38:05 MDT --- I'm not in Nürnberg :)
As you can't duplicate this, and I haven't seen any other reports of this, I'd just wait until Stefan gets back from vacation to continue with this...
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c45
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |novell@mamy.to
--- Comment #45 from Oliver Neukum oneukum@novell.com 2008-10-17 09:57:27 MDT --- @Marek,
can you please run the debug patch mentioned in comments #40 and #41 ?
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c48
--- Comment #48 from Gerald Pfeifer gp@novell.com 2008-11-09 18:50:13 MST --- Created an attachment (id=250930) --> (https://bugzilla.novell.com/attachment.cgi?id=250930) /var/log/pm-suspend.log on T41p with nozomi based card
I'm also seeing this on a T41p, with Beta 4+, and a nozomi based card.
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c49
--- Comment #49 from Stefan Seyfried seife@novell.com 2008-11-10 01:51:19 MST --- Gerald, can you do a sysrq-t when the machine hangs and attach /var/log/messages with it? To be honest, I'd suspect nozomi to be a different bug...
User pavel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c50
--- Comment #50 from Pavel Machek pavel@novell.com 2008-11-11 03:01:01 MST --- the 250930 attachment comes empty for me?
User seife@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c51
--- Comment #51 from Stefan Seyfried seife@novell.com 2008-11-11 03:58:47 MST --- (In reply to comment #50 from Pavel Machek)
> the 250930 attachment comes empty for me?
It is not, but it is only the pm-suspend log and it ends with
===== Sun Nov 9 21:53:17 CET 2008: running hook: /usr/lib/pm-utils/sleep.d/45pcmcia =====
ejecting PCMCIA cards...
..so nothing new.
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c52
Gerald Pfeifer gp@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |novell@mamy.to              |
--- Comment #52 from Gerald Pfeifer gp@novell.com 2008-11-14 09:47:07 MST --- Created an attachment (id=252346) --> (https://bugzilla.novell.com/attachment.cgi?id=252346) /var/log/messages taken together with seife
(In reply to comment #49 from Stefan Seyfried)
> Gerald, can you do a sysrq-t when the machine hangs and attach /var/log/messages with it? To be honest, I'd suspect nozomi to be a different bug...
Here we go. If your analysis shows this to be a different issue, I can surely open a separate bug.
Oliver Neukum oneukum@novell.com changed:
What                         |Removed                     |Added
----------------------------------------------------------------------------
Attachment #252346 mime type |application/octet-stream    |text/plain
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c53
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |gp@novell.com
--- Comment #53 from Oliver Neukum oneukum@novell.com 2008-12-08 02:51:57 MST --- Did you apply the diagnostic patch from comment #41? It looks very much like the original issue. I am pretty sure I know where and why it hangs, but I don't know where the interrupts are switched off too early. The patch in #41 should show that. Sysrq-t at the hang comes too late; I need the traces from the patch in #41.
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c54
Gerald Pfeifer gp@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |gp@novell.com               |
--- Comment #54 from Gerald Pfeifer gp@novell.com 2008-12-20 21:04:35 MST --- I'm travelling right now, without access to the hardware in question, but if someone can provide a test kernel for me, I'll do my best to give it a spin in about a week's time.
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c55
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |gp@novell.com
--- Comment #55 from Oliver Neukum oneukum@novell.com 2008-12-24 05:09:34 MST --- 32 or 64 bit?
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c56
Gerald Pfeifer gp@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |gp@novell.com               |
--- Comment #56 from Gerald Pfeifer gp@novell.com 2008-12-29 05:36:24 MST --- 32-bit, this is still a lowly T41p. ;-) Thanks!
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c58
Oliver Neukum oneukum@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |ASSIGNED                    |NEEDINFO
Info Provider |                            |gp@novell.com
--- Comment #58 from Oliver Neukum oneukum@novell.com 2008-12-29 09:56:52 MST --- Please test the kernel to be found at http://beta.suse.com/private/oneukum/370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c59
Gerald Pfeifer gp@novell.com changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Status        |NEEDINFO                    |ASSIGNED
Info Provider |gp@novell.com               |
--- Comment #59 from Gerald Pfeifer gp@novell.com 2009-01-08 04:14:18 MST --- I just realized that I had done testing and failed to update Bugzilla accordingly. Sorry about that! :-(
Oliver, sadly the test kernel you provided does not run on my hardware where I see this issue. The T41p (cf. comment #56) has an older Pentium M without PAE support, and the test kernel required PAE support.
If you could get me a new test kernel today or tomorrow, I pledge to give it another try within a few hours.
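Whether a CPU offers PAE can be checked before picking a kernel flavor. The following is a minimal sketch in Python, assuming the usual Linux `/proc/cpuinfo` layout in which a `flags` line lists the CPU features; the function name `has_pae` is hypothetical and not part of any tool mentioned in this report.

```python
def has_pae(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'pae'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the "flags" field; compare whole words, not substrings.
        if sep and key.strip() == "flags" and "pae" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("PAE supported:", has_pae(f.read()))
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print("cannot read /proc/cpuinfo:", exc)
```

On a machine like the T41p described above, this would presumably report that PAE is not available, which is why a PAE-only test kernel cannot boot there.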
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c60
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #60 from Oliver Neukum oneukum@novell.com 2009-01-08 05:33:33 MST --- OK, can you name the exact flavor you can run?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c62
--- Comment #62 from Gerald Pfeifer gp@novell.com 2009-01-08 07:27:38 MST --- The regular -default kernel works fine, both the openSUSE 11.0 (2.6.25) and openSUSE 11.1 (2.6.27) ones.
2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 i686 i686 i386 GNU/Linux
I'm going to download kalman-oneukum-52/11.0-i386/kernel-default-2.6.25.20-0.2.i586.rpm now...
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c63
--- Comment #63 from Oliver Neukum oneukum@novell.com 2009-01-08 07:49:28 MST --- Very good. I thought you needed some superexotic flavor.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c64
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #64 from Gerald Pfeifer gp@novell.com 2009-01-08 18:23:09 MST --- Created an attachment (id=264031) --> (https://bugzilla.novell.com/attachment.cgi?id=264031) Output of dmesg from kernel from comment #62
Got it, it seems! Does this help?
This is 2.6.25.20-0.2-default (from comment #62) running on 11.1 (instead of 11.0) with all patches. It took me two attempts; the first s2r was in fact successful.
https://bugzilla.novell.com/show_bug.cgi?id=370872
Oliver Neukum oneukum@novell.com changed:
What                        |Removed                     |Added
----------------------------------------------------------------------------
Attachment #264031 mime type|application/octet-stream    |text/plain
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c65
--- Comment #65 from Oliver Neukum oneukum@novell.com 2009-01-09 01:34:01 MST --- Relevant part of the traces from #64:
Pid: 26726, comm: s2ram Tainted: G N 2.6.25.20-0.2-default #1
 [<c01071d9>] dump_trace+0x63/0x227
 [<c0107c8a>] show_trace+0x15/0x29
 [<c02e135e>] dump_stack+0x5b/0x65
 [<c01ea1ed>] pci_disable_device+0x3e/0x8c
 [<f908d421>] usb_hcd_pci_suspend+0x9b/0x14d [usbcore]
 [<c01eb5d2>] pci_device_suspend+0x1b/0x4d
 [<c024f5f1>] device_suspend+0xdf/0x1a4
 [<c0145d5e>] suspend_devices_and_enter+0x3d/0x101
 [<c0145f31>] enter_state+0xca/0x117
 [<c014600c>] state_store+0x8e/0xa2
 [<c01de285>] kobj_attr_store+0x1a/0x22
 [<c01b22ab>] sysfs_write_file+0xb0/0xdb
 [<c017740e>] vfs_write+0x8c/0x136
 [<c0177551>] sys_write+0x3b/0x60
 [<c01059e4>] sysenter_past_esp+0x6d/0xa9
 [<ffffe430>] 0xffffe430
 =======================
ACPI: PCI interrupt for device 0000:00:1d.7 disabled
Pid: 26726, comm: s2ram Tainted: G N 2.6.25.20-0.2-default #1
 [<c01071d9>] dump_trace+0x63/0x227
 [<c0107c8a>] show_trace+0x15/0x29
 [<c02e135e>] dump_stack+0x5b/0x65
 [<c01ea1ed>] pci_disable_device+0x3e/0x8c
 [<f908d421>] usb_hcd_pci_suspend+0x9b/0x14d [usbcore]
 [<c01eb5d2>] pci_device_suspend+0x1b/0x4d
 [<c024f5f1>] device_suspend+0xdf/0x1a4
 [<c0145d5e>] suspend_devices_and_enter+0x3d/0x101
 [<c0145f31>] enter_state+0xca/0x117
 [<c014600c>] state_store+0x8e/0xa2
 [<c01de285>] kobj_attr_store+0x1a/0x22
 [<c01b22ab>] sysfs_write_file+0xb0/0xdb
 [<c017740e>] vfs_write+0x8c/0x136
 [<c0177551>] sys_write+0x3b/0x60
 [<c01059e4>] sysenter_past_esp+0x6d/0xa9
 [<ffffe430>] 0xffffe430
 =======================
ACPI: PCI interrupt for device 0000:00:1d.2 disabled
Pid: 26726, comm: s2ram Tainted: G N 2.6.25.20-0.2-default #1
 [<c01071d9>] dump_trace+0x63/0x227
 [<c0107c8a>] show_trace+0x15/0x29
 [<c02e135e>] dump_stack+0x5b/0x65
 [<c01ea1ed>] pci_disable_device+0x3e/0x8c
 [<f908d421>] usb_hcd_pci_suspend+0x9b/0x14d [usbcore]
 [<c01eb5d2>] pci_device_suspend+0x1b/0x4d
 [<c024f5f1>] device_suspend+0xdf/0x1a4
 [<c0145d5e>] suspend_devices_and_enter+0x3d/0x101
 [<c0145f31>] enter_state+0xca/0x117
 [<c014600c>] state_store+0x8e/0xa2
 [<c01de285>] kobj_attr_store+0x1a/0x22
 [<c01b22ab>] sysfs_write_file+0xb0/0xdb
 [<c017740e>] vfs_write+0x8c/0x136
 [<c0177551>] sys_write+0x3b/0x60
 [<c01059e4>] sysenter_past_esp+0x6d/0xa9
 [<ffffe430>] 0xffffe430
 =======================
ACPI: PCI interrupt for device 0000:00:1d.1 disabled
Pid: 26726, comm: s2ram Tainted: G N 2.6.25.20-0.2-default #1
 [<c01071d9>] dump_trace+0x63/0x227
 [<c0107c8a>] show_trace+0x15/0x29
 [<c02e135e>] dump_stack+0x5b/0x65
 [<c01ea1ed>] pci_disable_device+0x3e/0x8c
 [<f908d421>] usb_hcd_pci_suspend+0x9b/0x14d [usbcore]
 [<c01eb5d2>] pci_device_suspend+0x1b/0x4d
 [<c024f5f1>] device_suspend+0xdf/0x1a4
 [<c0145d5e>] suspend_devices_and_enter+0x3d/0x101
 [<c0145f31>] enter_state+0xca/0x117
 [<c014600c>] state_store+0x8e/0xa2
 [<c01de285>] kobj_attr_store+0x1a/0x22
 [<c01b22ab>] sysfs_write_file+0xb0/0xdb
 [<c017740e>] vfs_write+0x8c/0x136
 [<c0177551>] sys_write+0x3b/0x60
 [<c01059e4>] sysenter_past_esp+0x6d/0xa9
 [<ffffe430>] 0xffffe430
 =======================
ACPI: PCI interrupt for device 0000:00:1d.0 disabled
ACPI: Preparing to enter system sleep state S3
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c66
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #66 from Oliver Neukum oneukum@novell.com 2009-01-09 01:35:55 MST --- Was this taken from the successful or the hung attempt? Are you able to successfully suspend your laptop on the standard kernel, too?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c67
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #67 from Gerald Pfeifer gp@novell.com 2009-01-09 03:33:10 MST --- (In reply to comment #66 from Oliver Neukum)
> Was this taken from the successful or the hung attempt?
This is the output of dmesg after
1. fresh boot
2. successful s2r and resume
3. failing s2r
> Are you able to successfully suspend your laptop on the standard kernel, too?
s2r has been working on this machine without problems; often I do not reboot for days but s2r overnight and when relocating.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c68
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #68 from Oliver Neukum oneukum@novell.com 2009-01-15 06:02:32 MST --- Hm. It should fail either always or never. Can you check whether it always hangs the second time, or whether this is random?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c69
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #69 from Gerald Pfeifer gp@novell.com 2009-01-19 23:49:31 MST --- Another extensive round of testing, with the openSUSE 11.1 kernel (to focus on something more recent) went as follows:
. Hard boot
. logger "after first hard boot"
. insert UMTS card
. start umtsmon, wait until "Cingular" network shows, but don't connect
. s2r, works just fine
. start umtsmon, logger "after first...", wait until "Cingular" network shows
. s2r, works just fine
. logger "after second..."
. start umtsmon, wait until "Cingular" network shows
. s2r, works just fine
. logger "after third..."
. establish connection via nm-applet
. s2r FAILS
. s2r FAILS
. logger "after two unsuccessfull s2r attempts"

. Hard boot
. insert UMTS card
. logger "after second hard boot"
. s2r FAILS
. logger "after two other unsuccessfull s2r attempts"
That would support your "always happens" theory, though it only seems to occur with the network connection actually established, not just the card connected.
Note: if you have a fix, this should go into SLE 11, too, please.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c70
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #70 from Oliver Neukum oneukum@novell.com 2009-01-20 02:28:11 MST --- What exactly do you mean by "s2r FAILS"? If this bug strikes, your machine locks up hard.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c71
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #71 from Gerald Pfeifer gp@novell.com 2009-01-20 06:40:15 MST --- Created an attachment (id=266205) --> (https://bugzilla.novell.com/attachment.cgi?id=266205) /var/log/message for my 2009-01-19 testruns
I am afraid this may mean "my" bug is a different one from the one you are looking at. :-(
"s2r FAILS" in my case means that /var/log/pm-suspend.log stops at the following, and the system goes through screen lock:
===== 2009-01-20 07:57:43.329227159 running hook: /usr/lib/pm-utils/sleep.d/45pcmcia ====
ejecting PCMCIA cards...
Let me attach /var/log/messages from my tests yesterday; note the entries created by logger as described in my activity log.
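To see at a glance which hook a failed suspend stopped in, the last "running hook" line of the log can be extracted. This is a minimal sketch in Python; it assumes the log line format quoted above, and the function name `last_hook` is hypothetical (it is not part of pm-utils).

```python
import re

# Matches the marker lines quoted above, for example:
# ===== 2009-01-20 07:57:43.329227159 running hook: /usr/lib/pm-utils/sleep.d/45pcmcia ====
HOOK_RE = re.compile(r"=+\s+(\S+ \S+)\s+running hook:\s+(\S+)")


def last_hook(log_text):
    """Return (timestamp, hook_path) of the last hook that was started, or None."""
    last = None
    for line in log_text.splitlines():
        m = HOOK_RE.match(line.strip())
        if m:
            last = (m.group(1), m.group(2))
    return last


if __name__ == "__main__":
    try:
        with open("/var/log/pm-suspend.log") as f:
            print(last_hook(f.read()))
    except OSError as exc:
        print("cannot read log:", exc)
```

If the log ends right after a hook marker, as in the excerpt above, that hook is the likely place where the suspend attempt got stuck.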
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c72
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #72 from Oliver Neukum oneukum@novell.com 2009-01-20 06:50:31 MST --- Can you get a sysrq-T trace when the bug strikes? The hang is in state D. It is possible for a system to remain functional but with a hanging task. That doesn't explain why the bug strikes only sometimes, but it is unlikely that two such similar bugs exist.
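As a quick check before collecting a full sysrq-T trace, tasks in uninterruptible sleep (state D) can be listed from `/proc`. This is only a sketch in Python; it assumes the standard Linux `/proc/<pid>/stat` layout (`pid (comm) state ...`), and the helper names `parse_stat` and `tasks_in_state_d` are hypothetical. It does not show kernel stacks, so it is no replacement for the sysrq-T output.

```python
import glob


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is enclosed in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_state_d():
    """Return a sorted list of (pid, comm) for processes currently in state D."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in tasks_in_state_d():
        print(pid, comm)
```

A hanging `pccardctl` would be expected to show up in this list while the rest of the system stays usable.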
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c73
--- Comment #73 from Gerald Pfeifer gp@novell.com 2009-01-22 19:21:16 MST --- Created an attachment (id=267094) --> (https://bugzilla.novell.com/attachment.cgi?id=267094) /var/log/messages with sysrq-T
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c74
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #74 from Gerald Pfeifer gp@novell.com 2009-01-22 19:23:25 MST --- (In reply to comment #72)
> Can you get a sysrq-T trace when the bug strikes? The hang is in state D. It is possible for a system to remain functional but with a hanging task. That doesn't explain why the bug strikes only sometimes, but it is unlikely that two such similar bugs exist.
In my recent testing (cf. comment #69) the issue happens consistently, albeit not in the form of the machine locking up.
As requested, I have attached sysrq-T output for you.
Since you seem to have found an actual bug, how about providing a kernel with a fix for that (and including that fix in the openSUSE 11.1 and, more importantly, SLE 11 kernel trees)? I'll be happy to test that, even though it means putting down my main production machine...
https://bugzilla.novell.com/show_bug.cgi?id=370872
Oliver Neukum oneukum@novell.com changed:
What                        |Removed                     |Added
----------------------------------------------------------------------------
Attachment #267094 mime type|application/octet-stream    |text/plain
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c75
--- Comment #75 from Oliver Neukum oneukum@novell.com 2009-01-24 02:31:15 MST --- I've found out what the bug does, not where it is.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c76
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #76 from Oliver Neukum oneukum@novell.com 2009-01-24 02:36:57 MST --- Can you replicate it with the vanilla kernel?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c77
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #77 from Gerald Pfeifer gp@novell.com 2009-01-25 18:19:14 MST --- Same thing with 2.6.27.7-9-vanilla.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c78
--- Comment #78 from Oliver Neukum oneukum@novell.com 2009-01-27 06:31:00 MST --- Replicated it on 2.6.29-rc2. Previously I couldn't replicate it due to a stupid error.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c79
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #79 from Oliver Neukum oneukum@novell.com 2009-01-27 08:21:13 MST --- Do you use OHCI, UHCI and/or EHCI on your test device?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c80
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #80 from Gerald Pfeifer gp@novell.com 2009-01-27 18:37:21 MST --- Created an attachment (id=268124) --> (https://bugzilla.novell.com/attachment.cgi?id=268124) Output of 'dmesg | grep hci'
> Do you use OHCI, UHCI and/or EHCI on your test device?
See attached. And I am quite happy you can reproduce this now, since my "test device" happens to be my production machine which is somewhat inconvenient to reboot/break suspend. ;-)
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c81
--- Comment #81 from Oliver Neukum oneukum@novell.com 2009-02-02 05:54:42 MST --- I have understood the final missing piece. The code works perfectly for one device. If the card has two devices, the first is handled happily and the shared interrupt is then disabled.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c82
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
CC                 |                            |jslaby@novell.com
--- Comment #82 from Oliver Neukum oneukum@novell.com 2009-02-02 06:11:11 MST --- Adding Jiri because this probably hits every PC-Card with more than one device.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c83
--- Comment #83 from Jiri Slaby jslaby@novell.com 2009-02-02 08:15:04 MST --- Created an attachment (id=269272) --> (https://bugzilla.novell.com/attachment.cgi?id=269272) deadlock fixup
This one :)?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c84
--- Comment #84 from Oliver Neukum oneukum@novell.com 2009-02-02 08:35:02 MST --- I am testing, but I doubt it.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c85
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |jslaby@novell.com
--- Comment #85 from Oliver Neukum oneukum@novell.com 2009-02-02 09:19:30 MST --- Tested, but it doesn't work. The problem is that the interrupt is turned off as soon as the first device has been removed. The second device then cannot be turned off because USB needs interrupts for that. Any ideas?
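The failure mode described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code and not a proposed fix; the classes `SharedIrq` and `Device` and the `refcounted` switch are hypothetical and merely contrast "turn the line off when the first device goes away" with "turn it off when the last user goes away".

```python
class SharedIrq:
    """Toy interrupt line shared by several devices (not a kernel API)."""

    def __init__(self, refcounted):
        self.refcounted = refcounted
        self.users = 0
        self.enabled = False

    def attach(self):
        self.users += 1
        self.enabled = True

    def detach(self):
        self.users -= 1
        # Naive variant: the first removal turns the line off for everybody.
        # Refcounted variant: only the last user turns it off.
        if not self.refcounted or self.users == 0:
            self.enabled = False


class Device:
    def __init__(self, name, irq):
        self.name = name
        self.irq = irq
        irq.attach()

    def remove(self):
        """Removal needs a working interrupt, as USB does in the report."""
        if not self.irq.enabled:
            return False  # the real task would wait forever (state D)
        self.irq.detach()
        return True


def eject(refcounted):
    """Remove both functions of a two-device card; report success per device."""
    irq = SharedIrq(refcounted)
    devices = [Device("function 0", irq), Device("function 1", irq)]
    return [dev.remove() for dev in devices]


if __name__ == "__main__":
    print(eject(refcounted=False))  # [True, False]
    print(eject(refcounted=True))   # [True, True]
```

In the naive variant the second device can no longer complete its removal, which mirrors the observation that the hang only appears on cards with more than one device.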
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What                        |Removed                     |Added
----------------------------------------------------------------------------
Attachment #250930 mime type|text/x-log                  |text/plain
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c86
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |jslaby@novell.com           |
--- Comment #86 from Jiri Slaby jslaby@novell.com 2009-02-02 09:41:15 MST --- (In reply to comment #85)
> Any ideas?
I'm confused. It seems you are able to reproduce a different bug from the one described here.
They are unable to suspend because:
1) nozomi waits indefinitely for the tty to be freed in its devexit function
2) broken eject locking (see the attached patch; it's ugly and doesn't solve the issue properly, though)
Do you have any logs regarding your issue?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c87
--- Comment #87 from Oliver Neukum oneukum@novell.com 2009-02-02 14:36:30 MST --- Created an attachment (id=269401) --> (https://bugzilla.novell.com/attachment.cgi?id=269401) trace of ejection in 2.6.25 with a bit of added debug output
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c88
--- Comment #88 from Oliver Neukum oneukum@novell.com 2009-02-02 14:39:02 MST --- Created an attachment (id=269404) --> (https://bugzilla.novell.com/attachment.cgi?id=269404) trace of ejection in 2.6.29-rc2
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c89
--- Comment #89 from Oliver Neukum oneukum@novell.com 2009-02-02 14:57:49 MST --- Indeed, we might have two bugs. Nevertheless, as this was assigned to me, I looked for a USB bug, which indeed exists. As far as I can tell, the bug will strike under two conditions:
1) more than one device on the card
2) drivers need interrupts for remove()
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c90
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #90 from Oliver Neukum oneukum@novell.com 2009-02-03 06:45:19 MST --- Could you provide "lspci -v" with your card plugged in?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c91
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #91 from Gerald Pfeifer gp@novell.com 2009-02-03 14:58:25 MST --- Created an attachment (id=269794) --> (https://bugzilla.novell.com/attachment.cgi?id=269794) Output of "lspci -v"
The diff for "lspci -v" between the system with/without the card is:
< 03:00.0 Network controller: Option N.V. Qualcomm MSM6275 UMTS chip
<         Flags: medium devsel, IRQ 11
<         Memory at c4000000 (32-bit, non-prefetchable) [size=2K]
<         Kernel driver in use: nozomi
<         Kernel modules: nozomi
I am also attaching the full output.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c92
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |jslaby@novell.com
--- Comment #92 from Oliver Neukum oneukum@novell.com 2009-02-03 15:09:35 MST --- Very well. Can we conclude from this that we have two bugs? Is there a bug entry or a test kernel corresponding to the patch from comment #83?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c94
--- Comment #94 from Oliver Neukum oneukum@novell.com 2009-02-10 06:18:05 MST --- Gerald,
could you try the kernel to be found at: http://beta.suse.com/private/oneukum/370872/
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c95
--- Comment #95 from Gerald Pfeifer gp@novell.com 2009-02-12 16:48:27 MST --- Sorry, I cannot test this kernel since my hardware doesn't support PAE (cf. comment #59).
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c99
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |jslaby@novell.com           |
--- Comment #99 from Jiri Slaby jslaby@novell.com 2009-02-13 14:15:24 MST --- Oliver, could you attach .config from your kernel or ohci-hcd.ko?
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
URL                |                            |http://marc.info/?l=linux-usb&m=123332349825739&w=2
https://bugzilla.novell.com/show_bug.cgi?id=370872
User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c105
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |jslaby@novell.com
--- Comment #105 from Oliver Neukum oneukum@novell.com 2009-02-16 00:30:40 MST --- Jiri,
do you have a clean version of the patch of comment#83? It should definitely be merged.
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c106
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Info Provider      |jslaby@novell.com           |gp@novell.com
--- Comment #106 from Jiri Slaby jslaby@novell.com 2009-02-17 10:39:11 MST --- (In reply to comment #105)
> do you have a clean version of the patch of comment#83?
Not quite yet. Gerald, could you try the kernel from http://labs.suse.cz/jslaby/bug-370872/ and suspend? When it fails, press sysrq-d and attach dmesg.
https://bugzilla.novell.com/show_bug.cgi?id=370872
Oliver Neukum oneukum@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
AssignedTo         |oneukum@novell.com          |jslaby@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c107
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #107 from Gerald Pfeifer gp@novell.com 2009-02-26 19:02:01 MST --- Created an attachment (id=275866) --> (https://bugzilla.novell.com/attachment.cgi?id=275866) Output of dmesg after SysRq-d with 2.6.27.17-20090217_954248e4-default
Note that with this kernel the very first suspend already failed.
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Depends on         |                            |480239
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c108
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #108 from Jiri Slaby jslaby@novell.com 2009-03-03 15:35:52 MST --- (In reply to comment #107)
> Output of dmesg after SysRq-d with 2.6.27.17-20090217_954248e4-default
Hmm, it's not very helpful, since the loopdev lockdep warning turned lockdep off. Could you try the new kernel from the labs and retry with sysrq-d?
https://bugzilla.novell.com/show_bug.cgi?id=370872
Bug 370872 depends on bug 480239, which changed state.
Bug 480239 Summary: losetup: circular locking dependency https://bugzilla.novell.com/show_bug.cgi?id=480239
What               |Old Value                   |New Value
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Status             |NEEDINFO                    |RESOLVED
Resolution         |                            |WONTFIX
https://bugzilla.novell.com/show_bug.cgi?id=370872
User gp@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c109
Gerald Pfeifer gp@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                    |ASSIGNED
Info Provider      |gp@novell.com               |
--- Comment #109 from Gerald Pfeifer gp@novell.com 2009-03-28 13:50:49 MDT --- Sorry, Jiri, I had missed this and it is not clear how exactly I can help now. Which kernel would you like me to try, and what would you like me to do then? (I assume reproduce the situation where the system fails to suspend and then issue sysrq-d)?
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c110
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |ASSIGNED                    |NEEDINFO
Info Provider      |                            |gp@novell.com
--- Comment #110 from Jiri Slaby jslaby@novell.com 2009-03-28 14:22:51 MDT --- (In reply to comment #109)
> Which kernel would you like me to try, and what would you like me to do then?
The one from: http://labs.suse.cz/jslaby/bug-370872/
> (I assume reproduce the situation where the system fails to suspend and then issue sysrq-d)?
Exactly. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Depends on         |                            |490035
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Depends on         |                            |490036
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Depends on         |                            |490037
https://bugzilla.novell.com/show_bug.cgi?id=370872
User jslaby@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=370872#c111
Jiri Slaby jslaby@novell.com changed:
What               |Removed                           |Added
----------------------------------------------------------------------------
Status             |NEEDINFO                          |ASSIGNED
Info Provider      |gp@novell.com                     |
Depends on         |490037                            |
Summary            |"pccardctl eject" hangs in state D|pcmcia cards prevent suspending
--- Comment #111 from Jiri Slaby jslaby@novell.com 2009-03-29 07:33:20 MDT --- From now on, consider this to be only a virtual bug report. Do not add any comments here, because it has grown into an unmaintainable mess.
There are currently 4 issues related to this bug report:
1) bug 480239 -- losetup: circular locking dependency (resolved invalid -- a false positive)
2) bug 490035 -- PCMCIA USB adapter causes suspend to fail
3) bug 490036 -- nozomi prevents suspending
4) bug 490037 -- "pccardctl eject" hangs in state D (original issue from here)
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Depends on         |                            |490037
https://bugzilla.novell.com/show_bug.cgi?id=370872
Jiri Slaby jslaby@novell.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Priority           |P3 - Medium                 |P4 - Low
Severity           |Major                       |Minor