[Bug 211511] New: suspend to RAM hangs on resume
https://bugzilla.novell.com/show_bug.cgi?id=211511

Summary: suspend to RAM hangs on resume
Product: SUSE Linux 10.1
Version: Final
Platform: i686
OS/Version: SuSE Linux 10.1
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: jimc@math.ucla.edu
QAContact: qa@suse.de

With numerous previous kernel patch versions I could suspend to RAM using the procedure below, but after the most recent patch, every time I try to suspend to RAM, the machine appears to suspend successfully (the power light blinks), but upon resume it hangs solid, not even any response to the magic sysrq key.

Versions:
fglrx-8.28.18 (no downloadable binary; built on this machine on 2006-06-17)
kernel-default-2.6.16.21-0.25
powersave-0.12.20-0.3 (running but bypassed, see below)
dbus-1-0.60-30 (running)
hal-0.5.6-33.11 (running)

Log files -- totally useless.
/var/log/suspend2ram.log -- nothing written
/var/log/standby.log -- nothing written
/var/lib/suspend2{disk,ram}-state.resume -- nothing written
/var/log/debug (all syslog msgs) -- nothing before the reboot

On-screen messages -- None; never gets to turning the lamp on.

Hardware: Dell Inspiron 5000d with ATI Radeon Mobility M300 (M22 5460)

How I get into suspend state:
Unmount crypto filesystem
Start xlock, sleep 2 secs for it to start
Write "shutdown" into /sys/power/disk (in 2.6.12-rc3 it reverted randomly to "reboot")
Write "mem" into /sys/power/state
(in some previous version, powersaved put the CPU into a zombie state, and "shutdown -z now" always rebooted rather than suspending.)

Discussion: "Obviously" fglrx is involved, and I certainly will download a new version when I have a chance.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
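The two sysfs writes in the procedure above can be sketched as a small shell function. This is only a minimal sketch, not the reporter's actual script: the function name suspend_to_ram and its optional directory argument are assumptions, added so the steps can be dry-run against a scratch directory instead of the real /sys/power.

```shell
#!/bin/sh
# Sketch of the manual suspend-to-RAM steps listed in the report.
# suspend_to_ram and its optional directory argument are hypothetical;
# the report only says which strings are written to which sysfs files.
suspend_to_ram() {
    power_dir="${1:-/sys/power}"
    # Write "shutdown" into /sys/power/disk, as the report does.
    echo shutdown > "$power_dir/disk" || return 1
    # Writing "mem" into /sys/power/state requests suspend to RAM (needs root).
    echo mem > "$power_dir/state"
}
```

Called with no argument as root, it would act on the real /sys/power; the earlier steps in the report (unmounting the crypto filesystem, starting xlock and sleeping 2 seconds) would come before the call.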
lmb@novell.com changed:

What          |Removed |Added
----------------------------------------------------------------------------
Status        |NEW     |NEEDINFO
Info Provider |        |jimc@math.ucla.edu

------- Comment #1 from lmb@novell.com 2006-10-10 17:47 MST -------
In particular, please let us know whether the problem persists when the proprietary, unsupported fglrx driver is not used.
------- Comment #2 from jimc@math.ucla.edu 2006-10-11 21:35 MST -------
Created an attachment (id=101278) --> (https://bugzilla.novell.com/attachment.cgi?id=101278&action=view)
modules loaded in runlevel 1 before failure
jimc@math.ucla.edu changed:

What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |ASSIGNED
Info Provider |jimc@math.ucla.edu |

------- Comment #3 from jimc@math.ucla.edu 2006-10-11 21:38 MST -------
Good (?) news: suspend-to-RAM fails equally in runlevel 1 with neither the ati_agp kernel module nor fglrx involved. I suspended the machine by manually echoing as indicated above to /sys/power/{disk,state}. Before suspend it wrote "stopping tasks"; "ipw2200 eth1: entering suspend". It seemed to suspend properly, then hung solid on resume (press power button). This was done twice, once in runlevel 1 and once in runlevel 3. I've attached /proc/modules captured in runlevel 1.

It simplifies matters that fglrx is not involved, but the finger of suspicion shifts to ipw2200, another notorious can of worms. The Dell Inspiron 5000d has the Intel PRO/Wireless 2200BG chipset on a mini-PCI card, which becomes eth1. The wired ethernet (not connected in these tests) is Broadcom B440x with the b44 driver, on eth0.
gregkh@novell.com changed:

What       |Removed                                   |Added
----------------------------------------------------------------------------
AssignedTo |kernel-maintainers@forge.provo.novell.com |pavel@novell.com
Status     |ASSIGNED                                  |NEW
pavel@novell.com changed:

What   |Removed |Added
----------------------------------------------------------------------------
Status |NEW     |ASSIGNED

------- Comment #4 from pavel@novell.com 2006-10-19 01:47 MST -------
Try to unload all unnecessary modules before suspend, and see which one breaks it...
jeanjacques.moulinet@wanadoo.fr changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |jeanjacques.moulinet@wanadoo.fr

------- Comment #5 from jeanjacques.moulinet@wanadoo.fr 2006-10-24 02:37 MST -------
I have the same problem on my Thinkpad R50e. With kernel 2.6.16.13, "Suspend to RAM" works fine using kpowersave and modified prepare_suspend_to_ram and restore_after_suspend_to_ram scripts (to save and restore my Intel I810 video state). But if I update my kernel to 2.6.16.21, my system crashes on resume...
pavel@novell.com changed:

What          |Removed  |Added
----------------------------------------------------------------------------
Status        |ASSIGNED |NEEDINFO
Info Provider |         |jimc@math.ucla.edu

------- Comment #6 from pavel@novell.com 2006-10-25 01:47 MST -------
Please try with minimum modules loaded. No ethernet, no wifi, no usb, no sound.
jimc@math.ucla.edu changed:

What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |ASSIGNED
Info Provider |jimc@math.ucla.edu |

------- Comment #7 from jimc@math.ucla.edu 2006-10-27 18:01 MST -------
On the Inspiron 5000d, I unloaded everything I could, echo "mem" > /sys/power/state, and it did the same thing: seemed to suspend correctly, but when I hit the power button it froze solid, blank screen, no response to magic SysRq. Here are the remaining modules that could not be removed:

intel_agp (I think the various AGP modules lock themselves in memory)
agpgart
usbcore (all other USB modules are gone, this one won't leave)
ext3 (root filesystem)
jbd (its dependency)
ata_piix (root disc)
libata
sd_mod
scsi_mod

The good news is that the most complex modules are gone. The bad news is that the remainder are plenty complex. In kernel 2.6.13 I think, there was a lot of trouble with ata_piix and suspend-to-disc.

I wonder how I could prevent AGP and usbcore from being loaded -- blacklist for udev somehow? I'll bet they're irrelevant but that has to be proven.
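Unloading "everything I could", as described above, can be approximated by repeatedly removing modules whose use count is zero. The following is a minimal sketch under that assumption, not the procedure the reporter actually typed; the helper names unused_modules and unload_all_possible are hypothetical.

```shell
#!/bin/sh
# Print names of modules whose use count (3rd field of /proc/modules) is zero.
# The optional argument is a file in /proc/modules format (hypothetical
# convenience so the parsing can be tried on a saved copy).
unused_modules() {
    awk '$3 == 0 { print $1 }' "${1:-/proc/modules}"
}

# Repeatedly try to remove unused modules; stop when a pass removes nothing.
# Needs root. Modules that refuse to unload are simply left in place.
unload_all_possible() {
    while :; do
        removed=0
        for m in $(unused_modules); do
            rmmod "$m" 2>/dev/null && removed=1
        done
        [ "$removed" -eq 1 ] || break
    done
    # Show what is still loaded.
    cut -d' ' -f1 /proc/modules
}
```

Several passes are needed because removing one module can drop the use count of its dependencies to zero.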
------- Comment #8 from pavel@novell.com 2006-11-01 03:26 MST -------
If you do not want intel_agp to be loaded, just boot with init=/bin/bash, or simply find intel_agp.ko and temporarily rename it.
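The "find it and temporarily rename it" suggestion could look roughly like this. It is a sketch only: disable_module and enable_module are hypothetical helper names, the ".disabled" suffix is an arbitrary choice, and the default tree assumes modules live as uncompressed .ko files under /lib/modules/$(uname -r).

```shell
#!/bin/sh
# Rename every copy of <module>.ko under a modules tree so it cannot be loaded.
# Hypothetical helper; the tree defaults to the running kernel's module directory.
disable_module() {
    mod="$1"
    tree="${2:-/lib/modules/$(uname -r)}"
    find "$tree" -name "$mod.ko" | while read -r f; do
        mv "$f" "$f.disabled"
    done
}

# Undo the rename afterwards.
enable_module() {
    mod="$1"
    tree="${2:-/lib/modules/$(uname -r)}"
    find "$tree" -name "$mod.ko.disabled" | while read -r f; do
        mv "$f" "${f%.disabled}"
    done
}
```

For example, "disable_module intel_agp" before rebooting, then "enable_module intel_agp" once the test is done.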
------- Comment #9 from jimc@math.ucla.edu 2006-11-01 23:15 MST -------
I booted with init=/bin/bash, and had only these modules: ext3 jbd ata_piix libata sd_mod scsi_mod. It still hung solid on resume.

I tried "mkinitrd -m 'piix ext3'", but it runs hwinfo and knows that ata_piix is needed. It gave me these modules: ide-core ide-disk scsi_mod sd_mod piix libata ahci ata_piix jbd ext3. I booted this initrd anyway, and the modules were loaded in the listed order. In previous kernels, piix would have attached my disc controller (it was hardwired in the kernel at that time), but not in 2.6.16; piix got nothing and ata_piix got both the disc and the DVD. So I wasn't able to prove my suspicion that ata_piix is at fault.
------- Comment #10 from pavel@novell.com 2006-11-08 12:28 MST -------
Okay, which kernel version was the last one where it worked? It would be nice to find out where exactly it broke.
pavel@novell.com changed:

What          |Removed  |Added
----------------------------------------------------------------------------
Status        |ASSIGNED |NEEDINFO
Info Provider |         |jimc@math.ucla.edu
jimc@math.ucla.edu changed:

What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |ASSIGNED
Info Provider |jimc@math.ucla.edu |

------- Comment #11 from jimc@math.ucla.edu 2006-11-11 22:35 MST -------
Current package (broken): kernel-default-2.6.16.21-0.25 installed 2006-10-03
Previous (working): kernel-default-2.6.16.21-0.21 installed 2006-07-26
Before that (working): kernel-default-2.6.16.21-0.13

The install dates may not be completely reliable; I had trouble interpreting /var/log/YaST2/y2log.
------- Comment #12 from pavel@novell.com 2006-11-13 04:34 MST -------
Can you just diff -ur between working and broken versions, and figure out which part breaks it? It may also be a difference in .config, or something.

Also trying the latest kernel (either latest vanilla, or latest from 10.2) would be nice.
pavel@novell.com changed:

What: Summary
Removed: suspend to RAM hangs on resume
Added: suspend to RAM hangs on resume on Dell, regression between 2.6.16.21-0.21 and -0.25
pavel@novell.com changed:

What          |Removed  |Added
----------------------------------------------------------------------------
Status        |ASSIGNED |NEEDINFO
Info Provider |         |jimc@math.ucla.edu
------- Comment #13 from jimc@math.ucla.edu 2006-12-17 16:34 MST -------
Created an attachment (id=110039) --> (https://bugzilla.novell.com/attachment.cgi?id=110039&action=view)
Kernel source diffs between patch 0.21 and 0.25

Pavel asked for diffs between the broken and non-broken kernels. I'm using showroom stock kernels, with no local hacks except for proprietary modules, and those had never been loaded since boot when the failure was provoked (i.e. I didn't load and then unload them; I never loaded them at all).

I can't see *anything* relevant in the diff. I'm not running XEN. I'm only using the default kernel. But the XEN hacks look like the only substantive change other than just updating the version number. Blecch.

In case you're wondering, I snarfed the two update RPMs, i.e. kernel-source-2.6.16.21-0.21.i586.rpm and kernel-source-2.6.16.21-0.25.i586.rpm, and used "rpm2cpio $file | cpio -id" to extract them into a temporary directory. The attached file has diffs for both the source itself and the obj directories. I also compared patch 0.13 vs 0.25, and there were a lot more changes, but again none seemed relevant. I annotated the diffs and removed duplicate material for irrelevant architectures (powerpc, s390, etc.) and kernel variants (bigsmp, debug...) as indicated in the notes.
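The extract-and-compare step described above can be sketched as two small shell helpers. This is an illustration rather than the exact commands used for the attachment: unpack_rpm and tree_diff are hypothetical names, and the -N flag (show added and removed files) is an addition to the "diff -ur" that was requested.

```shell
#!/bin/sh
# Unpack an RPM into a directory with rpm2cpio and cpio, as described above.
# unpack_rpm is a hypothetical helper name.
unpack_rpm() {
    rpm="$1"; dest="$2"
    mkdir -p "$dest" || return 1
    # rpm2cpio runs in the caller's directory; only cpio runs inside $dest.
    rpm2cpio "$rpm" | (cd "$dest" && cpio -id)
}

# Recursive unified diff of two unpacked trees.
# -N (an addition here) also reports files present in only one tree.
tree_diff() {
    diff -urN "$1" "$2"
}
```

With the two RPMs named above unpacked into directories such as old/ and new/ (names chosen here for illustration), "tree_diff old new > kernel.diff" would produce a single file to read through. Note that diff exits with status 1 when the trees differ, which is expected here.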
jimc@math.ucla.edu changed:

What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |ASSIGNED
Info Provider |jimc@math.ucla.edu |

------- Comment #14 from jimc@math.ucla.edu 2006-12-17 17:00 MST -------
Sorry for the delay but I got a new machine for my wife (see below)...

I booted a CD of SuSE 10.2, locally burned from the "gold master" copy, and ran the rescue system (kernel 2.6.18). I removed all removable modules. I thought I had saved /proc/modules, but unfortunately it was not preserved; anyway, from memory, these remained: sr_mod cdrom processor intel_agp agpgart ext3 jbd ahci ata_piix libata sd_mod scsi_mod. To tempt fate I mounted my root partition readonly and turned on my swap partition, partially simulating normal operation. I did the usual procedure, i.e. echo "mem" > /sys/power/state. It got into S3 (power light blinking), but when I pressed the power button it entered S1 (light steady), did brief disc activity, then hung solid, with a blank screen and no response to the magic SysRq key. In other words, the problem is still there in kernel 2.6.18.

The new machine is a Dell Dimension E520, with an ICH8 chipset, one SATA disc, and a RAID BIOS; if you turn the RAID BIOS off, strange things happen like Linux can't see the disc, so I turned it back on. I have SuSE 10.1 on it, kernel patch level 0.25 same as on the laptop. I booted it single user (-s) and removed all removable modules except USB, since it has a USB keyboard :-) The remaining modules were usbhid uhci_hcd ohci_hcd ehci_hcd usbcore ext3 jbd ahci libata sd_mod scsi_mod. I did "echo mem > /sys/power/state" and had the same experience: printed reasonable messages and entered S3, light blinking. I hit the power button, and the power light was on steady (S1) but the system was dead as a doornail. Blank screen, no noticeable disc action, no power to USB, and of course no response whatsoever to the keyboard.

This is probably a separate issue, but after removing modules I did "echo disk > /sys/power/state". It printed "ata1 is slow to respond, please be patient". I was patient. Eventually it printed things too fast to read (reinitting devices incl. USB) ending with "echo: write error: No such device", having restored the original framebuffer. (Initting the massive RAID takes at least 15 secs, probably explaining the message "ata1 is slow to respond".) Then it got a write error on /dev/sda3 (the root), followed by "aborting journal on device sda3". (Writing inode atime of bash into the journal?) I pre-emptively powered it off and ran fsck. (The laptop suspends to disk with no problems -- so far.)
pavel@novell.com changed:

What          |Removed  |Added
----------------------------------------------------------------------------
Status        |ASSIGNED |NEEDINFO
Info Provider |         |jimc@math.ucla.edu

------- Comment #15 from pavel@novell.com 2006-12-21 06:36 MST -------
One bug per report, please.

The diffs you attached do not look like they could break anything. Can you try rebuilding the kernel from those sources to verify it is the source changes that make it break?
pavel@novell.com changed:

What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |RESOLVED
Info Provider |jimc@math.ucla.edu |
Resolution    |                   |WORKSFORME

------- Comment #16 from pavel@novell.com 2007-01-12 16:35 MST -------
I do not see anything suspect in the diff, yet it makes the difference. Unfortunately I do not have the hardware, so I can't debug this without your help.
------- Comment #17 from jimc@math.ucla.edu 2007-01-17 22:24 MST -------
I agree, everything in the diff seems completely irrelevant. I have no problem building a kernel that's supposed to be hacked, but I'm a little reluctant to try to rebuild the SuSE kernel on my own machine because the slightest error could lead to a product that is not really identical to the stock kernel. Note, all the kernels and modules have been from the various SuSE patches, completely unhacked.

Do you suppose Jeff Garzik might have some ideas about problems reinitting the libata-ahci-ata_piix stack?
------- Comment #18 from jimc@math.ucla.edu 2007-01-23 22:34 MST -------
Just to throw an old boot into the pot, I booted the rescue system from the SuSE 10.1 installation disc, and tried suspending (echo mem > /sys/power/state) with various sets of modules loaded.

1. I attempted to rmmod each module in order from /proc/modules. It got an error unloading sd_mod (afterward, /proc/modules showed the status as "unloading") but the messages scrolled off the screen too fast to read. The modules that didn't unload were usbcore loop cramfs sd_mod scsi_mod. On suspend (echo mem > /sys/power/state), it printed "Stopping tasks" and then hung solid. Not surprising.

2. Same except I didn't touch any of the five modules that previously would not unload. There were no errors reported. But on suspend it got:

OOPS: 0000 [#1]
Last sysfs file: /power/state
Modules: usbcore loop cramfs sd_mod scsi_mod
CPU: 0
EIP: 0060:[<e0864193>] Not tainted VLI
EFLAGS: 00010246 (2.6.16.13-4-default #1)
EIP is at scsi_bus_suspend +0x1f/0x30 [scsi_mod]
eax: 0 ebx: e1088500 ecx: c019f4a0 edx: df74d7ac
esi: dea54000 edi: 2 ebp: 2 esp: dc1f1ef0
ds: 007b es: 007b ss: 0068
Process: bash
Stack: <0> dea54190 3 0 c02035b0 dea54254 dea54278 3 dea54190 2...
Call trace:
c02035b0 suspend_device +0xac/0xc5
c0203ba4 device_suspend +0x78/0x198
c012bb0a enter_state +0xd2/0x16c
c012bc2c state_store +0x88/0x95
c012bba4 state_store +0x0/0x95
c017b40a subsys_attr_store +0x1e/0x22
c017b505 sysfs_write_file +0x98/0xbe

3. Removed all modules except usbcore ata_piix ahci libata loop cramfs sd_mod sr_mod scsi_mod cdrom. On suspend it got into S3. On resume it did brief non-obvious activities with the hard disc and the CD (which had media), then hung solid -- the behavior for this bug report.

4. Removed all modules except usbcore loop cramfs sd_mod sr_mod scsi_mod cdrom. On suspend it got the identical oops as before except esp = dab39ef0 and the first stack cell was dea5a190.

Conclusion: there is something wacko in the SCSI subsystem. I don't know if we're looking at 2 independent bugs, or whether ata_piix gets trashed when it registers itself with SCSI and then exhibits various nastiness in connection with software suspend depending on what other modules are loaded. Or whether ata_piix is doing the trashing.
------- Comment #19 from pavel@novell.com 2007-01-24 01:10 MST -------
Well, unload the scsi subsystem, then ;-). Or just don't let it load if it can't be unloaded.

Anyway, there's no point in debugging old kernels. Depending on how brave you are, we probably have some latest and greatest kernel packages somewhere (10.3 work), and a test with 2.6.20-rc* would be nice, too.
------- Comment #20 from seife@novell.com 2007-01-24 12:39 MST -------
There were some SATA fixes lately. You could try the kernel-of-the-day in ftp://ftp.suse.com/pub/projects/kernel/kotd/10.2-i386/SL102_BRANCH/kernel-default.i586.rpm; the fixes in there helped on quite a few machines.
------- Comment #21 from jimc@math.ucla.edu 2007-01-28 14:36 MST -------
Thanks to Stefan for the URL. I installed that version and updated its various dependencies (ended up as kernel-default-2.6.18.5-SL102_BRANCH_20070124145956), but upon removing all removable modules and suspending to RAM it got into sleep state but as before it hung solid when resuming.

Pavel, I also compiled 2.6.20-rc6 as you suggested, trying for a minimal configuration with the fewest features leaving a working system. (I can attach config if you want; probably all irrelevant.) The modules I couldn't remove were: intel_agp agpgart usbcore ext3 jbd ata_piix libata sd_mod scsi_mod. Sadly, 2.6.20-rc6 is no better than the others; it suspended but hung solid when resuming.

The dependency on intel_agp and agpgart is new; I noticed something in the config help about changes in text consoles that "improved" acceleration when available, and I left the text console configuration at its default, which might have dragged in the AGP. In any case, we know that in older kernels the video is irrelevant, so likely intel_agp is without blame in 2.6.20 also.

If I could remove SCSI I'd do it in a flash, but you can't test anything without the root filesystem being mounted, and this absolutely requires ata_piix and its various dependencies (scsi_mod and friends). The reason I was monkeying with the rescue system (both 10.1 and 10.2) was that neither the hard disc nor the CDROM would be mounted, so I could in theory force out SCSI. Not a chance: kernel OOPS. Which is probably a clue to the mystery.

Do you know how to boot the rescue system but to suppress udev/hotplug/coldplug loading modules for every broken device on the machine? I'd like to be able to say something like: I booted it with no modules at all and it could suspend and resume, but as soon as I loaded {scsi_mod, sr_mod, ata_piix, whichever} it started hanging.

This bug is now titled "regression from 0.21 to 0.25" but I'm beginning to suspect that up to 0.21 it was broken the same way but cryptically. Suppose there's a pointer to freed memory, and one of our favorite drivers stores through that pointer. If the memory was free at the time there would be no visible effect -- but if some irrelevant and non-blameworthy event, such as a change in the order or speed of loading drivers, caused the memory to be in use, the consequence would be bad, or worse.

We should remember that my laptop (Dell Inspiron 6000d) is two years old, but I'm seeing similar behavior on a brand-new Dell Dimension E520 and also an E510. In other words, possibly a generic chipset issue when the Intel 82801FB{M} (ICH6) SATA Controller gets into the act. The E510 now has Windows Media Center, and I'd like to put Linux and MythTV on it, for which sleep/alarm clock will be important. Let's keep things simple by working on the laptop, but this one laptop is not the only market segment that the fix [might] apply to.
------- Comment #22 from pavel@novell.com 2007-01-31 06:15 MST -------
If you want to disable udev (etc), just boot with init=/bin/bash.

Could you apply the beeping patch, to verify if resume at least hits the kernel entry point? http://lkml.org/lkml/2006/9/10/123
------- Comment #23 from jimc@math.ucla.edu 2007-01-31 22:35 MST -------
First the bad news: init=/bin/bash is ineffective because /bin/bash has to be read from the boot disc, whose module must therefore be loaded.

Now the good news: I hacked /init in the initrd for kernel 2.6.20-rc6 to exec the initrd's copy of bash instead of starting udev, resulting in a system on which no modules had ever been loaded, and enough function to test suspending. (Note for people searching this archive: to create the new initrd, search for "cpio" in /sbin/mkinitrd. In particular, cpio must have the "-H newc" switch, otherwise the kernel will not recognize it as an initramfs and will choke on it.)

Now more bad news: echo mem > /sys/power/state; the symptoms are the same as before, i.e. it suspends, but freezes solid upon resuming. This shows that however wacko the SCSI subsystem may be, there is no fault in ata_piix or scsi_mod or friends causing problems suspending.

Would the "beeping patch" help make any more progress? Do you have a URL for this patch? Any other suggestions?
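Unpacking and repacking an initramfs with the "-H newc" switch mentioned above can be sketched as follows. This is a minimal sketch, not what /sbin/mkinitrd does internally: unpack_initrd and pack_initrd are hypothetical helper names, and the image is assumed to be a gzip-compressed cpio archive.

```shell
#!/bin/sh
# Unpack an initrd image into a directory.
# Assumption: the image is a gzip-compressed cpio archive.
unpack_initrd() {
    img="$1"; dir="$2"
    mkdir -p "$dir" || return 1
    # gzip runs in the caller's directory; only cpio runs inside $dir.
    gzip -dc "$img" | (cd "$dir" && cpio -id)
}

# Repack a directory as an initramfs image.
# "-H newc" selects the cpio format that the comment above says the
# kernel needs in order to recognize the archive as an initramfs.
pack_initrd() {
    dir="$1"; img="$2"
    (cd "$dir" && find . | cpio -o -H newc | gzip -9) > "$img"
}
```

The /init script inside the unpacked directory would be edited between the two calls; the output image should be written somewhere outside the directory being packed.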
------- Comment #24 from pavel@novell.com 2007-02-02 13:33 MST -------
Beeping patch is at http://lkml.org/lkml/2006/9/10/123 .

My only other idea is to try to play with noapic and similar stuff...
------- Comment #25 from jimc@math.ucla.edu 2007-02-04 14:03 MST -------
Per instructions I recompiled the 2.6.20-rc6 kernel with PM_DEBUG and PM_TRACE (in power management config section), booted with the sabotaged initrd and no modules loaded, and "echo 1 > /sys/power/pm_trace", then "echo mem > /sys/power/state". As before, it suspended, then on resume it hung solid.

But I power cycled it and rebooted with kernel 2.6.20-rc6 and a functioning initrd. The discs were said to have not been checked for 130 years :-) but in the boot messages (/var/log/boot.msg) a clue was revealed:

<4> Magic number: 0:682:215
<4> hash matches /home/kernel/linux-2.6.19/drivers/base/power/resume.c:28
<4> hash matches device 0000:03:01.0

The device is the CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b3) (PCI ID 1180:0476). No card was in the socket and the driver had never been loaded (would have been yenta_socket.ko). The same chip has a Firewire controller and a flash memory reader; no devices or media were plugged in. This is neither the first nor last device group (chip) on the PCI bus, ordering by bus address as displayed by lspci. I've attached the output of lspci.

The code is resume_device(struct device *dev). None of the dev_dbg or dev_err messages were visible (the display's lamp was not yet turned on). Comments for dpm_power_up indicate that some devices must be powered off/on with interrupts disabled. Could this be one of them? How could I make that happen?
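The trace procedure in the comment above can be sketched as two shell helpers: one that arms the trace and suspends, and one that pulls the matched device address out of the boot log after the following reboot. The names pm_trace_suspend and pm_trace_devices and the overridable path arguments are assumptions; the sysfs files, the log path and the "hash matches device" line format are taken from the comment.

```shell
#!/bin/sh
# Arm the resume trace and suspend to RAM (needs root and a kernel built
# with PM_DEBUG and PM_TRACE). pm_trace_suspend is a hypothetical name;
# the directory argument exists only so the writes can be dry-run.
pm_trace_suspend() {
    power_dir="${1:-/sys/power}"
    echo 1 > "$power_dir/pm_trace" || return 1
    echo mem > "$power_dir/state"
}

# After the next boot, print the device addresses from the
# "hash matches device ..." lines of the boot log.
pm_trace_devices() {
    sed -n 's/.*hash matches device \([^ ]*\).*/\1/p' "${1:-/var/log/boot.msg}"
}
```

Each address printed by pm_trace_devices can then be passed to "lspci -v -s" to identify the device, as was done by hand above.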
------- Comment #26 from jimc@math.ucla.edu 2007-02-04 14:06 MST -------
Created an attachment (id=117263) --> (https://bugzilla.novell.com/attachment.cgi?id=117263&action=view)
Output of lspci on failing machine
------- Comment #27 from pavel@novell.com 2007-02-06 16:31 MST -------
Devices like apic are indeed powered on with interrupts disabled... PM_TRACE() should be effective there, too...

Are you sure there are no drivers for Cardbus bridge? That is strange...
------- Comment #28 from jimc@math.ucla.edu 2007-02-06 19:22 MST -------
Yes, not a single driver module was loaded at the time, and there were supposed to have been no drivers compiled into the kernel either except non-modular ones such as the serio driver.

I dug around in the sources, and came to the conclusion that on an Intel chipset the devices are powered off and on using an ACPI BIOS call. It turns out that there is a new BIOS (A09) for my machine, but the list of affected issues was totally irrelevant, things like a setup item for more disc acoustic choices. Nonetheless I flashed my BIOS... and suspend-to-RAM started working! 25 successive suspend events were done with no errors, on 2 different days.

Corrupt BIOS? Error in ACPI, not on the BIOS issues list but fixed nonetheless? We'll probably never know what the problem was :-( Nor whether it will fail again in the future :-(

By the way, the testing was done with all drivers active including ATI's fglrx version 8.25.18 (3D accelerated graphics for Radeon X300).

Pavel, I'm not sure of the etiquette of setting the bug status, so would you please close it? Thank you for all your help, and for your contributions to the software and disc suspend code.