https://bugzilla.novell.com/show_bug.cgi?id=235197

------- Comment #6 from bk@novell.com 2007-01-19 13:19 MST -------

Thank you very much for your plaudit! I want to give it back to you too.

It seems tg3 isn't the real cause, only a rather reliable trigger. I have now seen that suspend sometimes works even with tg3 loaded, but I cannot reproduce those occasions.

BUT... I can now trigger the hang without tg3 as well: I enabled many dev_dbg() debug printks in drivers/power/resume.c by defining DEBUG at the top of it, and the hang now triggers reliably without tg3 when I set the console loglevel to 8 so that all of them are printed on the framebuffer console. Reducing the console loglevel to 7 makes suspend work again, unless I load tg3 and run "ifconfig eth0 up" so that tg3_suspend and tg3_resume actually do something.

Looking at tg3_set_power_state(), which (judging from the logged messages) is called twice below tg3_resume, needlessly I assume, I saw many udelay() calls and even a few msleep() calls, plus a few msleep_interruptible() calls elsewhere in tg3.c. Subjectively, I then felt that tg3_resume took a noticeable time between the printk on entry and the printk on exit.

I then reduced the delays, removed some of the udelay() calls, turned some msleep() calls into udelay() calls, and removed one of the two calls to tg3_set_power_state() inside tg3_resume. Now suspend does not hang on the first try after boot, and "rcnetwork start" starts the network and lets me ping a remote host (which even works). But then the driver/card gets into a bad state: I saw an ifdown script hanging, and the next suspend attempt hangs, so I apparently did a bit too much.

When suspend hangs in my testing, it always hangs after this message:

  ata2: SATA link down (SStatus 0 SControl 310)

At first I didn't take much note of this message because it is also printed during the working suspend cycles.
However, it's clear to me now that what follows this message is what differs.

When suspend works, that message is directly followed by:

  ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 210)

and two more messages until sda is brought back and the data is written to swap.

When suspend does not work, I get with console loglevel 9:

  sd 0:0:0:0 resuming          (that's one of the dev_dbg printks in resume.c)
  PM: writing image.
  swsusp:free swap pages: 2879160

After 6 minutes, I get:

  sd 0:0:0:0 timing out on command, waited 360s
  sd 0:0:0:0 SCSI error: return code = 0x00000028
  end_request: I/O error, dev sda, sector 210756814
  Read-error on swap-device (8:0:210756822)
  Saving image data pages (117788 pages) ... 4%

and these repeat at an interval of 6 minutes. I captured some screenshots of the various hangs with my digicam.

Scrolling up one page, I also see:

  sata_sil 0000:00:12.0 resuming

So that is called at least; maybe further debugging should also go into that area?

AFAICS, the disk's SATA link never comes up again on resume when tg3 is loaded and its eth0 is up, or when I log many debug messages. On the contrary, the link is reported to be down directly before the disk should come back to life. So maybe we hit a SATA timeout which is not recovered from?

BTW (I guess it's unrelated): For these last tests, I used a kernel compiled with lockdep checking, and suspending it triggers some printks of:

  lockdep: not fixing up alternatives.

It's logged after "CPU 1 is now offline" and after "Enabling non-boot CPUs ..."
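
For reference, the SStatus values in the ata messages above can be decoded by hand. The following is only a sketch and not part of the original report: it assumes the kernel prints SStatus in hexadecimal and uses the field layout from the Serial ATA specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The function name decode_sstatus is made up for illustration.

```python
# Sketch: decode the SStatus value shown in "SATA link up/down" messages.
# Assumptions (not from the bug report): the value is printed in hex, and the
# fields follow the Serial ATA SStatus layout: DET 3:0, SPD 7:4, IPM 11:8.

DET = {
    0x0: "no device detected, PHY communication not established",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(text):
    """Decode an SStatus value given as a hex string, e.g. "113"."""
    value = int(text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": DET.get(det, "reserved"),
        "speed": SPD.get(spd, "reserved"),
        "ipm": IPM.get(ipm, "reserved"),
        # The link counts as up only when the PHY has established communication.
        "link_up": det == 0x3,
    }


if __name__ == "__main__":
    for raw in ("113", "0"):
        print(raw, decode_sstatus(raw))
```

Under these assumptions, "SStatus 113" is a device on an active 1.5 Gbps link, while "SStatus 0" means no device was detected on that port at all.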