[Bug 239101] New: Fans do not (re)start on hp compaq nx6325 laptop
https://bugzilla.novell.com/show_bug.cgi?id=239101

Summary: Fans do not (re)start on hp compaq nx6325 laptop
Product: openSUSE 10.2
Version: Final
Platform: i686
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Basesystem
AssignedTo: trenn@novell.com
ReportedBy: bernhard.bender@web.de
QAContact: qa@suse.de

Hi Thomas,

as you requested in comment 119 bug 179702, I am opening this new bug:

I am using your 2.6.18.5-34-default kernel to solve some ACPI problems with my hp nx 6325 Turion X2 laptop. A problem with the fans still remains: The ACPI subsystem is unable to start any fan (it does succeed in stopping them during boot). I can see messages like this in dmesg:
ACPI: Transitioning device [C352] to D0
ACPI: Transitioning device [C352] to D0
ACPI: Unable to turn cooling device [dffde9b4] 'on'
ACPI: Transitioning device [C351] to D0
ACPI: Transitioning device [C351] to D0
ACPI: Unable to turn cooling device [dffdea04] 'on'
ACPI: Transitioning device [C350] to D0
ACPI: Transitioning device [C350] to D0
ACPI: Unable to turn cooling device [dffdea54] 'on'
ACPI: Transitioning device [C34F] to D0
ACPI: Transitioning device [C34F] to D0
ACPI: Unable to turn cooling device [dffdeab8] 'on'
The BIOS on this machine seems to be the latest version (F.04) Bernhard -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #1 from bernhard.bender@web.de 2007-01-25 16:49 MST ------- Created an attachment (id=115286) --> (https://bugzilla.novell.com/attachment.cgi?id=115286&action=view) output of acpidump This file has the output of acpidump on my nx6325 laptop
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #2 from bernhard.bender@web.de 2007-01-25 16:53 MST ------- I found a blog page (in German) that has some hints regarding fan control on the nx6325. Maybe this is helpful here... http://thinksilicon.redprohosting.de/index.php?page=39#id101 I have not tried this myself yet. Bernhard
https://bugzilla.novell.com/show_bug.cgi?id=239101

trenn@novell.com changed:

What   |Removed |Added
----------------------------------------------------------------------------
CC     |        |rjwysocki@sisk.pl, luming.yu@intel.com
Status |NEW     |ASSIGNED

------- Comment #3 from trenn@novell.com 2007-02-10 14:07 MST -------
I can reproduce this issue (on a nx6125, lowest fan sometimes is not controlled correctly. I did so many reboots, not sure whether this could still have to do with the "Bad BIOS state" problem). There also is a "fans break after suspend to disk" issue for sure. There are also some patches available; I am trying to find a suitable one that has a low risk of breaking other machines. Do you know how to compile a kernel-source.rpm SUSE kernel and how to apply patches? If you could help with testing, this could probably speed things up. Ahh, Rafael has a novell bugzilla account, I didn't know/remember, that's great. Hmm, maybe we should concentrate the discussion here: http://bugzilla.kernel.org/show_bug.cgi?id=7122 If there is a suitable patch for 10.2/SLE10, I will add a comment here and close this one as soon as it is in.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #4 from bernhard.bender@web.de 2007-02-10 14:39 MST ------- Thomas, I will be able to apply patches and compile/test kernels if you can provide me with the necessary links to kernel sources and patches as well as suitable configs. Bernhard
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #5 from rjwysocki@sisk.pl 2007-02-10 18:22 MST ------- There is a series of patches at http://www.sisk.pl/kernel/patches/2.6.20/ that I use on HPC nx6325 on top of the 2.6.20 kernel. Some of them are needed for the thermal management to work correctly, the others are needed for the bcm43xx driver. The patches that aren't mine contain the "Original location" line pointing to the place where you can find more information about the patch in question.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #6 from bernhard.bender@web.de 2007-02-11 04:36 MST ------- Does the 2.6.20 kernel or the patch series contain the necessary fixes for the "ACPI bad state" problem related to the psmouse module (bug 179702)?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #7 from rjwysocki@sisk.pl 2007-02-11 05:33 MST ------- I think so. The last two patches in the series at http://www.sisk.pl/kernel/patches/2.6.20/ are likely to fix it: psmouse-fiddle-with-reset.patch serio-cleanup-to-bus.patch
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #8 from bernhard.bender@web.de 2007-02-11 16:34 MST -------
I built and installed linux-2.6.20 with the following patches included:

psmouse-fiddle-with-reset.patch
serio-cleanup-to-bus.patch
ACPI-notify-revised.patch
fan-problem-fix.patch

I cloned the kernel config by copying it from /boot and using the "make oldconfig" command. The "ACPI bad state" problem seems fixed; at least the AC adapter is correctly recognized, which it wasn't with the original 10.2 kernel. The fans do behave differently now: It seems that the lowest level fan is always active, regardless of temperature. Before, the fan would be off completely. With temperature going up, acpi -V shows

Thermal 1: active[2], 50.0 degrees C
Thermal 2: ok, 51.0 degrees C
Thermal 3: ok, 31.0 degrees C

But the fans do not seem to follow; only Thermal 1: "active[3]" and "ok" seem to produce any difference in fan state at /proc/acpi/fan/*/state, but no noticeable difference in the fan's actual speed. This is better than before, since the CPU seems to remain cooler, but not perfect yet. Any other patches I could try? Note that I am not going thru suspend, just booting the machine. BTW: I see a couple of "APIC error on CPU0: 40(40)" messages in the syslog for both CPUs.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #9 from rjwysocki@sisk.pl 2007-02-11 17:36 MST ------- I assume the behavior of fans you describe is right after a fresh boot. Can you please post the output of "cat /proc/acpi/thermal_zone/TZ*/temperature /proc/acpi/fan/C3*/state"? Can you run a CPU-intensive task (preferably twice in parallel), run "watch cat /proc/acpi/thermal_zone/TZ*/temperature /proc/acpi/fan/C3*/state" and observe what happens to the temperatures and fans? The "APIC error ..." message happens to me too, but it doesn't seem to correspond to anything really wrong.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #10 from bernhard.bender@web.de 2007-02-12 08:44 MST ------- Fan control is definitely broken still. Right now the machine says:
ber@mel-mobil:~> cat /proc/acpi/thermal_zone/TZ*/temperature /proc/acpi/fan/C3*/state
temperature: 70 C
temperature: 58 C
temperature: 30 C
status: off
status: off
status: off
status: off
ber@mel-mobil:~> acpi -V
Battery 1: charged, 100%
Thermal 1: active[1], 70.0 degrees C
Thermal 2: ok, 58.0 degrees C
Thermal 3: ok, 30.0 degrees C
AC Adapter 1: on-line
And the fan(s) are not physically active. It seems to me that once a fan state is set to "off", it cannot be turned back on again. I also found this in dmesg:
ACPI: Transitioning device [C352] to D0
ACPI: Transitioning device [C352] to D0
ACPI: Unable to turn cooling device [c17e69b4] 'on'
Are there any useful ACPI debug options in the kernel that I could enable to get more info?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #11 from rjwysocki@sisk.pl 2007-02-12 13:59 MST ------- Well, that certainly is different from what I observe. Please try to apply acpi-suspend-resume.patch from the series. (Actually, you could apply all of them. They don't break my system, so I think they wouldn't break yours. ;-))
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #12 from bernhard.bender@web.de 2007-02-12 17:14 MST -------
Okay, in addition to the patches mentioned in comment 8 I have applied the following:

acpi-suspend-resume.patch
move_GPEs_disabling_to_sleep_prepare.patch
call_acpi_sleep_init_from_acpi_init.patch
pm-change-suspend-to-RAM-ordering.patch

However, nothing has changed. ACPI still seems to be unable to turn any fan level on. It does turn them off during boot when the machine is still cold. But they do not come on again once they have been turned off. If I reboot the machine when it is already warm, the fans remain on, but turn off after some time... When I run a heavy workload on the machine (e.g. a linux kernel build), the temperature rises to >80 C, then some kind of emergency device turns on the fan at full power until it cools down again to ~70 C. At the same time, the fans remain off in their /proc/acpi/fan/*/state.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #13 from bernhard.bender@web.de 2007-02-12 17:16 MST -------
Additional info:

cat /proc/acpi/thermal_zone/TZ1/trip_points
critical (S5): 105 C
passive: 95 C: tc1=1 tc2=2 tsp=100 devices=0xdfcc3338 0xdfcc3324
active[0]: 75 C: devices=0xc17e6ab8
active[1]: 65 C: devices=0xc17e6a54
active[2]: 55 C: devices=0xc17e6a04
active[3]: 45 C: devices=0xc17e69b4

Should these device identifiers in some way be related (or identical) to the /proc/acpi/fan/* directories?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #14 from bernhard.bender@web.de 2007-02-12 17:30 MST -------
Well, I may have to eat my words... After some more testing I have seen it activate some fan levels. However, there seems to be some considerable delay involved. Possibly it simply takes way too long to turn on the fans. I have definitely seen a >10 seconds delay in displaying temperature changes using the "watch acpi -V" command. Wasn't this supposed to be fixed by the new acpi workqueue in ACPI-notify-revised.patch? Right now I can see:

temperature: 55 C
temperature: 53 C
temperature: 26 C
status: off
status: off
status: on
status: off

So there is at least some control over the fans.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #15 from trenn@novell.com 2007-02-13 01:53 MST -------
These two patches could help a bit (also on Rafael's list):

patches.fixes/acpi-power-resources-resume-fix-2.patch
http://bugzilla.kernel.org/show_bug.cgi?id=7122#c52
patches.fixes/acpi_fan-problem-fix.patch
http://bugzilla.kernel.org/show_bug.cgi?id=7570#c8

But I could easily get the fans out of sync and make them misbehave by (un-)reloading the thermal and/or fan module. So IMO this is not fixed up correctly, and therefore I did not add them yet, but am trying to find a cleaner solution (if possible; the fans controlled via thermal over power resources seem to be somehow tricky). This is also why I believe the fans may break after suspend. If the fan/power/thermal subsystem initialised correctly (on module load or resume time), this patch (which is much too risky for backporting): move_GPEs_disabling_to_sleep_prepare.patch would not be needed. I hope to get an nx6xxx series model again today and try some more.
> Should these device identifiers in some way be related (or identical) to the /proc/acpi/fan/* directories?
Yes, they should. For each active trip point exceeded, one fan must switch into the active state. Best you use: watch -n1 cat /proc/acpi/thermal_zone/*/{temperature,trip_points} /proc/acpi/fan/*/state (monitoring the first thermal_zone should be enough as it should have the only active trip points AFAIK). Then (in-)decrease CPU load on another console (e.g. by cat /dev/zero >/dev/null &) and you should see the fans going on and off.
> However, there seems to be some considerable delay involved
Check: cat /proc/acpi/thermal_zone/*/polling_frequency This is a value in seconds for how often the kernel should check and adjust thermal/power devices. If set to zero (default), they are only checked when a thermal event happens; that might be what you see. The powersaved, if started normally, sets this to a value of "2", which is a bit too low; I'd go for 10 secs or so. You can manually echo the values in there, if you load modules and set up things manually. The powersaved config variable is here: /etc/sysconfig/powersave/thermal
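The rule stated in this comment (one fan per exceeded active trip point) can be checked mechanically against the trip_points output quoted in comment #13. The following is only a minimal Python sketch: the function names are made up for illustration, the line format is taken from the output pasted in this thread, and treating "exceeded" as temperature >= threshold is an assumption.

```python
import re

# Matches lines such as "active[1]: 65 C: devices=0xc17e6a54", i.e. the
# trip_points format quoted in comment #13 of this thread.
ACTIVE_RE = re.compile(r"active\[(\d+)\]:\s*(\d+)\s*C:\s*devices=(\S+)")


def parse_active_trip_points(text):
    """Return {index: (threshold_in_C, device_id)} for each active trip point."""
    return {int(i): (int(t), dev) for i, t, dev in ACTIVE_RE.findall(text)}


def fans_expected_on(temperature, trip_points):
    """Indices of the active trip points reached at the given temperature.

    Assumption: a trip point counts as exceeded when temperature >= threshold.
    """
    return sorted(i for i, (t, _) in trip_points.items() if temperature >= t)


# Sample data copied from comment #13.
TRIP_POINTS = """\
critical (S5): 105 C
passive: 95 C: tc1=1 tc2=2 tsp=100 devices=0xdfcc3338 0xdfcc3324
active[0]: 75 C: devices=0xc17e6ab8
active[1]: 65 C: devices=0xc17e6a54
active[2]: 55 C: devices=0xc17e6a04
active[3]: 45 C: devices=0xc17e69b4
"""

if __name__ == "__main__":
    points = parse_active_trip_points(TRIP_POINTS)
    # 70 C is the TZ1 reading reported in comment #10.
    print(fans_expected_on(70, points))
```

Under these assumptions, the 70 C reading from comment #10 lies above the thresholds of active[1], active[2] and active[3], so three fan levels would be expected to report "on", whereas all four reported "off" there.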
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #16 from trenn@novell.com 2007-02-13 09:46 MST -------
I did some more testing...: If you comment out the check for whether the fan should already be enabled, and let the fan be set active on each thermal check, you should get stable working fans. In drivers/acpi/thermal.c:

    /*
     * Above Threshold?
     * ----------------
     * If not already enabled, turn ON all cooling devices
     * associated with this active threshold.
     */
    if (active->temperature > maxtemp)
        tz->state.active_index = i;
    maxtemp = active->temperature;
    /*
    if (active->flags.enabled)
        continue;
    */

I also tested with the two patches mentioned at the beginning of comment #15 applied. This one is really ugly to debug. There seems to be an endless loop processed through ACPI reading thermal data all the time, even if thermal polling frequency (you should increase this one to at least 10) is off. I expect this loop causes the fans to break away after some time, but I could not identify why. Even temperature updates seem to get disabled for a while... I don't have a final solution here yet...
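Why skipping the "already enabled" check helps can be illustrated with a toy model. This is a hedged Python sketch, not the kernel code: the classes and the thermal_check() function are hypothetical, and the model only assumes that the driver's "enabled" flag can get out of sync with the real fan state (for example after suspend or a module reload, as described in this thread).

```python
class Fan:
    """Toy stand-in for a fan power resource (not a kernel structure)."""

    def __init__(self):
        self.hardware_on = False


class ActiveTripPoint:
    """Toy active trip point with the driver's cached 'enabled' flag."""

    def __init__(self, temperature, fan):
        self.temperature = temperature
        self.fan = fan
        self.enabled = False  # what the driver believes, not the real state


def thermal_check(temp, trip_points, skip_if_enabled):
    """One simplified thermal check over all active trip points."""
    for tp in trip_points:
        if temp >= tp.temperature:
            if skip_if_enabled and tp.enabled:
                # Original behaviour: trust the cached flag, do nothing.
                continue
            # Behaviour with the check commented out: always write 'on'.
            tp.fan.hardware_on = True
            tp.enabled = True
        else:
            tp.fan.hardware_on = False
            tp.enabled = False


if __name__ == "__main__":
    fan = Fan()
    tp = ActiveTripPoint(45, fan)
    thermal_check(50, [tp], skip_if_enabled=True)
    fan.hardware_on = False  # fan switched off behind the driver's back
    thermal_check(50, [tp], skip_if_enabled=True)
    print("with check:", fan.hardware_on)
    thermal_check(50, [tp], skip_if_enabled=False)
    print("without check:", fan.hardware_on)
```

In this model, once the real fan state and the cached flag disagree, the check keeps the fan off forever, while rewriting the state on every poll recovers on the next thermal check. That matches the behaviour reported later in comment #22, but it says nothing about why the states diverge in the first place.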
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #17 from bernhard.bender@web.de 2007-02-13 11:06 MST -------
> I tested with also the two patches attached mentioned at beginning of comment #15.
One of them was already in my setup, I will include the other in my next tests (hopefully tonight). I will also test your little patch.
> This one is really ugly to debug. There seem to be an endless loop processed through ACPI reading thermal data all the time, even if thermal polling frequency (you should increase this one to at least 10) is off.
There is a discussion of this endless cycle consuming loop in the kernel.org thread you mentioned in comment 3
> I expect this loop causes the fans to break away after some time, but I could not identify why. Even temperature updates seem to get disable for a while... I don't have a final solution here yet...
Yes, I have seen temperature updates stalled for ~10 seconds (using watch acpi -V).
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #18 from trenn@novell.com 2007-02-13 11:21 MST -------
> There is a discussion of this endless cycle consuming loop in the kernel.org thread you mentioned in comment 3

Could you give me a pointer, pls. "Endless" gives too few and "ACPI" too many hits. Are you sure it's on lkml?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #19 from bernhard.bender@web.de 2007-02-13 11:34 MST ------- You may look here: http://www.sisk.pl/kernel/patches/2.6.20/ACPI-notify-revised.patch
> HP nx6125/nx6325/... machines have a _GPE handler with an infinite loop sending Notify() events to different ACPI subsystems.
> Notify handler in ACPI driver is a C-routine, which may call ACPI interpreter again to get access to some ACPI variables (acpi_evaluate_xxx). On these HP machines such an evaluation changes state of some variable and lets the loop above break.

and here:

http://bugzilla.kernel.org/show_bug.cgi?id=5534
http://bugzilla.kernel.org/show_bug.cgi?id=7122

I hope this helps.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #20 from bernhard.bender@web.de 2007-02-13 16:03 MST -------
So I did a new test with the 2.6.20 kernel with these patches applied:

ACPI-notify-revised.patch
acpi-power-resources-resume-fix-2.patch
acpi-suspend-resume.patch
call_acpi_sleep_init_from_acpi_init.patch
fan-problem-fix.patch
move_GPEs_disabling_to_sleep_prepare.patch
pm-change-suspend-to-RAM-ordering.patch
psmouse-fiddle-with-reset.patch
serio-cleanup-to-bus.patch

and also the modification suggested in comment 16. The result is good: Fans get enabled and disabled correctly according to thermal trip points and actual temperatures. However, I still observe stalled temperature updates for quite long periods (>30 seconds). At least we are making some progress!
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #21 from bernhard.bender@web.de 2007-02-13 16:19 MST ------- Changed /proc/acpi/thermal_zone/TZ1/polling_frequency from 2 to 10 seconds. This seems to prevent the stalling of thermal reading, or at least it isn't noticeable anymore.
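For repeated testing, the change described here can be scripted instead of echoing the value by hand. A minimal Python sketch, assuming the polling_frequency files accept a plain number of seconds (as the manual echo suggested in comment #15 implies); the function name and the base parameter are hypothetical, and writing under /proc needs root.

```python
from pathlib import Path


def set_polling_frequency(seconds, base="/proc/acpi/thermal_zone"):
    """Write the polling interval (in seconds) to every thermal zone.

    Returns the list of files that were updated. The base directory is a
    parameter so the function can be tried against a scratch directory.
    """
    if seconds < 0:
        raise ValueError("polling interval must not be negative")
    updated = []
    for path in sorted(Path(base).glob("*/polling_frequency")):
        path.write_text(f"{seconds}\n")
        updated.append(str(path))
    return updated


if __name__ == "__main__":
    # 10 seconds is the value suggested in comment #15 and used here.
    for name in set_polling_frequency(10):
        print("updated", name)
```

Note that, according to comment #15, powersaved sets its own value from /etc/sysconfig/powersave/thermal when it starts, so a manual change may be overwritten.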
https://bugzilla.novell.com/show_bug.cgi?id=239101

trenn@novell.com changed:

What   |Removed |Added
----------------------------------------------------------------------------
CC     |        |seife@novell.com

------- Comment #22 from trenn@novell.com 2007-02-14 08:43 MST -------
Seife, FYI: here is proof that increasing the polling frequency is a good idea... I am still fiddling with this... my small patch to always write to the fan/power resource when in active mode did not help here; the fans broke away after suspend and were not switched on again. I will give it a last try, maybe this is because of the psmouse phenomenon that possibly confused the machine on suspend, will do some more tries... It seems to work now. With the patch 1+4 from/described in the last comments here: http://bugzilla.kernel.org/show_bug.cgi?id=7689. It really seems to be true that psmouse needs to get properly unloaded on suspend; then the fans are all in "off" mode, but on the next thermal polling read they get enabled (with my little patch). This was again with the power/fan fix patches. Hopefully I can backport something safe with this, switching to SP1 again for now, then help mainline again...
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #23 from rjwysocki@sisk.pl 2007-02-14 13:47 MST ------- Ah, there's one thing I forgot about. I don't load processor, thermal and fan modules from the initrd.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #24 from bernhard.bender@web.de 2007-02-14 16:56 MST ------- FYI: I started playing with S2disk using the kernel described in comment 20. Fans keep working fine after suspend2disk and resume; I can watch them turn on and off as load/temperature increases or decreases. However, I noticed that after resume, one of the two CPUs in this system was deactivated... (/proc/cpuinfo shows them both; the system has an AMD Turion64 X2)
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #25 from rjwysocki@sisk.pl 2007-02-15 00:35 MST ------- The nonboot CPUs are switched off during the suspend and on during the resume. Why do you think the second core is not active after the resume?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #26 from bernhard.bender@web.de 2007-02-15 02:23 MST -------
> Why do you think the second core is not active after the resume?
It's because of kpowersave's status display: after the original boot, it shows the clock for both CPUs (changing 800/1600MHz); after resume, it shows the clock for CPU_0 and "CPU inactive" or similar for CPU_1. Possibly this is a bug in kpowersave; where do I get the kernel's view of which CPUs are active? (/proc/cpuinfo shows no difference afaik)
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #27 from rjwysocki@sisk.pl 2007-02-15 02:32 MST ------- /proc/cpuinfo is a good place to look at. Additionally, you can run "top" and press "1" to see if there's any load on the second core. You can also run gkrellm or a similar system monitor and see how much load is there on each core. Anyway, on my system kpowersave shows both CPUs as "active" (with 2.6.20-git10, but it was like that with 2.6.20 too).
https://bugzilla.novell.com/show_bug.cgi?id=239101

trenn@novell.com changed:

What   |Removed |Added
----------------------------------------------------------------------------
CC     |        |dkukawka@novell.com

------- Comment #28 from trenn@novell.com 2007-02-15 02:50 MST -------
You can check in e.g.: cat /sys/devices/system/cpu/cpu1/online

1 means the cpu is active. CPU 0 cannot be turned off and it seems therefore the online file cpu0/online got removed (maybe userspace progs got confused by this?). You can manually echo 0/1 into the file(s) to online/offline CPU cores for testing.
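The check described in this comment can be wrapped in a small helper. A minimal Python sketch: the function name is hypothetical, and treating a missing online file as "online" follows the remark above that cpu0/online does not exist because CPU 0 cannot be turned off.

```python
from pathlib import Path


def cpu_is_online(cpu, base="/sys/devices/system/cpu"):
    """Return True if the kernel reports the given CPU number as online."""
    cpu_dir = Path(base, f"cpu{cpu}")
    if not cpu_dir.is_dir():
        raise FileNotFoundError(f"no such CPU directory: {cpu_dir}")
    online = cpu_dir / "online"
    if not online.exists():
        # Assumption from comment #28: a CPU without an 'online' file
        # (CPU 0 on this machine) cannot be offlined and is always active.
        return True
    return online.read_text().strip() == "1"


if __name__ == "__main__":
    for n in (0, 1):
        try:
            print(f"cpu{n} online:", cpu_is_online(n))
        except FileNotFoundError as err:
            print(err)
```

This only reports the kernel's view; a tool such as kpowersave may still show a core as inactive for other reasons.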
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #29 from trenn@novell.com 2007-02-15 02:51 MST ------- Danny, have you already heard about something similar to what is described in comment #26?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #30 from dkukawka@novell.com 2007-02-15 03:26 MST ------- No, I didn't hear about such a problem before. Was the KPowersave info dialog open during suspend? Can you reproduce the problem with this KPowersave version: http://beta1.suse.com/private/dkukawka/kpowersave/0.7.2RC6/
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #31 from trenn@novell.com 2007-02-15 03:46 MST -------
About comment #30: I'd check the /sys/devices/../online file first. This is quite easy to do and might lead to a kernel problem (or a problem of the suspend scripts, not sure whether the kernel or the new suspend scripts should online the cpu again). I wonder whether these:

patches.fixes/acpi-power-resources-resume-fix-2.patch
http://bugzilla.kernel.org/show_bug.cgi?id=7122#c52
patches.fixes/acpi_fan-problem-fix.patch
http://bugzilla.kernel.org/show_bug.cgi?id=7570#c8

are really needed or whether we still saw side-effects from the psmouse problem. Bernhard, if you have done or are doing any tests without these applied on a kernel with the psmouse properly cleaned up, I am very interested in whether the fans still break away... Or whether you can confirm that these patches really fix anything even with psmouse cleaning up everything on shutdown...

Bernhard (and of course also Rafael): Let me thank you at this point for the great reports and tests you are providing here. I know how much time and work this is (doing the same here) and it's very much appreciated. It's particularly hard to find a small set of patches/code that is necessary to fix this properly in this case because of the various (and especially this weird psmouse) bug(s). However, it's really worth it as I think we are at (or shortly before) the great breakthrough of getting these HPs finally running really fine with ACPI.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #32 from rjwysocki@sisk.pl 2007-02-15 04:23 MST -------
> [...] not sure whether the kernel or the new suspend scripts should online the cpu again [...]
The kernel. And it works for me just fine on nx6325, so there must be a difference somewhere. Perhaps the kernel configuration. Bernhard, you can find my kernel config and /etc/sysconfig/kernel at: http://www.sisk.pl/kernel/boxes/HPC_nx6325/2.6.20/ Could you please use the kernel-config as .config to build the kernel and copy sysconfig-kernel to /etc/sysconfig/kernel, create the initrd with the help of it and see if the problem remains? I think acpi-power-resources-resume-fix-2.patch is needed. The other one is just a cleanup AFAICS. Also the psmouse problems should be fixed by psmouse-fiddle-with-reset.patch and serio-cleanup-to-bus.patch . At least they are fixed by these patches on my box.
https://bugzilla.novell.com/show_bug.cgi?id=239101

trenn@novell.com changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Status     |ASSIGNED |RESOLVED
Resolution |         |FIXED

------- Comment #33 from trenn@novell.com 2007-02-15 07:07 MST -------
The latest head kernel (2.6.20), available in some hours (or in a day) from here: ftp.suse.com/pub/projects/kernel/kotd/x86_64/HEAD/ should have the fans fixed. I added:

acpi-power-resources-resume-fix-2.patch
acpi-suspend-resume.patch

and the latest psmouse cleanups on shutdown and suspend. I also worked around the break away of fans after suspend by always switching the fan on when it should be active, instead of using these (which are IMO too intrusive and should pass -mm for a while first):

move_GPEs_disabling_to_sleep_prepare.patch
call_acpi_sleep_init_from_acpi_init.patch
pm-change-suspend-to-RAM-ordering.patch

I will now concentrate on mainline again; it's time all this sees some broader testing and hits the mainline kernel as soon as possible. I set this to fixed now. I may pick up one or the other patch for 10.2, but I have to be sure those are really, really safe and cannot break any other machines. So you might still see issues on the 10.2 update kernel. Don't hesitate to add further comments about still open fan issues or reports about a (non-)working kernel or an inactive core on suspend. For the latter we should open up a new bug to not mix up information here.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #34 from rjwysocki@sisk.pl 2007-02-15 14:11 MST ------- FYI, pm-change-suspend-to-RAM-ordering.patch is a part of a bigger series of patches and it really shouldn't be applied separately. The whole series has already been merged with the mainline.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #35 from bernhard.bender@web.de 2007-02-15 15:18 MST ------- I looked into the CPU1-deactivated problem again: kpowersave-0.7.2RC6-0.1 shows the same. However, both CPUs pick up workload as shown by the top command. Looking at /sys/devices/system/cpu/cpu* I discovered that the cpufreq/ subtree is missing for CPU1. I guess this is the reason for kpowersave to report CPU1 as deactivated. This may well be caused by the particular set of patches I applied (I left out the swsusp-* patches). If someone could check this with a different kernel...
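For anyone wanting to repeat this check on a different kernel, the observation above can be automated. A minimal Python sketch, with a hypothetical function name; it only looks for a cpufreq entry under each cpuN directory, as described in this comment, and makes no claim about how kpowersave decides what to display.

```python
import re
from pathlib import Path


def cpus_missing_cpufreq(base="/sys/devices/system/cpu"):
    """Return the sorted CPU numbers whose cpuN directory has no cpufreq entry."""
    missing = []
    for entry in Path(base).iterdir():
        match = re.fullmatch(r"cpu(\d+)", entry.name)
        # Skip non-CPU entries; only cpu0, cpu1, ... are of interest here.
        if match and entry.is_dir() and not (entry / "cpufreq").exists():
            missing.append(int(match.group(1)))
    return sorted(missing)


if __name__ == "__main__":
    print("CPUs without cpufreq:", cpus_missing_cpufreq())
```

Running it once after boot and once after resume shows whether the cpufreq subtree disappears for a core across a suspend cycle.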
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #36 from trenn@novell.com 2007-02-15 15:26 MST -------
Created an attachment (id=119540) --> (https://bugzilla.novell.com/attachment.cgi?id=119540&action=view)
link symlinked cpufreq directories again after onlining HT/DC Core

I think this fixes it (I am not sure whether it still applies cleanly; it should, it's not that old). I sent this out to the cpufreq list a while ago. You are the first to see a bad side effect of the missing link. I will repost it.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #37 from bernhard.bender@web.de 2007-02-15 17:10 MST -------
Thomas, your patch did not help. CPU1 is still shown as disabled (though it is working) and the symlink is still missing. After all, this is not about offlining the main CPU, right? It is the "managed" CPU1 that loses the symlink.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #38 from bernhard.bender@web.de 2007-02-15 17:39 MST -------
Next test: I also added the patch
http://www.sisk.pl/kernel/patches/2.6.20/swsusp-interface-disable-nonboot-cp...
but again, the problem does not go away. Here is part of the dmesg output after resume:

  Disabling non-boot CPUs ...
  Cannot set affinity for irq 0
  CPU 1 is now offline
  SMP alternatives: switching to UP code
  CPU1 is down
  swsusp: critical section:
  swsusp: Need to copy 95666 pages
  Intel machine check architecture supported.
  Intel machine check reporting enabled on CPU#0.
  powernow-k8: ph2 null fid transition 0x8
  Enabling non-boot CPUs ...
  SMP alternatives: switching to SMP code
  Booting processor 1/1 eip 3000
  Initializing CPU#1
  Calibrating delay using timer specific routine.. 3192.24 BogoMIPS (lpj=6384482)
  CPU: Vendor unknown, using generic init.
  CPU: Your system may be unstable.
  CPU: After generic identify, caps: 178bfbff ebd3fbff 00000000 00000000 00002001 00000000 0000001f
  CPU: After all inits, caps: 178bfbff ebd3fbff 00000000 00000000 00002001 00000000 0000001f
  CPU1: AuthenticAMD AMD Turion(tm) 64 X2 Mobile Technology TL-52 stepping 02
  CPU1 is up
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #39 from bernhard.bender@web.de 2007-02-15 18:22 MST -------
(In reply to comment #33)
> latest head kernel (2.6.20) available in some hours (or in a day) from here:
> ftp.suse.com/pub/projects/kernel/kotd/x86_64/HEAD/
Tested kotd/i386/HEAD/kernel-default-2.6.20-20070215134239.i586.rpm

For me this worked identically to the 2.6.20 kernel I patched myself as described above:
- fans work OK (also after suspend2disk and resume)
- no ACPI "bad state": AC adapter detected correctly, etc.
- still the same problem with kpowersave showing CPU1 as disabled after resume
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #40 from rjwysocki@sisk.pl 2007-02-16 02:20 MST -------
Which cpufreq governor do you use?
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #41 from trenn@novell.com 2007-02-16 02:57 MST -------
I also have an nx6325 here and tried with the latest 2.6.20 SUSE kernel (on a SLED10 installation).

Offlining one core:
  echo 0 >/sys/devices/system/../cpu1/online
  echo 1 >/sys/devices/system/../cpu1/online
works fine, and the linked /sys/devices/system/../cpu1/cpufreq is there again.

I did a suspend:
  powersave -U (in runlevel 3)
works fine, and the linked /sys/devices/system/../cpu1/cpufreq is there again.

I tried to suspend using gnome power manager (gnome got installed here...) and I still have the linked ../cpu1/cpufreq directory.

It looks like something goes wrong in userspace with 10.2? There we do more things in userspace using /usr/sbin/s2disk.

Bernhard, could you open a new bug for this issue, please? We are mixing up information here. You might want to add seife@novell.com and dkukawka@novell.com to the CC list.

Ahh, comment #35:
> (I left out the swsusp-* patches).
I expect it's this.
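The offline/online test from comment #41 can also be scripted so that it is easy to repeat on different kernels. This is a rough sketch under stated assumptions: the helper name `cycle_cpu` is hypothetical, the core's directory is assumed to be /sys/devices/system/cpu/cpu1 (the path given in comment #35), and writing to the real `online` file requires root.

```python
from pathlib import Path


def cycle_cpu(cpu_dir):
    """Offline and re-online one core, then report whether cpufreq is present.

    Hypothetical helper for illustration. cpu_dir is the per-CPU sysfs
    directory, assumed to be e.g. /sys/devices/system/cpu/cpu1.
    """
    cpu = Path(cpu_dir)
    online = cpu / "online"
    online.write_text("0\n")  # same as: echo 0 > .../cpu1/online
    online.write_text("1\n")  # same as: echo 1 > .../cpu1/online
    # True if the cpufreq directory (or a valid symlink to it) is back
    return (cpu / "cpufreq").exists()
```

A result of `False` from `cycle_cpu("/sys/devices/system/cpu/cpu1")` would correspond to the missing-symlink symptom reported above, while `True` matches the behaviour seen on the SLED10 installation.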
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #42 from rjwysocki@sisk.pl 2007-02-16 03:11 MST -------
I use OpenSUSE 10.2 with s2disk (from the current suspend.sf.net CVS) on an nx6325 and there are no problems with onlining the second core during resume.

Anyway, the offlining and onlining of the nonboot CPUs is carried out by the kernel, and the userland tools have nothing to do with it (I wrote them, so I know :-)). This is a _kernel_ problem and most probably a cpufreq-related one. That's why I asked Bernhard which cpufreq governor he uses.

Bernhard, if you create the new bug for this issue, please add rjwysocki@sisk.pl to the CC list.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #43 from bernhard.bender@web.de 2007-02-16 15:36 MST -------
Created new entry: bug 246525 for the cpufreq problem
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #44 from rjwysocki@sisk.pl 2007-02-16 17:04 MST -------
FYI, there has been some activity at http://bugzilla.kernel.org/show_bug.cgi?id=5534. There are two new patches in there that are meant to replace ACPI-notify-revised.patch from my series:
http://bugzilla.kernel.org/attachment.cgi?id=10429&action=view
http://bugzilla.kernel.org/attachment.cgi?id=10430&action=view

I've tested them and they work for me. Also, they have been included in the acpi-test tree, so it looks like they are going to be merged.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #45 from trenn@novell.com 2007-02-17 01:39 MST -------
Yes, I saw it, thanks. The problem with those is that they are too risky to backport. I need to wait until they hit mainline, get broad testing and get integrated into 10.3.

I also think (even though someone mentioned an nx6325 on this bug, IIRC) that the nx6325 is not affected by the endless loop. I could not run into this problem with such a machine here. It looks like a bug specific to the nx6125 (I could reproduce it there), nx6115 and nx6120 (possibly others), or it's just harder to hit the endless loop on the nx6325.

If I find the time in the next few days, I will try to identify the loop in the AML and propose a fix to HP. If it's not that intrusive, maybe they will pick it up for their next BIOS release. HP is at least (compared to other laptop vendors) a bit interested in Linux. I could imagine that this is also the reason why they stick to the ACPI specs and make that much use of them to stay compatible, but now get punished for the Linux ACPI code not being ready. Fortunately (after this weird psmouse thing was found) things are moving on and getting stable...
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #46 from bernhard.bender@web.de 2007-04-03 17:00 MST -------
Just tested this with the latest 2.6.20.4 (vanilla) kernel tonight. The fans run fine on my hp nx6325 with this kernel after booting the system. However, after suspend2disk/resume, fan control remains broken. Applying the modification from comment 16 did not help (after suspend). Some of the patches may still need to be applied.
https://bugzilla.novell.com/show_bug.cgi?id=239101 ------- Comment #47 from trenn@novell.com 2007-04-04 02:27 MST -------
Yes, some patches went into 2.6.21-rcX; also see:
http://bugzilla.kernel.org/show_bug.cgi?id=7122

They won't get backported to the stable 2.6.20.X vanilla kernels as they are too intrusive.