[Bug 333043] New: IBM T41p shuts down, powersave, Temperature state changed to critical
https://bugzilla.novell.com/show_bug.cgi?id=333043
Summary: IBM T41p shuts down, powersave, Temperature state changed to critical
Product: openSUSE 10.3
Version: Final
Platform: x86
OS/Version: openSUSE 10.3
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: robert.simai@novell.com
QAContact: qa@suse.de
Found By: Novell Technical Services
NTS Priority: 1000

Fresh installation of 10.3 KDE. During operation, powersave triggers a shutdown of the system. /var/log/warn says:

powersaved[7863]: WARNING (checkTemperatureStateChanges:209) Temperature state changed to critical

I used

robert@cheetah:~> powersave -T
Thermal Device no. 0:
  Temperature: 49
  Critical: 93
  Passive: 89

to monitor this. The "Temperature" easily exceeds 90 during activity and the state changes to PASSIVE. After a few seconds the system shuts down.

I encountered this during the online update after installation; the provided kernel update was likely installed before it happened. It can also be triggered by simply echoing continuously on a console for more than 15 seconds. I didn't have this before with 10.1; the fan obviously works and the airflow is not blocked.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
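To make the report's numbers concrete, here is a minimal sketch that pulls the three values out of the `powersave -T` output quoted above and classifies a reading against the passive and critical trip points. The "Key: value" layout is assumed from that one quoted output, not from powersave documentation, and the helper names and the "OK" state label are hypothetical.

```python
import re

# Assumption: "Key: value" pairs as in the powersave -T output quoted in this report.
FIELD_RE = re.compile(r"(Temperature|Critical|Passive):\s*(\d+)")


def parse_powersave_t(text: str) -> dict[str, int]:
    """Extract Temperature/Critical/Passive (degrees C) from powersave -T output."""
    return {key.lower(): int(value) for key, value in FIELD_RE.findall(text)}


def thermal_state(temp: int, passive: int, critical: int) -> str:
    """Classify a reading against the ACPI trip points (illustrative labels)."""
    if temp >= critical:
        return "CRITICAL"  # critical trip point reached: the system is shut down
    if temp >= passive:
        return "PASSIVE"   # passive cooling: the CPU is expected to be throttled
    return "OK"


sample = "Thermal Device no. 0: Temperature: 49 Critical: 93 Passive: 89"
zone = parse_powersave_t(sample)
print(zone)  # {'temperature': 49, 'critical': 93, 'passive': 89}
print(thermal_state(zone["temperature"], zone["passive"], zone["critical"]))  # OK
print(thermal_state(90, zone["passive"], zone["critical"]))  # PASSIVE
```

With Passive at 89 and Critical at 93, only four degrees separate "start throttling" from "shut down", which is why the CPU must actually be throttled promptly once PASSIVE is entered.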
https://bugzilla.novell.com/show_bug.cgi?id=333043#c1
--- Comment #1 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c2
--- Comment #2 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c3
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c4
--- Comment #4 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c5
--- Comment #5 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=333043#c6
--- Comment #6 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=333043#c7
--- Comment #7 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c8
Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=333043#c9
--- Comment #9 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c10
--- Comment #10 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c11
--- Comment #11 from Jean Delvare
I did not load sensors modules or anything else actively.
The vendor-specific ACPI modules are loaded automatically by /etc/rc.d/acpid, this includes thinkpad_acpi.
FYI: I also added the modules from Comment #8 to /etc/init.d/blacklist; hwmon and thinkpad_acpi were loaded anyway.
I remember similar results with other modules loaded by the init scripts. For some reason, blacklisting doesn't seem to work in this case, and I consider that a bug.
https://bugzilla.novell.com/show_bug.cgi?id=333043#c12
--- Comment #12 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c13
--- Comment #13 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=333043#c14
--- Comment #14 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c15
--- Comment #15 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c16
--- Comment #16 from Robert Simai
I tried playing with a T42 (or something) around here, and could not push it higher than 80C or so. ThinkPad cooling usually works, in my experience.
We tried on a T42p and it also reaches the PASSIVE state; guess how I know that it generates an ACPI event :-)
Anyway, can you try without ibm_acpi? It may not be related, but it is the only major change between 10.2 and 10.3, AFAICT. I'd prefer to have it out of the equation.
There's no ibm_acpi loaded. If you meant thinkpad_acpi, I already tried without it, see Comment #9.
https://bugzilla.novell.com/show_bug.cgi?id=333043#c17
Andreas Klein
https://bugzilla.novell.com/show_bug.cgi?id=333043#c18
--- Comment #18 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043
Martin Mrazik
https://bugzilla.novell.com/show_bug.cgi?id=333043#c19
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c20
--- Comment #20 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c21
--- Comment #21 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c22
--- Comment #22 from Robert Simai
Can you provide dmidecode output and describe (maybe with help from the IBM/Lenovo website) how to match these BIOSes (in a somewhat generic way, e.g. also previous BIOS versions), please?
I will attach the dmidecode output.
Sorry, but I don't know how to do that. The BIOS is available here: http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=lenovo&lndocid=MIGR-50273
(In reply to comment #20 from Thomas Renninger)
Hmm, didn't you say the T42 throws a passive trip point event?
Yes, sorry for the confusion. The T42 shares the same BIOS as the T41 but generates an ACPI event for the passive point. Andreas, maybe you would like to try this with your R50? Start "acpi_listen" on a console and heat the machine up. See if there's any output before it shuts down.
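For the "acpi_listen" test, a small filter can make thermal events easy to spot among other ACPI events. This is a sketch under an assumption: each line is taken to look like `<device class> <bus id> <event type in hex> <data in hex>`, with the class `thermal_zone` for thermal zone notifications; check this against the real output on the machine before relying on it. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def thermal_events(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (bus_id, event_type) for acpi_listen lines of class thermal_zone.

    Assumed line layout: "<class> <bus id> <type hex> <data hex>".
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "thermal_zone":
            yield fields[1], int(fields[2], 16)


# Example with made-up lines: only the thermal zone event is reported.
sample = [
    "thermal_zone THM0 00000080 00000000",
    "button/power PWRF 00000080 00000001",
]
for bus_id, event_type in thermal_events(sample):
    print(f"thermal event on {bus_id}: type 0x{event_type:x}")
```

To use it live, one could pipe `acpi_listen` into a script that passes `sys.stdin` to `thermal_events`; on a machine like the T41p, the point of the test is whether anything is printed at all before the shutdown.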
https://bugzilla.novell.com/show_bug.cgi?id=333043#c23
--- Comment #23 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c24
Thomas Renninger
Yes, sorry for the confusion. T42 shares the same BIOS as T41 but generates an ACPI event for the passive point.
This is interesting. In this case it may be worth comparing the EC firmware between those machines (you should be able to download it from the IBM site and flash it separately). If you look at the dmidecode info you provided, this string states the EC firmware version. Maybe they really fixed it in the EC firmware (does the T42 also shut down?):
IBM ThinkPad Embedded Controller -[1RHT71WW-3.04 ]-
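Comparing EC firmware between the T41p and the T42p means finding that string in each machine's dmidecode output. A minimal sketch, assuming the string always has the shape of the one quoted here (`-[<id>-<version> ]-`); the regular expression is tailored to that single example and the function name is hypothetical.

```python
import re
from typing import Optional

# Tailored to the one EC string quoted in this bug; other models may differ.
EC_RE = re.compile(r"Embedded Controller\s*-\[\s*(\w+)-([\d.]+)\s*\]-")


def ec_firmware(dmi_text: str) -> Optional[tuple[str, str]]:
    """Return (firmware id, version) from dmidecode text, or None if absent."""
    match = EC_RE.search(dmi_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(ec_firmware("IBM ThinkPad Embedded Controller -[1RHT71WW-3.04 ]-"))
# ('1RHT71WW', '3.04')
```

Running this over the saved dmidecode output of both machines and comparing the two tuples would show whether their EC firmware differs at all.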
https://bugzilla.novell.com/show_bug.cgi?id=333043#c25
Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c26
--- Comment #26 from Sebastian Nagel
https://bugzilla.novell.com/show_bug.cgi?id=333043#c27
--- Comment #27 from Sebastian Nagel
https://bugzilla.novell.com/show_bug.cgi?id=333043
Sebastian Nagel
https://bugzilla.novell.com/show_bug.cgi?id=333043
Sebastian Nagel
https://bugzilla.novell.com/show_bug.cgi?id=333043#c28
--- Comment #28 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=333043#c29
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c30
--- Comment #30 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c31
--- Comment #31 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c32
--- Comment #32 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c33
--- Comment #33 from Robert Simai
Robert, I built a kernel and you can access it internally via: /mounts/work/built/mbuild/stravinsky-trenn-333
I had a look, no RPM yet. Please ping me when it is available and I will be happy to check it out.
(In reply to comment #32 from Thomas Renninger)
- AFAIK we did remove thermal polling with 10.3
Maybe I don't fully understand how this all works, but if polling really was removed completely, I wonder what throttles the CPU; see this recording: https://bugzilla.novell.com/attachment.cgi?id=178941
https://bugzilla.novell.com/show_bug.cgi?id=333043#c34
John Anderson
https://bugzilla.novell.com/show_bug.cgi?id=333043#c35
--- Comment #35 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c37
--- Comment #37 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c39
--- Comment #39 from Jean Delvare
https://bugzilla.novell.com/show_bug.cgi?id=333043#c40
Jean Delvare
https://bugzilla.novell.com/show_bug.cgi?id=333043#c42
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c44
--- Comment #44 from John Anderson
https://bugzilla.novell.com/show_bug.cgi?id=333043#c45
--- Comment #45 from Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043#c46
Thomas Renninger
I tried again with a 1 second interval in polling_frequency and the CPU was quickly and properly throttled 1.7->1.2GHz at 85°C. At least one positive part...
Strangely, it takes very long (about 30 seconds) to switch off PASSIVE, even if the temperature remains below 70°C.
This is normal. A passive polling interval kicks in while passive cooling is active. It is a BIOS value, TSP, which should be 600 on a ThinkPad (TSP is given in tenths of a second, so this means 60 seconds). cat /proc/acpi/thermal_zone/*/trip_points should show you tc1, tc2 and tsp (thermal sampling period). These are used by the hysteresis algorithm, are exported by the BIOS, and we should not alter them...
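The tc1/tc2/tsp values feed the passive cooling equation from the ACPI specification, ΔP[%] = tc1 * (Tn - Tn-1) + tc2 * (Tn - Tt), where Tn is the current temperature, Tn-1 the previous sample and Tt the passive trip point; it is re-evaluated once per sampling period. The sketch below parses those values and evaluates the equation. The trip_points line format is an assumption based on the old /proc/acpi interface, the numbers in SAMPLE are made up, and the function names are hypothetical.

```python
import re

# Assumed /proc/acpi/thermal_zone/*/trip_points "passive" line; values made up.
SAMPLE = "passive:                 89 C: tc1=5 tc2=4 tsp=600 devices=CPU"


def parse_passive_params(line: str) -> dict[str, int]:
    """Extract the passive trip temperature and tc1/tc2/tsp from one line."""
    params = {k: int(v) for k, v in re.findall(r"\b(tc1|tc2|tsp)=(\d+)", line)}
    m = re.match(r"\s*passive:\s*(\d+)\s*C", line)
    if m:
        params["passive"] = int(m.group(1))
    return params


def sampling_period_seconds(tsp: int) -> float:
    """TSP is given in tenths of a second, so 600 means 60 seconds."""
    return tsp / 10


def passive_trend(tc1: int, tc2: int, t_now: int, t_prev: int, t_trip: int) -> int:
    """ACPI passive cooling: dP[%] = tc1*(Tn - Tn-1) + tc2*(Tn - Tt).

    A positive result asks for more throttling, a negative one for less.
    """
    return tc1 * (t_now - t_prev) + tc2 * (t_now - t_trip)


p = parse_passive_params(SAMPLE)
print(p)  # {'tc1': 5, 'tc2': 4, 'tsp': 600, 'passive': 89}
print(sampling_period_seconds(p["tsp"]))  # 60.0
print(passive_trend(p["tc1"], p["tc2"], 90, 88, p["passive"]))  # 14
```

Because the decision is only re-evaluated once per sampling period, a large TSP could explain why PASSIVE lingers for tens of seconds after the temperature has already dropped.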
I had the test running for 30 minutes. The throttling point continuously increased up to 91°C (then throttling was 1.7GHz->1.0GHz or even ->0.6GHz).
Something seems to be utterly broken... Please give me a bit of time (some days, there is also other work...) to go through the code and provide a debug/test kernel. Hmm, maybe I should try to get such a machine; then compiling and reloading the thermal module is enough for testing, instead of rebuilding whole kernels and rebooting...
https://bugzilla.novell.com/show_bug.cgi?id=333043#c47
Stephan Kulow
https://bugzilla.novell.com/show_bug.cgi?id=333043#c48
--- Comment #48 from John Anderson
https://bugzilla.novell.com/show_bug.cgi?id=333043#c49
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c50
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c51
--- Comment #51 from Stephan Kulow
https://bugzilla.novell.com/show_bug.cgi?id=333043#c52
--- Comment #52 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=333043#c53
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c54
--- Comment #54 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=333043#c55
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c56
--- Comment #56 from Thomas Renninger
Correction: "with a frequency of 7 per second" should read "with a frequency of every 7 seconds".
https://bugzilla.novell.com/show_bug.cgi?id=333043#c57
Robert Simai
This can be tested with the latest KOTD from the SL103 branch in a few days, here: ftp://ftp.suse.com/pub/projects/kernel/kotd/SL103_BRANCH/ Watch out for this changelog entry (e.g. rpm -qp --changelog kernel-xy.rpm | less):
Thu Nov 15 13:56:51 CET 2007 - trenn@suse.de
- patches.arch/acpi_thermal_passive_blacklist.patch: Avoid critical temp shutdowns on specific ThinkPad T4x(p) and R40 (https://bugzilla.novell.com/show_bug.cgi?id=333043).
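Instead of paging through the changelog by hand, the check can be scripted. A minimal sketch, assuming the `rpm` binary is available; `rpm -qp --changelog` is the command given above, `kernel-xy.rpm` stays the placeholder file name from the text, and the helper names are hypothetical.

```python
import subprocess

PATCH = "patches.arch/acpi_thermal_passive_blacklist.patch"


def has_thermal_fix(changelog: str) -> bool:
    """True if the changelog text mentions the passive blacklist patch."""
    return PATCH in changelog


def rpm_changelog(rpm_path: str) -> str:
    """Return the changelog of an RPM file via `rpm -qp --changelog`."""
    result = subprocess.run(
        ["rpm", "-qp", "--changelog", rpm_path],
        capture_output=True,
        text=True,
        check=True,  # raise if rpm fails, e.g. for a missing file
    )
    return result.stdout
```

Usage would be `has_thermal_fix(rpm_changelog("kernel-xy.rpm"))`, substituting the actual KOTD package file.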
I've tried this kernel, including your blacklist patch. Sometimes when putting load on the machine, the temperature jumps up to 92°C (critical is 93°C!) before the CPU is throttled. This is not safe; you should consider lowering the passive point a few more degrees, maybe to 75°C. Besides that, it's definitely an improvement, or better said, it makes my machine work as it did before 10.3.
https://bugzilla.novell.com/show_bug.cgi?id=333043#c58
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043#c59
--- Comment #59 from Stephan Kulow
https://bugzilla.novell.com/show_bug.cgi?id=333043#c60
Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c61
--- Comment #61 from Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c62
Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c63
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c64
--- Comment #64 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c65
--- Comment #65 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c66
--- Comment #66 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c67
--- Comment #67 from Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c68
--- Comment #68 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c69
Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User sontek@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c70
John Anderson
https://bugzilla.novell.com/show_bug.cgi?id=333043
User robert.simai@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c71
Robert Simai
https://bugzilla.novell.com/show_bug.cgi?id=333043
User pavel@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c72
--- Comment #72 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c73
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c74
Thomas Renninger
I can confirm: the T42p generates an ACPI event, the T41p does not, even with the latest BIOS. I don't have experience with 10.2; it was running 10.1 and this worked.
10.2/10.1 probably worked because we had thermal polling enabled by default (even rather frequent polling, because of those, IIRC). http://bugzilla.kernel.org/show_bug.cgi?id=10658#c57 proves that there are proprietary applications on Microsoft Windows doing passive cooling even if there is no ACPI passive trip point defined at all.
-> Fixed for 11.0. I will open yet another passive trip point discussion on mainline; these patches IMO should go in as well. The one from Matthew Garrett provided in the above-mentioned bug is also a good idea to make the ACPI thermal management more robust (I will add this one to 11.0 soon, too).
Maybe we should also just enable temperature polling again, even if the BIOS does not explicitly tell us to do so. Then we are a bit more "Windows compatible" again. How I hate this term... Let's get some more pre-loads and let the vendors fix their BIOSes. Things like that cannot happen with the latest T61/X61 models...
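What "thermal polling" means here: instead of waiting for the BIOS to send an ACPI notification (which the T41p never does for the passive trip point), the temperature is sampled at a fixed interval and compared with the passive trip point. The sketch below illustrates the idea in user space only; the kernel's own polling works differently. The /proc path, the zone name THM0 and the "temperature: 49 C" format are assumptions about the old /proc/acpi interface, and all function names are hypothetical.

```python
import time
from typing import Callable, Iterator


def parse_proc_temperature(text: str) -> int:
    """Parse 'temperature:   49 C' as printed by /proc/acpi (assumed format)."""
    return int(text.split(":", 1)[1].split()[0])


def proc_reader(zone: str = "THM0") -> Callable[[], int]:
    """Return a reader for one thermal zone; the zone name is an assumption."""
    path = f"/proc/acpi/thermal_zone/{zone}/temperature"

    def read() -> int:
        with open(path) as f:
            return parse_proc_temperature(f.read())

    return read


def poll_temperature(read_temp: Callable[[], int], passive: int,
                     interval_s: float, samples: int) -> Iterator[tuple[int, bool]]:
    """Sample every interval_s seconds; flag readings at or above passive."""
    for i in range(samples):
        temp = read_temp()
        yield temp, temp >= passive
        if i + 1 < samples:
            time.sleep(interval_s)


# Example with fake readings instead of the real /proc file.
readings = iter([85, 88, 90])
print(list(poll_temperature(lambda: next(readings), 89, 0, 3)))
# [(85, False), (88, False), (90, True)]
```

The key design point is that the check no longer depends on a BIOS event: with a short enough interval, a rise past the passive trip point is noticed even on machines whose firmware stays silent.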
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c75
--- Comment #75 from Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c76
--- Comment #76 from Niel Lambrechts
Interestingly, the 'thermal.psv=80' boot option does NOT fix the problem for me.
Sorry, simply reducing the passive trip point does not help; you also need to tell the kernel to check the temperature. Adding a polling interval in 1/10 seconds should help:
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c77
--- Comment #77 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c78
--- Comment #78 from Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User pavel@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c79
--- Comment #79 from Pavel Machek
I tried using: thermal.psv=78 thermal.tzp=80
Yes, this should do what the patch does. I couldn't see anything obvious why
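The two boot options in this exchange use different units: thermal.psv overrides the passive trip point in degrees Celsius, and thermal.tzp sets the thermal zone polling interval in tenths of a second (0 means no polling). A minimal sketch that assembles them with basic sanity checks; the function names are hypothetical, and 93 is the critical trip point from the reporter's `powersave -T` output.

```python
def thermal_boot_options(psv_c: int, tzp_ds: int, critical_c: int) -> str:
    """Assemble thermal.psv/thermal.tzp boot options after sanity checks.

    psv_c: passive trip point override, degrees Celsius.
    tzp_ds: polling interval, tenths of a second.
    critical_c: the zone's critical trip point.
    """
    if not 0 < psv_c < critical_c:
        raise ValueError("passive trip point must be below the critical trip point")
    if tzp_ds <= 0:
        raise ValueError("tzp must be positive; thermal.tzp=0 means no polling")
    return f"thermal.psv={psv_c} thermal.tzp={tzp_ds}"


def polling_interval_seconds(tzp_ds: int) -> float:
    """Convert a thermal.tzp value (tenths of a second) to seconds."""
    return tzp_ds / 10


print(thermal_boot_options(78, 80, 93))  # thermal.psv=78 thermal.tzp=80
print(polling_interval_seconds(80))  # 8.0
```

So thermal.tzp=80 polls every 8 seconds, leaving 15 degrees between the overridden passive point (78) and the critical point (93) to absorb temperature spikes between samples.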
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c80
--- Comment #80 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=333043
User antispam@telkomsa.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c81
--- Comment #81 from Niel Lambrechts
https://bugzilla.novell.com/show_bug.cgi?id=333043
User daugirdas@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c82
Daugirdas Racys
https://bugzilla.novell.com/show_bug.cgi?id=333043
User meissner@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c83
--- Comment #83 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=333043
User fstrba@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c84
Fridrich Strba
https://bugzilla.novell.com/show_bug.cgi?id=333043
User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c85
--- Comment #85 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=333043
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=333043#c86
Thomas Renninger