[Bug 282278] New: System shutdown because of critical temperature (ACPI)
https://bugzilla.novell.com/show_bug.cgi?id=282278

Summary: System shutdown because of critical temperature (ACPI)
Product: openSUSE 10.3
Version: Alpha 4
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: info@tristanhoffmann.de
QAContact: qa@suse.de

Since openSUSE 10.2 my HP nx6125 regularly shuts down because of a wrong temperature alert from ACPI.

/var/log/messages:

Jun 7 20:14:04 turion-laptop kernel: ACPI: Critical trip point
Jun 7 20:14:04 turion-laptop kernel: Critical temperature reached (7168 C), shutting down.
Jun 7 20:14:05 turion-laptop shutdown[5639]: shutting down for system halt
Jun 7 20:14:05 turion-laptop init: Switching to runlevel: 0

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
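For anyone wanting to check their own logs for the same symptom, here is a minimal sketch that scans syslog-style lines for the kernel message quoted in the report and flags readings that cannot be real. The plausibility bound `MAX_PLAUSIBLE_C` and the function name `find_critical_events` are assumptions made for illustration; neither comes from the kernel nor from the report.

```python
import re

# Matches the kernel message quoted in the report.
CRITICAL_RE = re.compile(r"Critical temperature reached \((-?\d+) C\)")

# Assumed plausibility bound; not taken from the report or the kernel.
MAX_PLAUSIBLE_C = 150


def find_critical_events(lines):
    """Return (temperature_c, plausible) for each critical-temperature line."""
    events = []
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            temp_c = int(match.group(1))
            events.append((temp_c, 0 <= temp_c <= MAX_PLAUSIBLE_C))
    return events


# The log excerpt from the report.
sample = [
    "Jun 7 20:14:04 turion-laptop kernel: ACPI: Critical trip point",
    "Jun 7 20:14:04 turion-laptop kernel: Critical temperature reached (7168 C), shutting down.",
    "Jun 7 20:14:05 turion-laptop shutdown[5639]: shutting down for system halt",
]

for temp_c, plausible in find_critical_events(sample):
    verdict = "plausible" if plausible else "implausible, possibly a bad sensor read"
    print(temp_c, "C:", verdict)
```

To run it against a real log, pass an open file object instead of `sample`, for example `find_critical_events(open("/var/log/messages"))`.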
https://bugzilla.novell.com/show_bug.cgi?id=282278#c1
Lars Marowsky-Bree
https://bugzilla.novell.com/show_bug.cgi?id=282278#c2
--- Comment #2 from Michael Crees
https://bugzilla.novell.com/show_bug.cgi?id=282278#c3
Michael Crees
https://bugzilla.novell.com/show_bug.cgi?id=282278#c4
--- Comment #4 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=282278#c5
Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=282278#c6
--- Comment #6 from Michael Crees
Your hwinfo shows use of two hwmon drivers, please remove them and check again.
Easier said than done. Disabling lm_sensors prevents the max6650 module from loading, but nothing I can do will stop k8temp from loading. I can, of course, remove it with rmmod, but it's back after the next reboot. I have grepped all of /etc for k8temp trying to find what might be calling it, but apart from the disabled sensors config files and an odd match in the binary etc/alternatives/jre_1.5.0/lib/amd64/server/libjvm.so (which I can't believe is anything other than chance), there isn't anything.

At the moment it's showing:

rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 3 C

(It requires an offset of +40C, but that isn't a problem in itself.) It might well go up later, though; the problem usually seems to happen after midnight local time.
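The offset mentioned above can be applied mechanically. The following is a minimal sketch that parses the one-line output of the thermal zone file and adds the +40C correction reported for this machine. The function names are hypothetical, and the offset is this commenter's observation, not a general property of the ACPI thermal driver.

```python
import re

# Offset reported in the comment for this particular machine (assumption elsewhere).
OFFSET_C = 40

# Matches a line such as "temperature: 3 C", tolerating extra whitespace.
TEMP_RE = re.compile(r"^temperature:\s*(-?\d+)\s*C\s*$")


def parse_thermal_zone(text):
    """Extract the integer Celsius value from the thermal zone output."""
    match = TEMP_RE.match(text.strip())
    if match is None:
        raise ValueError("unexpected thermal zone output: %r" % text)
    return int(match.group(1))


def corrected_temperature(text, offset_c=OFFSET_C):
    """Return the reading with the machine-specific offset applied."""
    return parse_thermal_zone(text) + offset_c


print(corrected_temperature("temperature: 3 C"))
```

On the affected system the input would come from reading /proc/acpi/thermal_zone/THRM/temperature rather than a literal string.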
https://bugzilla.novell.com/show_bug.cgi?id=282278#c7
--- Comment #7 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=282278#c8
--- Comment #8 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=282278#c9
--- Comment #9 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=282278#c10
--- Comment #10 from Michael Crees
https://bugzilla.novell.com/show_bug.cgi?id=282278#c11
--- Comment #11 from Michael Crees
https://bugzilla.novell.com/show_bug.cgi?id=282278#c12
--- Comment #12 from Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=282278#c13
--- Comment #13 from Michael Crees
Are you sure that none of hwmon modules are loaded?
Yes:

rhakios@suse-test:~> /sbin/lsmod | grep -i hwmon
rhakios@suse-test:~>
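A grep for "hwmon" only catches a sensor driver when the hwmon core itself shows up as a module line; if the core is built into the kernel, a driver such as k8temp may be missed. A slightly more explicit check, sketched here under assumptions, looks for the driver names directly in lsmod-style output. The list `SUSPECT_MODULES` is built from the module names mentioned in this thread, and the sample text (including its sizes) is invented for illustration, not captured from the reporter's system.

```python
# Module names mentioned in this thread; extend as needed.
SUSPECT_MODULES = ("hwmon", "k8temp", "max6650")


def loaded_modules(lsmod_text):
    """Return the set of module names from lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def suspects_loaded(lsmod_text, suspects=SUSPECT_MODULES):
    """Return the suspect modules that are present, in the order given."""
    present = loaded_modules(lsmod_text)
    return [name for name in suspects if name in present]


# Hypothetical lsmod output, for illustration only.
sample = """\
Module                  Size  Used by
k8temp                  9600  0
hwmon                   7300  1 k8temp
thermal                20000  0
"""

print(suspects_loaded(sample))
```

An empty list from `suspects_loaded` corresponds to the empty grep result shown above.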
What did you change in kernel so 255 does not trigger shutdown?
Not a thing. It used to shut down with 10.2, but with 10.3 it doesn't. I have no idea why. I did manage to find some logs I had kept from the 10.2 installation, from when I was wondering about the cause of the spontaneous shutdowns; 255C was listed as the temperature in /var/log/messages then too.
Do you have latest BIOS?
I have saved the best till last. It turns out a new BIOS update was released this month. It has led to somewhat of a change in the boot messages, which you can find attached. It happens after the message "Activating device mapper..." I'm not sure what to do about this. It boots eventually (I'm posting from it now) and appears to be running okay; doing ctrl+alt+F10 to get to the message console doesn't show any ongoing problems. I suppose I'll leave it running and see what happens to the thermal zone readout later.
https://bugzilla.novell.com/show_bug.cgi?id=282278#c14
--- Comment #14 from Michael Crees
Well, it seems that you _never_ read the right value...
I should have mentioned, and implied before with the comment about requiring a +40C offset, that it does read a temperature above the baseline. If I make the system work a bit then...

rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 7 C
rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 8 C
rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 8 C
rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 9 C
rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 9 C
rhakios@suse-test:~> cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 10 C
https://bugzilla.novell.com/show_bug.cgi?id=282278#c15
--- Comment #15 from Michael Crees
https://bugzilla.novell.com/show_bug.cgi?id=282278#c16
--- Comment #16 from Michael Crees
https://bugzilla.novell.com/show_bug.cgi?id=282278
Alexey Starikovskiy
https://bugzilla.novell.com/show_bug.cgi?id=282278
User astarikovskiy@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=282278#c17
Alexey Starikovskiy