[Bug 807312] New: kernel 3.8 crashes (reboots notebook after a few minutes)
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c0

Summary: kernel 3.8 crashes (reboots notebook after a few minutes)
Classification: openSUSE
Product: openSUSE Factory
Version: 12.3 Beta 1
Platform: x86-64
OS/Version: openSUSE 12.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: rainer.klier@gmx.at
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0

i use a HP Compaq 8710w notebook with a Mobile PM965/GM965 chipset. after installing kernel 3.8 (or 3.8.1) and rebooting everything seems normal. but dmesg shows thousands of the following messages:

[ 282.308114] mei 0000:00:03.0: unexpected reset: dev_state = RESETING

i found out that this has to be this device (from lspci -vvv):

00:03.0 Communication controller: Intel Corporation Mobile PM965/GM965 MEI Controller (rev 0c)
    Subsystem: Hewlett-Packard Company Device 30c3
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 45
    Region 0: Memory at e8000000 (64-bit, non-prefetchable) [size=16]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0300c Data: 4152
    Kernel driver in use: mei

it seems to be this bug from the redhat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=917081

but after a while the machine reboots instantly. it doesn't help to remove the mei.ko kernel module or to switch on/off the intel management interface in the bios. with kernel 3.8 or 3.8.1 the notebook crashes/reboots every few minutes.
maybe the crash has nothing to do with the MEI Controller. but it seems like that. i don't know.

Reproducible: Always

Steps to Reproduce:
1. install kernel 3.8 or 3.8.1 on a HP Compaq 8710w notebook
2. reboot and let notebook running
3. after some minutes it crashes/reboots instantly.

Actual Results:
after some minutes the notebook crashes/reboots instantly.

Expected Results:
the notebook should not reboot/crash with kernel 3.8 installed.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
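To put a number on "thousands of messages", the kernel log can be filtered for the line quoted in the report. This is only a sketch: the helper name `count_mei_resets` is made up for illustration, and the pattern assumes the message text stays exactly as shown above.

```shell
# count_mei_resets (hypothetical helper name)
# Reads kernel log text on stdin and prints how many lines contain the
# "unexpected reset" message from the mei driver quoted in the report.
count_mei_resets() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps callers
    # that run with "set -e" from aborting in that case.
    grep -c 'mei .*unexpected reset' || true
}
```

Typical use would be `dmesg | count_mei_resets`, or feeding it a saved copy of the log.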
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c

Rainer Klier <rainer.klier@gmx.at> changed:

What             |Removed          |Added
----------------------------------------------------------------------------
Priority         |P5 - None        |P2 - High
Component        |Kernel           |Kernel
Version          |13.1 Beta 1      |Final
Product          |openSUSE Factory |openSUSE 12.3
Target Milestone |---              |Final
OS/Version       |openSUSE 12.2    |openSUSE 12.3
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c

Rainer Klier <rainer.klier@gmx.at> changed:

What     |Removed   |Added
----------------------------------------------------------------------------
Priority |P2 - High |P1 - Urgent
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c1

Rainer Klier <rainer.klier@gmx.at> changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |kernel 3.8 crashes (reboots |Kernel 3.8 and 3.9 crash
        |notebook after a few        |(reboots notebook after a
        |minutes)                    |few minutes)

--- Comment #1 from Rainer Klier <rainer.klier@gmx.at> 2013-04-30 13:02:59 UTC ---
it also happens now
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c2 --- Comment #2 from Rainer Klier <rainer.klier@gmx.at> 2013-04-30 13:05:24 UTC --- it also happens now with kernel 3.9. :-( the latest stable kernel on this notebook is 3.7.10. it seems not to have anything to do with mei kernel module, because i blacklisted mei kernel module, and this changed nothing. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
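Whether the blacklist actually kept the module out can be checked against the list of loaded modules. The following is a minimal sketch; `module_loaded` is a hypothetical helper, and the optional second argument exists only so it can be tried on a saved copy of /proc/modules.

```shell
# module_loaded NAME [MODULES_FILE]   (hypothetical helper)
# Succeeds if a module called NAME is listed. MODULES_FILE defaults to
# /proc/modules, where every line starts with the module name and a space.
module_loaded() (
    name=$1
    file=${2:-/proc/modules}
    grep -q "^${name} " "$file"
)
```

For example, `module_loaded mei && echo "mei is still loaded"` would show whether the blacklist entry took effect on the running system.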
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c3 --- Comment #3 from Rainer Klier <rainer.klier@gmx.at> 2013-05-03 10:26:22 UTC --- jdelvare@suse.com gave me the hint, that it may have something to do with overheating, fan speed and sensors. so i started sensors-detect. this generated /etc/sysconfig/lm_sensors and /usr/lib/systemd/system/lm_sensors.service. then i found bug #810344. so i tried to set up fancontrol: then i tried pwmconfig. but this produced the error "There are no pwm-capable sensor modules installed" and when trying fancontrol, it produced the error "Loading configuration from /etc/fancontrol ... Error: Can't read configuration file". -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c4 --- Comment #4 from Jean Delvare <jdelvare@suse.com> 2013-05-03 10:54:06 UTC --- pwmconfig and fancontrol are meant for systems with hardware monitoring chips driven natively by Linux. On laptops everything is under the control of ACPI, so it is completely expected that they did not work. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
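A quick way to see this on a given machine is to look for pwmN control files under the hwmon class, which is roughly what pwmconfig needs in order to work. This is a sketch under that assumption, not a copy of pwmconfig's own detection logic; the function name is made up.

```shell
# count_pwm_controls [HWMON_DIR]   (hypothetical helper)
# Prints how many pwmN control files are exposed below HWMON_DIR
# (default /sys/class/hwmon). 0 suggests there is nothing for
# pwmconfig/fancontrol to drive.
count_pwm_controls() (
    base=${1:-/sys/class/hwmon}
    n=0
    for f in "$base"/hwmon*/pwm[0-9]* "$base"/hwmon*/device/pwm[0-9]*; do
        # Count only pwm1, pwm2, ... and skip helpers such as pwm1_enable.
        case ${f##*/} in
            pwm[0-9]|pwm[0-9][0-9]) [ -e "$f" ] && n=$((n + 1)) ;;
        esac
    done
    echo "$n"
)
```

On an ACPI-managed laptop like the one in this report, the expected result would be 0.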
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c5 --- Comment #5 from Rainer Klier <rainer.klier@gmx.at> 2013-05-03 11:08:01 UTC --- (In reply to comment #4)
pwmconfig and fancontrol are meant for systems with hardware monitoring chips driven natively by Linux. On laptops everything is under the control of ACPI, so it is completely expected that they did not work.
ok, understood. under kernel 3.7 the fan ran most of the time, so it seems obvious that the notebook never overheated under kernel 3.7. but starting with kernel 3.8 the fan does not run always. so, do i have any possibility to force the fan to run? i found /sys/devices/virtual/thermal and /sys/devices/virtual/thermal/thermal_zone0(12345) but all values there are read-only. is there any way to control/force the fan to run?
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c6 Jean Delvare <jdelvare@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tiwai@suse.com, | |trenn@suse.com --- Comment #6 from Jean Delvare <jdelvare@suse.com> 2013-05-03 14:30:58 UTC --- I don't know of any way, sorry. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c7 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |NEEDINFO InfoProvider| |rainer.klier@gmx.at --- Comment #7 from Takashi Iwai <tiwai@suse.com> 2013-05-03 14:37:40 UTC --- Could you check the kernel message before reboot via netconsole or such? At least there should be some dying message if it's a "sane" reboot. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
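For reference, netconsole takes its settings as a single module parameter of the form `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. The sketch below only assembles that string; the function name is hypothetical, the ports are the documented defaults (6665 and 6666), and any addresses passed in are placeholders to be replaced with real ones.

```shell
# netconsole_param SRC_IP DEV TGT_IP [TGT_MAC]   (hypothetical helper)
# Prints a value for the netconsole= module parameter, using the default
# source port 6665 and target port 6666. An empty TGT_MAC is left blank.
netconsole_param() (
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_mac=${4:-}
    echo "6665@${src_ip}/${dev},6666@${tgt_ip}/${tgt_mac}"
)
```

It could then be used as `modprobe netconsole netconsole="$(netconsole_param 192.0.2.10 eth0 192.0.2.20)"` on the notebook, with a UDP listener on port 6666 running on the receiving machine (the example addresses are documentation placeholders, not values from this report).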
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c8 --- Comment #8 from Rainer Klier <rainer.klier@gmx.at> 2013-05-03 15:33:17 UTC --- (In reply to comment #7)
Could you check the kernel message before reboot via netconsole or such? At least there should be some dying message if it's a "sane" reboot.
i don't think it's a "sane" reboot. i agree with jean's assumption, that it is a kind of "rescue-reboot/-shutoff" to save the hardware from overheating. it happens immediately with no warning. i already checked /var/log/messages for anything, but found nothing. right now, i use the KDE4 desktop-widget for watching the temperatures and/or xsensors. and i see one of the 6 thermal zones which goes higher than the others. it is called "temp1" in xsensors and it is at about 73°C - 81°C. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
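The same readings the widget shows can be taken straight from sysfs, which also works on a text console. A minimal sketch, assuming the zones live under /sys/devices/virtual/thermal as on this notebook; `zone_temps` is a made-up name.

```shell
# zone_temps [THERMAL_DIR]   (hypothetical helper)
# Prints "zone type temperature" for every thermal zone. sysfs reports the
# temperature in millidegrees Celsius, so it is divided by 1000 here.
zone_temps() (
    base=${1:-/sys/devices/virtual/thermal}
    for z in "$base"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        t=$(cat "$z/temp")
        echo "${z##*/} $(cat "$z/type") $((t / 1000))"
    done
)
```

Running it under `watch -n 1` would give a rough text-mode replacement for the desktop widget.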
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c9

--- Comment #9 from Takashi Iwai <tiwai@suse.com> 2013-05-03 15:41:04 UTC ---
A rescue reboot done by the kernel itself is a sort of "sane" reboot. The reboot is done in the expected way. But this should leave some record in the log, usually. The fact that it's missing implies that something weird happens instead. In such a case, try to log in in single-user mode with the "nomodeset" boot option so that no KMS is kicked in, and without GUI. Do you still see the problem?
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c10 --- Comment #10 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 06:46:13 UTC --- (In reply to comment #9)
A rescue reboot by the kernel doing by itself is a sort of "sane" reboot. The
ok.
reboot is done in the expected way. But this should leave some record in the log, usually. The fact that it's missing implies that something weird happens instead.
i fear so. the reboot happens suddenly. suddenly the screen goes black and a second later i see the bios-/boot-screen like the notebook was switched on. and there is nothing in /var/log/messages....
In such a case, try to login in a single user mode with "nomodeset" boot option
nomodeset is default "on" on my system. i don't use any KMS driver.
so that no KMS is kicked in, and without GUI. Do you still see the problem?
no. i think, this is because when the system runs in text-only-mode, the notebook doesn't become that hot, so it doesn't reboot. it even "survived" a whole night staying idle in KDE4. i just logged in to KDE4 and then did nothing. this seems to produce so little heat, that it doesn't reboot. but when i work with the notebook, it suddenly reboots after a while. but it seems to be better, after i ran sensors-detect and created /etc/sysconfig/lm_sensors and /usr/lib/systemd/system/lm_sensors.service.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c11 --- Comment #11 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 07:19:29 UTC --- (In reply to comment #6)
I don't know of any way, sorry.
i found out, that i can control the fan, when i do:

echo "1" > /sys/devices/virtual/thermal/cooling_device0[123456789101112]/cur_state

but not all enabled cooling_devices start the fan to blow... and the strange thing is, that when i do this to /sys/devices/virtual/thermal/cooling_device2,3,4,5,8,9,10,11 the temperature of /sys/devices/virtual/thermal/thermal_zone3 rises immediately. shouldn't the enabling of a cooling_device lower the temperature?
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c12

--- Comment #12 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 07:30:15 UTC ---
i found out the types of cooling devices:

cooling_device0/type Processor
cooling_device1/type Processor
cooling_device10/type Fan
cooling_device11/type Fan
cooling_device12/type Fan
cooling_device13/type LCD
cooling_device2/type Fan
cooling_device3/type Fan
cooling_device4/type Fan
cooling_device5/type Fan
cooling_device6/type Fan
cooling_device7/type Fan
cooling_device8/type Fan
cooling_device9/type Fan

so, the strange thing is, that enabling the fans 2,3,4,5,8,9,10,11 raises the temperature of thermal_zone3... i don't understand that...
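A listing like the one above, extended by each device's current and maximum state, can be produced with a short loop over the `type`, `cur_state` and `max_state` attributes. This is a sketch only; the function name is made up for illustration.

```shell
# cooling_devices [THERMAL_DIR]   (hypothetical helper)
# Prints "device type cur_state/max_state" for every cooling device found
# below THERMAL_DIR (default /sys/devices/virtual/thermal).
cooling_devices() (
    base=${1:-/sys/devices/virtual/thermal}
    for d in "$base"/cooling_device*; do
        [ -r "$d/type" ] || continue
        echo "${d##*/} $(cat "$d/type") $(cat "$d/cur_state")/$(cat "$d/max_state")"
    done
)
```

The devices come out in the shell's glob order, so cooling_device10 sorts before cooling_device2, just as in the listing above.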
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c13

--- Comment #13 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 07:35:23 UTC ---
the next strange thing is, that thermal_zone3 doesn't have a cooling device associated with it.

this is the contents of thermal_zone4:

lrwxrwxrwx 1 root root 0 May 6 08:10 cdev0 -> ../cooling_device12
-r--r--r-- 1 root root 4096 May 6 08:10 cdev0_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 cdev1 -> ../cooling_device11
-r--r--r-- 1 root root 4096 May 6 08:10 cdev1_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 cdev2 -> ../cooling_device10
-r--r--r-- 1 root root 4096 May 6 08:10 cdev2_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 cdev3 -> ../cooling_device9
-r--r--r-- 1 root root 4096 May 6 08:10 cdev3_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 cdev4 -> ../cooling_device8
-r--r--r-- 1 root root 4096 May 6 08:10 cdev4_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 cdev5 -> ../cooling_device1
-r--r--r-- 1 root root 4096 May 6 08:10 cdev5_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 cdev6 -> ../cooling_device0
-r--r--r-- 1 root root 4096 May 6 08:10 cdev6_trip_point
lrwxrwxrwx 1 root root 0 May 6 08:10 device -> ../../../LNXSYSTM:00/LNXSYBUS:01/LNXTHERM:04
-rw-r--r-- 1 root root 4096 May 6 08:10 mode
-rw-r--r-- 1 root root 4096 May 6 08:10 policy
drwxr-xr-x 2 root root 0 May 6 08:10 power
lrwxrwxrwx 1 root root 0 May 6 08:42 subsystem -> ../../../../class/thermal
-r--r--r-- 1 root root 4096 May 6 08:10 temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_0_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_0_type
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_1_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_1_type
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_2_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_2_type
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_3_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_3_type
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_4_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_4_type
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_5_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_5_type
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_6_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_6_type
-r--r--r-- 1 root root 4096 May 6 08:10 type
-rw-r--r-- 1 root root 4096 May 6 08:42 uevent

and this is thermal_zone3:

lrwxrwxrwx 1 root root 0 May 6 08:10 device -> ../../../LNXSYSTM:00/LNXSYBUS:01/LNXTHERM:03
-rw-r--r-- 1 root root 4096 May 6 08:10 mode
-rw-r--r-- 1 root root 4096 May 6 08:10 passive
-rw-r--r-- 1 root root 4096 May 6 08:10 policy
drwxr-xr-x 2 root root 0 May 6 08:10 power
lrwxrwxrwx 1 root root 0 May 6 08:42 subsystem -> ../../../../class/thermal
-r--r--r-- 1 root root 4096 May 6 08:10 temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_0_temp
-r--r--r-- 1 root root 4096 May 6 08:10 trip_point_0_type
-r--r--r-- 1 root root 4096 May 6 08:10 type
-rw-r--r-- 1 root root 4096 May 6 08:42 uevent

shouldn't thermal_zone3 also have some cooling devices?
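The cdevN links in a zone directory are what tie a zone to its cooling devices, and each cdevN_trip_point file names the trip point the binding belongs to. A sketch that prints these bindings for one zone follows; `zone_bindings` is a hypothetical helper name.

```shell
# zone_bindings ZONE_DIR   (hypothetical helper)
# Prints "cdevN -> cooling_deviceM (type) trip=T" for each cooling device
# bound to the zone. Prints nothing for a zone without any cdevN links.
zone_bindings() (
    zone=$1
    for link in "$zone"/cdev[0-9]*; do
        # Only the symlinks are bindings; this skips the cdevN_trip_point files.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        echo "${link##*/} -> ${target##*/} ($(cat "$link/type")) trip=$(cat "${link}_trip_point")"
    done
)
```

Called on thermal_zone4 it would list the seven bindings shown above, while for thermal_zone3 it would print nothing, matching the observation that this zone has no cooling device attached.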
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c14

--- Comment #14 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 09:05:17 UTC ---
i think this bug is related to https://bugzilla.redhat.com/show_bug.cgi?id=895276 although that case is about fans running at full speed. but i think the reason/origin of the problem is the same. what i don't understand is, that the temperature rises, when the fans are running. IMHO the temperature should drop, when the fans are running... ????
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c15 --- Comment #15 from Jean Delvare <jdelvare@suse.com> 2013-05-06 09:16:45 UTC --- (In reply to comment #10)
In such a case, try to login in a single user mode with "nomodeset" boot so that no KMS is kicked in, and without GUI. Do you still see the problem?
no. i think, this is because when the system runs in text-only-mode, the notebook doesn't become that hot, so it doesn't reboot.
You could run in text mode and run for example "md5sum /dev/zero" to create artificial load. Then maybe you'd be able to see something before it reboots.
but it seems to be better, after i ran sensors-detect and created /etc/sysconfig/lm_sensors and /usr/lib/systemd/system/lm_sensors.service.
For the record, I can't think of any reason why this would make a difference. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
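Since the reboot leaves nothing in /var/log/messages, one option while running the artificial load is to append the zone temperatures to a file and flush after every sample, so the last readings before the reboot survive. This is a sketch under that assumption; `log_temps` and the log path in the usage note are made up.

```shell
# log_temps LOGFILE [THERMAL_DIR]   (hypothetical helper)
# Appends one line "epoch zone=millidegrees ..." for all thermal zones and
# then calls sync so the line reaches the disk before a sudden reboot.
log_temps() (
    log=$1
    base=${2:-/sys/devices/virtual/thermal}
    line=$(date +%s)
    for z in "$base"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        line="$line ${z##*/}=$(cat "$z/temp")"
    done
    echo "$line" >> "$log"
    sync
)
```

In text mode this could be combined with the suggested load as `md5sum /dev/zero &` followed by `while sleep 1; do log_temps /root/temps.log; done`; the last line of the file after the reboot then shows how hot each zone was shortly before it.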
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c16 --- Comment #16 from Jean Delvare <jdelvare@suse.com> 2013-05-06 09:21:05 UTC --- What I don't get is how you could possibly have 11 fans in a notebook. This looks plain wrong. Could you please compare with what the openSUSE 12.2 kernel said? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c17 --- Comment #17 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 09:23:29 UTC --- (In reply to comment #15)
(In reply to comment #10)
i think, this is because when the system runs in text-only-mode, the notebook doesn't become that hot, so it doesn't reboot.
You could run in text mode and run for example "md5sum /dev/zero" to create artificial load. Then maybe you'd be able to see something before it reboots.
ok, but if the load produces the reboot, as expected, it doesn't help. it only proves the suspicion. but i will try nevertheless.
but it seems to be better, after i ran sensors-detect and created /etc/sysconfig/lm_sensors and /usr/lib/systemd/system/lm_sensors.service.
For the record, I can't think of any reason why this would make a difference.
ok. what is the kernel module coretemp for? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c18 --- Comment #18 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 09:25:59 UTC --- (In reply to comment #16)
What I don't get is how you could possibly have 11 fans in a notebook. This looks plain wrong.
yes, maybe. but as you can see in https://bugzilla.redhat.com/show_bug.cgi?id=895276 others also have a similar situation in similar HP notebooks.
Could you please compare with what the openSUSE 12.2 kernel said?
i will try that also. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c19 --- Comment #19 from Jean Delvare <jdelvare@suse.com> 2013-05-06 09:27:50 UTC --- (In reply to comment #17)
what is the kernel module coretemp for?
Temperature reporting to user-space only. No thermal management and no fan control. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c20

--- Comment #20 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 09:37:23 UTC ---
if i

echo "1" > /sys/devices/virtual/thermal/cooling_device2/cur_state

the temperature of thermal_zone5 drops from about 85°C to 60°C, BUT the temperature of thermal_zone3 rises to 100°C....
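When experimenting with cur_state like this, it may help to check the requested value against the device's max_state first. The following is only a sketch of such a guard; `set_cooling_state` is a hypothetical helper, and writing to the real sysfs file needs root.

```shell
# set_cooling_state DEVICE_DIR STATE   (hypothetical helper)
# Writes STATE to DEVICE_DIR/cur_state, but only if it lies between 0 and
# the value in DEVICE_DIR/max_state. Returns non-zero otherwise.
set_cooling_state() (
    dev=$1
    state=$2
    max=$(cat "$dev/max_state")
    if [ "$state" -lt 0 ] || [ "$state" -gt "$max" ]; then
        echo "state $state outside 0..$max for ${dev##*/}" >&2
        exit 1
    fi
    echo "$state" > "$dev/cur_state"
)
```

For the experiment above this would be `set_cooling_state /sys/devices/virtual/thermal/cooling_device2 1`, and the same call with `0` to switch it back.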
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c21 --- Comment #21 from Jean Delvare <jdelvare@suse.com> 2013-05-06 11:00:35 UTC --- Question #1: do you hear the fan spin when you force-enable cooling_device2? Question #2: does it prevent the spurious reboot? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c22 --- Comment #22 from Rainer Klier <rainer.klier@gmx.at> 2013-05-06 12:07:23 UTC --- (In reply to comment #21)
Question #1: do you hear the fan spin when you force-enable cooling_device2?
when i do this, the first thing i notice is the rise of the temp of thermal_zone3 from 30°C up to 100°C. (and i didn't let it go higher...)

"/sys/devices/virtual/thermal/thermal_zone3" corresponds to "temp4" from the output of sensors. its attributes are:

/sys/devices/virtual/thermal/thermal_zone3/device/path: \_TZ_.TZ5_
/sys/devices/virtual/thermal/thermal_zone3/device/modalias: acpi:LNXTHERM:
/sys/devices/virtual/thermal/thermal_zone3/device/hid: LNXTHERM
/sys/devices/virtual/thermal/thermal_zone3/trip_point_0_temp: 110000
/sys/devices/virtual/thermal/thermal_zone3/trip_point_0_type: critical

immediately after the temp starts rising, i hear the fan spin up. when i put "0" to /sys/devices/virtual/thermal/cooling_device2/cur_state the temp drops and shortly after this, the fan stops. so, it seems that changing the value of /sys/devices/virtual/thermal/cooling_device2/cur_state doesn't trigger the fan directly, but somehow raises the temp of /sys/devices/virtual/thermal/thermal_zone3, which triggers the fan to spin up. at least it looks like this.
Question #2: does it prevent the spurious reboot?
i really don't know. the spurious reboot didn't happen since i installed /etc/sysconfig/lm_sensors and /usr/lib/systemd/system/lm_sensors.service. i have the KDE4 hardware monitoring widget running, which shows the temps of all 6 thermal_zones (called temp1 - temp6 in the widget). and i watch the current values the whole time to detect the temp, where it reboots, but it didn't happen since then. but i noticed, that forcing the fan to spin with the above method (echo "1" > .....cooling_device2/cur_state) lowers the temp of thermal_zone5, starts the fan, BUT raises the temp of thermal_zone3. very strange. and as you can read in https://bugzilla.redhat.com/show_bug.cgi?id=895276 similar things happen with other HP notebooks from other people....
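The sysfs values are in millidegrees Celsius, so the trip_point_0_temp of 110000 quoted above is a critical limit of 110°C, and a reading of 100°C leaves a margin of 10°C. A small sketch that computes this margin for a zone follows; `critical_margin` is a made-up name.

```shell
# critical_margin ZONE_DIR   (hypothetical helper)
# Prints how many degrees Celsius the zone currently is below its first
# trip point of type "critical". Prints nothing if there is no such trip.
critical_margin() (
    zone=$1
    temp=$(cat "$zone/temp")
    for tf in "$zone"/trip_point_*_type; do
        [ -r "$tf" ] || continue
        if [ "$(cat "$tf")" = "critical" ]; then
            # trip_point_N_type and trip_point_N_temp share the same prefix.
            crit=$(cat "${tf%_type}_temp")
            echo $(( (crit - temp) / 1000 ))
            break
        fi
    done
)
```

Watching this number for thermal_zone3 while toggling cooling_device2 would show how close the zone gets to the point where a critical shutdown is to be expected.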
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c23 --- Comment #23 from Rainer Klier <rainer.klier@gmx.at> 2013-05-16 06:42:17 UTC --- it seems to be a bug in the kernel: https://bugzilla.kernel.org/show_bug.cgi?id=58311 https://bugzilla.kernel.org/show_bug.cgi?id=58301 https://bugzilla.kernel.org/show_bug.cgi?id=56591 https://bugzilla.kernel.org/show_bug.cgi?id=56281 https://bugzilla.kernel.org/show_bug.cgi?id=55241 and as you can see there, it mostly affects HP notebooks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c24 --- Comment #24 from Jean Delvare <jdelvare@suse.com> 2013-05-16 07:10:18 UTC --- These may not all be the same issue, as some users are reporting issues starting with kernel 3.7, which does work for you. Also most of the reports are about issues after resume, while your system fails even without a suspend/resume cycle. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c25 --- Comment #25 from Rainer Klier <rainer.klier@gmx.at> 2013-05-16 08:47:58 UTC --- (In reply to comment #24)
These may not all be the same issue, as some users are reporting issues starting with kernel 3.7, which does work for you. Also most of the reports are
yes, i remember that with 3.7 i can confirm that the fan ran the whole time (like many of these other bugs report), which in my case helped, because the notebook didn't overheat. but i think the source of all these bugs is the same. it results in fan running the whole time, or, in my case (apparently with a fix for that whole-time-running-issue) running not at all, or too late to cool the system.
about issues after resume, while your system fails even without a suspend/resume cycle.
yes. but https://bugzilla.kernel.org/show_bug.cgi?id=56281 is also about overheating. in this bug the reporter writes: "If ambient temperatures are above 30 degrees and the computer is slightly used the critical shutdown temperature of 95° for temp1 is frequently reached. This results in an immediate shutdown of the system." this sounds exactly like my problem. the strange thing is, that since i changed the bios setting "fan always on at AC" (which it is strangely NOT) and installing/using /etc/sysconfig/lm_sensors and /usr/lib/systemd/system/lm_sensors.service it didn't happen again.
https://bugzilla.novell.com/show_bug.cgi?id=807312
https://bugzilla.novell.com/show_bug.cgi?id=807312#c26

--- Comment #26 from Jean Delvare <jdelvare@suse.com> 2013-05-16 12:49:59 UTC ---
You may want to check bug #820048, it is about a bug where the laptop fan is always on. It was fixed in 3.8 and 3.9 some time ago already, and Thomas has just backported it to our 12.3 kernel. Maybe the fix for that bug has its own unexpected side effects that are causing your trouble. Or maybe that bug was masking yet another bug and you're only seeing its effects now. It could be interesting to check the behavior of kernel <= 3.6, for example with openSUSE 12.2. Anyway, this isn't my area and I won't be able to fix this bug myself. Just suggesting things to try to narrow the root cause...
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c27 Stephan Kulow <coolo@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|P1 - Urgent |P5 - None --- Comment #27 from Stephan Kulow <coolo@suse.com> 2013-09-27 16:59:16 CEST --- priority is to be set by the engineer, so don't play with it -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c28 --- Comment #28 from Thomas Renninger <trenn@suse.com> 2013-10-10 10:00:09 UTC --- Can you double check, please. Best also check with latest 13.1, this one will be out soon and as long as this is not the case, it's easier to change things. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c29 --- Comment #29 from Rainer Klier <rainer.klier@gmx.at> 2013-10-10 10:35:51 UTC --- (In reply to comment #28)
Can you double check, please.
in the meantime i am at kernel 3.11.4. and as i said in c25 it never happened again. /sys/devices/virtual/thermal/thermal_zone3 has 85° most of the time, and the fan is running most of the time. this is exactly how it was with kernel 3.7. the reboot never happened again.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c30 Borislav Petkov <bpetkov@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bpetkov@suse.com --- Comment #30 from Borislav Petkov <bpetkov@suse.com> 2014-01-16 16:12:17 UTC --- Rainer, can we close? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c31 --- Comment #31 from Rainer Klier <rainer.klier@gmx.at> 2014-01-17 07:16:01 UTC --- yes, thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=807312 https://bugzilla.novell.com/show_bug.cgi?id=807312#c32 Jean Delvare <jdelvare@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |CLOSED InfoProvider|rainer.klier@gmx.at | Resolution| |FIXED --- Comment #32 from Jean Delvare <jdelvare@suse.com> 2014-01-17 07:49:56 UTC --- Closing per comment #31. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.