2015-05-05 22:08 GMT+02:00 Jean Delvare
Hi Thibaut,
On Sunday 03 May 2015 at 00:33 +0200, Thibaut Verron wrote:
I am a Tumbleweed user, and I have recently (since 20150421 or 20150422) been having issues with my laptop's CPU overheating. Most of the time, the fan does not change speed when the temperature increases, and I cannot control its speed manually with pwm1_enable set to 1.
I reported it on the factory mailing list, and together we could rule out the possibility of a hardware failure (most likely dust on the fan). In particular, through some (seemingly) random changes, I could get the fan to work under manual control. This setting does not appear to persist reliably across reboots, but it proves that the fan can still spin fast enough to cool down the CPU when the OS has full control over its speed.
The setting I have to mess with to reactivate the fan when it stops working is the "thermal.off=1" parameter of the kernel. I would like to be more accurate, but I can't: in the past few days, I have "reactivated" the fan control twice, and as far as I can tell the procedure was not the same (and the same procedure did not yield the same consequences every time).
For reference, the factory thread is here: http://lists.opensuse.org/opensuse-factory/2015-04/msg00299.html
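Since the "thermal.off=1" setting did not behave the same way on every boot, one thing worth checking each time is whether the parameter was really on the running kernel's command line. A minimal Python sketch of that check follows; the helper names are made up for illustration, and it assumes the command line text is taken from /proc/cmdline. It is only a rough check, not a full parser.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None. Quoted values that contain spaces
    are not handled; this is only meant as a quick check.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def thermal_off_requested(cmdline):
    """Return True if the command line carries thermal.off=1."""
    return parse_cmdline(cmdline).get("thermal.off") == "1"
```

On a running system this would be called as thermal_off_requested(open("/proc/cmdline").read()), and the result noted next to each test so that boots with and without the parameter can be told apart afterwards.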
I read this thread in part, and here are a few random comments:
* It is common for low fan speeds to be reported incorrectly; this isn't specific to your laptop. The reason is that fan speed control is achieved either by a PWM signal (most frequent) or by lowering the voltage. In both cases the rotation feedback signal gets harder to sense, and a weaker signal translates to incorrect speed values being reported. While inconvenient, it should in general not result in any issue with thermal management.
Ok. Anyway, this part of the problem is not new, but now there is a logical explanation for that too.
* If a recent kernel somehow messed up your system, it is possible that the problem survives warm reboots, including switching to other operating systems or back to older kernels. I urge you to always _cold_ boot the machine before every test you perform. On laptops, cold booting may require unplugging the AC adapter AND removing the battery AND waiting for a couple of minutes.
I did try that occasionally when "messing around with settings", but you have a good point, I'll make that a habit. Rebooting vs. turning the computer off and back on (without unplugging it or waiting more than 10s) did yield different results sometimes (but still nothing reproducible).
* Your problems sound BIOS-related to me. In most BIOSes there is an option to load setup or failsafe defaults. If you haven't tried that yet, it would be worth trying.
I have tried restoring the BIOS to factory settings (the only thing my BIOS seems to offer on this matter) and it didn't help. Thank you, indeed I forgot to mention that on the other thread.
* If the previous advice doesn't help, it might be worth re-flashing the BIOS even if no new version is available. If the BIOS code was somehow corrupted, that would restore it.
I could try that, but I'm wary of taking this step: flashing a BIOS on a computer that may shut down because of critical temperature sounds a bit dangerous. The first time I got issues of this kind, I did update the BIOS (with the laptop directly on an indoor air conditioner, I could do that because it was July), and it did not help. But since nothing seems to be reproducible here, I might as well try again now...
* When did you last try manual fan speed control with pwm1_enable? If you normally don't need to do that, it is entirely possible that this has been broken for a longer time without you noticing, and that this issue is unrelated to your current troubles. OTOH if the fan speed controller is hosed somehow, it wouldn't be all that surprising if that affects both manual and automatic modes.
You have a good point of course, I usually only try manual control when I have problems with automatic control, so last time must have been in 2013, and I was not using openSUSE back then. It is quite possible that manual control on my laptop has never been working reliably with the openSUSE install. But is it possible for manual control to be broken without affecting automatic control? I guess that's asking whether the kernel has a lower-level access to the fan controls than what's exposed in the /sys filesystem?
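To make the manual-control test repeatable, the writes to /sys can be scripted and the values read back right away. The following is a rough Python sketch, assuming the driver's hwmon directory exposes the standard attribute files pwm1_enable, pwm1 and fan1_input; the function names are invented for illustration, and the directory has to be passed in because the hwmonN index differs between machines and boots. On a real system it needs root.

```python
from pathlib import Path


def read_int(hwmon_dir, name):
    """Read one integer-valued attribute file from the hwmon directory."""
    return int((Path(hwmon_dir) / name).read_text().strip())


def set_manual_pwm(hwmon_dir, duty):
    """Switch pwm1 to manual mode and request a duty value (0-255).

    Returns the (pwm1_enable, pwm1) values read back afterwards, so the
    caller can see whether the driver kept the requested settings.
    """
    if not 0 <= duty <= 255:
        raise ValueError("duty must be between 0 and 255")
    d = Path(hwmon_dir)
    # In the hwmon sysfs interface, pwm1_enable = 1 selects manual control.
    (d / "pwm1_enable").write_text("1")
    (d / "pwm1").write_text(str(duty))
    return read_int(d, "pwm1_enable"), read_int(d, "pwm1")
```

If the values read back differ from what was written, the driver or firmware did not accept the request. A small difference in pwm1 alone may only be rounding, since a driver can rescale the value internally. Reading fan1_input a few seconds later with read_int() shows whether the reported speed followed, keeping in mind that low speeds may be reported incorrectly.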
FWIW there was no recent change to the eeepc-laptop driver.
As a closing note, I don't know how old your model is, but it should be noted that low-priced consumer hardware showing issues after 4-5 years is nothing out of the ordinary. This may not be what's happening here... but it may as well be "just" that.
That's right. But given that there were similar issues at age 1 (just past warranty expiration... a critical point in a computer's lifetime, I know), I would find it strange if this was a new issue. Of course, it is still possible that the problems I had 3 and 2 years ago were not merely software issues, but symptoms of a hardware failure that was (temporarily) mitigated with software tweaks.
Thank you very much for your comments and suggestions. If nobody has any other short-term suggestion, I'll indeed try redownloading and flashing the BIOS.
Thibaut