Re: [opensuse] Hardware error messages...
On Wednesday, 15 March 2017 22:12:19 EET Carlos E. R. wrote:
> On 2017-03-15 19:03, auxsvr wrote:
>> These messages indicate that your CPU is overheating. Since it takes a few ms to cool down, most likely turbo boost triggers overheating in your case, just as in mine, and thermald should handle this, although it is not always successful.
> No, thermal cooling can not work that fast.
Latency of Intel processors when switching P-states is up to several hundreds of μs, why cannot cooling work that fast?
--
Regards,
Peter
On Wed, 15 Mar 2017 23:01, auxsvr wrote:
> On Wednesday, 15 March 2017 22:12:19 EET Carlos E. R. wrote:
>> On 2017-03-15 19:03, auxsvr wrote:
>>> These messages indicate that your CPU is overheating. Since it takes a few ms to cool down, most likely turbo boost triggers overheating in your case, just as in mine, and thermald should handle this, although it is not always successful.
>> No, thermal cooling can not work that fast.
> Latency of Intel processors when switching P-states is up to several hundreds of μs, why cannot cooling work that fast?
Simple: physics. Thermophysics, to be precise. The CPU die and casing only allow a certain outflow of energy per unit of time. This is called "thermal latency", and it works together with the better-known "thermal resistance".

A showcase: your CPU goes from 25 W at low load to a full peak of 100 W electrical power with Turbo Boost for about 15 seconds, before the die gets so hot that the governor disables the boost and limits the power intake to about 65 W. So, how much heat is in the CPU? At low load the die sits at about 43°C / 110°F; at the end of the boost it reaches 80°C / 176°F.

Some math: 100 W over 15 seconds gives 1500 Ws (1500 J) of energy, but the guaranteed transport from the die to the outside of the case is only 975 Ws (65 W x 15 s) for the same time. The difference, 525 Ws, is what heats up the die faster than it "should" by thermal design. This "thermal reserve" is limited. In the end it comes down to: 1 second of boost needs 2 seconds of cooldown afterward.

But due to thermal latency, a thermal spike is delayed relative to the electrical power spike. This makes "thermal governing" a very difficult thing to formulate. A lot of experience and gut feeling is needed to get it right.

A must-have for stable thermal regulation is a stable and precisely controllable voltage source for the power. If the PSU isn't that good, and the power rails and voltage regulator circuits on the board can't compensate or are not that good themselves, your governing is bound to be ugly.

What does Microsoft do? "Stability" is the rule, so limiting the CPU frequency to its regular clock and disabling boost is the first thing the WinNT kernel does. If this stops the MCEs from coming, good, let's keep that.

Linux is a little more on the "power to the user" side. YOU have to set the governor type and what kind of rules it should follow. MCE happening? Let's tell the user! He is the boss!

There you go. IMHO, lower the frequency of the CPU and see if the MCEs still happen. If so, your PSU and/or board power rails are suspect.
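
For anyone who wants to replay the energy arithmetic from the show case above, here is a minimal Python sketch. The function name `boost_energy_budget` and its parameter names are made up for illustration; the numbers (100 W boost, 65 W sustained, 15 s) are the ones from the example, and the model is only the simple "energy in minus energy out" bookkeeping, not a real thermal simulation.

```python
def boost_energy_budget(boost_power_w, sustained_power_w, boost_seconds):
    """Return (energy_in_j, energy_out_j, surplus_j) for one boost period.

    Hypothetical helper: assumes the die takes in `boost_power_w` for the
    whole boost and that at most `sustained_power_w` can leave the package
    during the same time. 1 Ws equals 1 J.
    """
    energy_in_j = boost_power_w * boost_seconds       # electrical energy turned into heat
    energy_out_j = sustained_power_w * boost_seconds  # heat the cooling path can carry away
    surplus_j = energy_in_j - energy_out_j            # heat left behind, warming the die
    return energy_in_j, energy_out_j, surplus_j


energy_in, energy_out, surplus = boost_energy_budget(100.0, 65.0, 15.0)
print(f"in: {energy_in:.0f} J, out: {energy_out:.0f} J, surplus: {surplus:.0f} J")
# prints "in: 1500 J, out: 975 J, surplus: 525 J"
```

The surplus is the part that has to drain off after the boost ends, which is why a boost always needs a cooldown period afterward.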
- Yamaban.
participants (2): auxsvr, Yamaban