http://bugzilla.opensuse.org/show_bug.cgi?id=1028027
http://bugzilla.opensuse.org/show_bug.cgi?id=1028027#c3
--- Comment #3 from Gerald Hofer ---
I am running the laptop in quite a hot environment (it is summer in
Australia), so I am actually getting thermal events. I think they are related to
the mce events I see.
The kernel logs this into the syslog:
2017-03-08T06:49:30.983579+10:00 gerald6 kernel: [ 14.445946] CPU0: Core
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983600+10:00 gerald6 kernel: [ 14.445947] CPU1: Core
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983602+10:00 gerald6 kernel: [ 14.445948] CPU2: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983604+10:00 gerald6 kernel: [ 14.445950] CPU7: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983606+10:00 gerald6 kernel: [ 14.445951] CPU6: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983607+10:00 gerald6 kernel: [ 14.445952] CPU5: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983608+10:00 gerald6 kernel: [ 14.445953] CPU4: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983610+10:00 gerald6 kernel: [ 14.445954] CPU3: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983611+10:00 gerald6 kernel: [ 14.445955] CPU1: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983612+10:00 gerald6 kernel: [ 14.445957] CPU0: Package
temperature above threshold, cpu clock throttled (total events = 1)
2017-03-08T06:49:30.983614+10:00 gerald6 kernel: [ 14.445960] mce: [Hardware
Error]: Machine check events logged
2017-03-08T06:49:30.983615+10:00 gerald6 kernel: [ 14.445973] mce: [Hardware
Error]: CPU 1: Machine Check: 0 Bank 128: 0000000088030003
2017-03-08T06:49:30.983616+10:00 gerald6 kernel: [ 14.445980] mce: [Hardware
Error]: TSC 145e6189fc
2017-03-08T06:49:30.983623+10:00 gerald6 kernel: [ 14.445986] mce: [Hardware
Error]: PROCESSOR 0:306a9 TIME 1488919770 SOCKET 0 APIC 1 microcode 1c
2017-03-08T06:49:30.983626+10:00 gerald6 kernel: [ 14.445991] mce: [Hardware
Error]: CPU 0: Machine Check: 0 Bank 128: 0000000088030003
2017-03-08T06:49:30.983627+10:00 gerald6 kernel: [ 14.445995] mce: [Hardware
Error]: TSC 145e61dddb
2017-03-08T06:49:30.983628+10:00 gerald6 kernel: [ 14.445998] mce: [Hardware
Error]: PROCESSOR 0:306a9 TIME 1488919770 SOCKET 0 APIC 0 microcode 1c
2017-03-08T06:49:30.984545+10:00 gerald6 kernel: [ 14.446934] CPU0: Core
temperature/speed normal
2017-03-08T06:49:30.984554+10:00 gerald6 kernel: [ 14.446935] CPU1: Core
temperature/speed normal
2017-03-08T06:49:30.984556+10:00 gerald6 kernel: [ 14.446936] CPU5: Package
temperature/speed normal
2017-03-08T06:49:30.984557+10:00 gerald6 kernel: [ 14.446937] CPU6: Package
temperature/speed normal
2017-03-08T06:49:30.984559+10:00 gerald6 kernel: [ 14.446938] CPU3: Package
temperature/speed normal
2017-03-08T06:49:30.984560+10:00 gerald6 kernel: [ 14.446939] CPU2: Package
temperature/speed normal
2017-03-08T06:49:30.984561+10:00 gerald6 kernel: [ 14.446939] CPU7: Package
temperature/speed normal
2017-03-08T06:49:30.984562+10:00 gerald6 kernel: [ 14.446940] CPU4: Package
temperature/speed normal
2017-03-08T06:49:30.984563+10:00 gerald6 kernel: [ 14.446941] CPU1: Package
temperature/speed normal
2017-03-08T06:49:30.984564+10:00 gerald6 kernel: [ 14.446942] CPU0: Package
temperature/speed normal
2017-03-08T06:49:30.984565+10:00 gerald6 kernel: [ 14.446942] mce: [Hardware
Error]: Machine check events logged
2017-03-08T06:49:30.984567+10:00 gerald6 kernel: [ 14.446949] mce: [Hardware
Error]: CPU 1: Machine Check: 0 Bank 128: 0000000088040002
2017-03-08T06:49:30.984571+10:00 gerald6 kernel: [ 14.446963] mce: [Hardware
Error]: TSC 145e8a230a
2017-03-08T06:49:30.984572+10:00 gerald6 kernel: [ 14.446977] mce: [Hardware
Error]: PROCESSOR 0:306a9 TIME 1488919770 SOCKET 0 APIC 1 microcode 1c
2017-03-08T06:49:30.988535+10:00 gerald6 kernel: [ 14.446991] mce: [Hardware
Error]: CPU 0: Machine Check: 0 Bank 128: 0000000088040002
2017-03-08T06:49:30.988551+10:00 gerald6 kernel: [ 14.447003] mce: [Hardware
Error]: TSC 145e8a64ef
2017-03-08T06:49:30.988554+10:00 gerald6 kernel: [ 14.447016] mce: [Hardware
Error]: PROCESSOR 0:306a9 TIME 1488919770 SOCKET 0 APIC 0 microcode 1c
2017-03-08T06:49:31.036530+10:00 gerald6 kernel: [ 14.497265] fuse init (API
version 7.26)
These messages are then also logged by mcelog into the syslog.
2017-03-08T06:58:52.876629+10:00 gerald6 mcelog[1474]: Hardware event. This is
not a software error.
2017-03-08T06:58:52.877631+10:00 gerald6 mcelog[1474]: MCE 0
2017-03-08T06:58:52.877848+10:00 gerald6 mcelog[1474]: CPU 1 THERMAL EVENT TSC
175008646a4
2017-03-08T06:58:52.878024+10:00 gerald6 mcelog[1474]: TIME 1488920332 Wed Mar
8 06:58:52 2017
2017-03-08T06:58:52.878174+10:00 gerald6 mcelog[1474]: Processor 1 heated above
trip temperature. Throttling enabled.
2017-03-08T06:58:52.878315+10:00 gerald6 mcelog[1474]: Please check your system
cooling. Performance will be impacted
2017-03-08T06:58:52.878461+10:00 gerald6 mcelog[1474]: Running trigger
`unknown-error-trigger'
2017-03-08T06:58:52.878605+10:00 gerald6 mcelog[1474]: STATUS 88030003
MCGSTATUS 0
2017-03-08T06:58:52.878749+10:00 gerald6 mcelog[1474]: MCGCAP c09 APICID 1
SOCKETID 0
2017-03-08T06:58:52.878893+10:00 gerald6 mcelog[1474]: CPUID Vendor Intel
Family 6 Model 58
2017-03-08T06:58:52.879029+10:00 gerald6 mcelog[1474]: Hardware event. This is
not a software error.
2017-03-08T06:58:52.879154+10:00 gerald6 mcelog[1474]: MCE 1
2017-03-08T06:58:52.879286+10:00 gerald6 mcelog[1474]: CPU 0 THERMAL EVENT TSC
1750086a0e0
2017-03-08T06:58:52.879419+10:00 gerald6 mcelog[1474]: TIME 1488920332 Wed Mar
8 06:58:52 2017
2017-03-08T06:58:52.879543+10:00 gerald6 mcelog[1474]: Processor 0 heated above
trip temperature. Throttling enabled.
2017-03-08T06:58:52.879665+10:00 gerald6 mcelog[1474]: Please check your system
cooling. Performance will be impacted
2017-03-08T06:58:52.879773+10:00 gerald6 mcelog[1474]: Running trigger
`unknown-error-trigger'
2017-03-08T06:58:52.879852+10:00 gerald6 mcelog[1474]: STATUS 88030003
MCGSTATUS 0
2017-03-08T06:58:52.879929+10:00 gerald6 mcelog[1474]: MCGCAP c09 APICID 0
SOCKETID 0
2017-03-08T06:58:52.880010+10:00 gerald6 mcelog[1474]: CPUID Vendor Intel
Family 6 Model 58
2017-03-08T06:58:52.880150+10:00 gerald6 mcelog[1474]: mcelog: warning: 24
bytes ignored in each record
2017-03-08T06:58:52.880278+10:00 gerald6 mcelog[1474]: mcelog: consider an
update
2017-03-08T06:58:52.880417+10:00 gerald6 mcelog: CPU 0 on socket 0 received
unknown error
2017-03-08T06:58:52.880503+10:00 gerald6 mcelog[1474]: <27>Mar 8 06:58:52
mcelog: CPU 0 on socket 0 received unknown error
2017-03-08T06:58:52.880642+10:00 gerald6 mcelog: Location: CPU 0 on socket 0
2017-03-08T06:58:52.880709+10:00 gerald6 mcelog[1474]: <27>Mar 8 06:58:52
mcelog: Location: CPU 0 on socket 0
2017-03-08T06:58:52.880835+10:00 gerald6 mcelog: CPU 1 on socket 0 received
unknown error
2017-03-08T06:58:52.880902+10:00 gerald6 mcelog[1474]: <27>Mar 8 06:58:52
mcelog: CPU 1 on socket 0 received unknown error
2017-03-08T06:58:52.881037+10:00 gerald6 mcelog: Location: CPU 1 on socket 0
2017-03-08T06:58:52.881114+10:00 gerald6 mcelog[1474]: mcelog: Too many trigger
children running already
2017-03-08T06:58:52.881241+10:00 gerald6 mcelog[1474]: Hardware event. This is
not a software error.
2017-03-08T06:58:52.881363+10:00 gerald6 mcelog[1474]: MCE 0
2017-03-08T06:58:52.881483+10:00 gerald6 mcelog[1474]: CPU 0 THERMAL EVENT TSC
17500aeaa3f
2017-03-08T06:58:52.881608+10:00 gerald6 mcelog[1474]: TIME 1488920332 Wed Mar
8 06:58:52 2017
2017-03-08T06:58:52.881733+10:00 gerald6 mcelog[1474]: Processor 0 below trip
temperature. Throttling disabled
2017-03-08T06:58:52.881854+10:00 gerald6 mcelog[1474]: Running trigger
`unknown-error-trigger'
2017-03-08T06:58:52.881978+10:00 gerald6 mcelog[1474]: STATUS 88040002
MCGSTATUS 0
2017-03-08T06:58:52.882105+10:00 gerald6 mcelog[1474]: MCGCAP c09 APICID 0
SOCKETID 0
2017-03-08T06:58:52.882242+10:00 gerald6 mcelog[1474]: CPUID Vendor Intel
Family 6 Model 58
2017-03-08T06:58:52.882371+10:00 gerald6 mcelog[1474]: mcelog: Too many trigger
children running already
2017-03-08T06:58:52.882503+10:00 gerald6 mcelog[1474]: Hardware event. This is
not a software error.
2017-03-08T06:58:52.882632+10:00 gerald6 mcelog[1474]: MCE 1
2017-03-08T06:58:52.882764+10:00 gerald6 mcelog[1474]: CPU 1 THERMAL EVENT TSC
17500aee64b
2017-03-08T06:58:52.882900+10:00 gerald6 mcelog[1474]: TIME 1488920332 Wed Mar
8 06:58:52 2017
2017-03-08T06:58:52.883024+10:00 gerald6 mcelog[1474]: Processor 1 below trip
temperature. Throttling disabled
2017-03-08T06:58:52.883154+10:00 gerald6 mcelog[1474]: Running trigger
`unknown-error-trigger'
2017-03-08T06:58:52.883278+10:00 gerald6 mcelog[1474]: STATUS 88040002
MCGSTATUS 0
2017-03-08T06:58:52.883407+10:00 gerald6 mcelog[1474]: MCGCAP c09 APICID 1
SOCKETID 0
2017-03-08T06:58:52.883544+10:00 gerald6 mcelog[1474]: CPUID Vendor Intel
Family 6 Model 58
2017-03-08T06:58:52.883668+10:00 gerald6 mcelog[1474]: mcelog: warning: 24
bytes ignored in each record
2017-03-08T06:58:52.883793+10:00 gerald6 mcelog[1474]: mcelog: consider an
update
2017-03-08T06:58:52.883934+10:00 gerald6 mcelog[1474]: <27>Mar 8 06:58:52
mcelog: Location: CPU 1 on socket 0
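As a side note, the two raw status words in the logs above (88030003 and
88040002) can be decoded by hand. The following is only a sketch. It assumes
that the kernel stores the contents of the IA32_THERM_STATUS register in the
status field of these bank 128 records, and that the bit layout is the one
documented in the Intel SDM. The function name decode_therm_status is made up
for illustration and is not part of mcelog or the kernel.

```python
def decode_therm_status(status: int) -> dict:
    """Decode a bank 128 thermal status word.

    Assumption: the value is a copy of IA32_THERM_STATUS.
    """
    return {
        # bit 0: thermal status, set while the CPU is being throttled
        "throttling_active": bool(status & 0x1),
        # bit 1: sticky log bit, set once a throttling event has happened
        "throttling_logged": bool(status & 0x2),
        # bits 22:16: digital readout, degrees C below the throttle point
        "degrees_below_tjmax": (status >> 16) & 0x7F,
        # bits 30:27: resolution of the readout in degrees C
        "resolution_celsius": (status >> 27) & 0xF,
        # bit 31: readout is valid
        "reading_valid": bool(status & (1 << 31)),
    }


# The two values taken from the kernel and mcelog output above.
for raw in ("0000000088030003", "0000000088040002"):
    print(raw, decode_therm_status(int(raw, 16)))
```

Under these assumptions, 88030003 has bit 0 set (throttling active) and
88040002 has it cleared, which would match the "Throttling enabled" and
"Throttling disabled" lines that mcelog prints for the same records.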
I noticed that mcelog seemed to be outdated - consider these two messages:
mcelog: warning: 24 bytes ignored in each record
mcelog: consider an update
I am using the following version of mcelog, which is part of Tumbleweed:
mcelog-1.47-1.1.x86_64
I have now updated mcelog from the latest git:
git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
gerald6:~ # mcelog --version
mcelog v148
I am no longer seeing the "consider an update" message. I am now waiting for
the day to warm up so that more thermal events are triggered and I can see
whether the update fixes it.