Re: [opensuse] Odd kernel MCE errors - OpenSuse 11.4
I upgraded to the latest kernel for 11.4, from within Yast, and after rebooting I get these MCE kernel errors.
Apr 26 16:29:40 fmt-itops kernel: [81228.339471] [Hardware Error]: No human readable MCE decoding support on this CPU type.
Apr 26 16:29:40 fmt-itops kernel: [81228.339479] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
<snip>
The kernel running is;
2.6.37.6-0.11-desktop #1 SMP PREEMPT 2011-12-19 23:39:38 +0100 x86_64 x86_64 x86_64 GNU/Linux
...and the CPU type; CPU model name = Intel(R) Xeon(R) CPU - X5650 @ 2.67GHz
What can I do to prevent these errors?
Many thanks,
James
I encountered this a few years ago after an upgrade. My memory is very fuzzy, so don't hold me to this: if you are just getting the errors for the first time, it might be the kernel. IIRC the kernel code has some explicit handling for certain boards known to be problematic with MCE, and there may have been some other peculiarities with this code. While IIRC the mcelog package can be removed, I think there may also be a kernel argument that disables checking ("nomce"). I'm not suggesting you just dismiss the error msgs; they could be valid, so they are worth following up on. What I did, though, was install a newer kernel, and the errors immediately ceased. They haven't reappeared with any of the subsequent kernel upgrades.

--

Thank you for the illuminating reply. Much appreciated. I should have made it clearer that the errors did not occur until immediately after the kernel upgrade. The kernel running, 2.6.37.6-0.11-desktop, is the most recent provided by YaST in openSUSE 11.4. When I run mcelog I get;

# mcelog --ascii < /var/log/mcelog
mcelog: failed to prefill DIMM database from DMI data
mcelog: failed to prefill DIMM database from DMI data
mcelog: failed to prefill DIMM database from DMI data
mcelog: failed to prefill DIMM database from DMI data
<snip>

I get the same if I cat the file. Not sure what to do next. Any suggestions?

Thanks again,
James

Check the following links. I seem to recall that there are issues with some BIOSes. Also, given the error msg, it may be that mcelog itself is the culprit.

http://www.mcelog.org/faq.html#9
http://www.mcelog.org/bios-support.html
https://bugzilla.novell.com/show_bug.cgi?id=623248

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
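The kernel message asks for the kernel's own log text to be run through 'mcelog --ascii', whereas the command above feeds it /var/log/mcelog. Below is a minimal sketch of the former. It assumes the kernel messages end up in /var/log/messages (the thread does not say where syslog writes them), and extract_hw_errors is a hypothetical helper name, not part of mcelog. Whether the decoder produces anything useful for these particular lines is not established in the thread.

```shell
#!/bin/sh
# Print only the kernel lines tagged "[Hardware Error]" from a syslog file.
# The default path /var/log/messages is an assumption about this system.
extract_hw_errors() {
    logfile="${1:-/var/log/messages}"
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F '[Hardware Error]' "$logfile"
}

# Usage (as root), feeding the kernel's own text to the decoder:
#   extract_hw_errors /var/log/messages | mcelog --ascii
```

Looking at the filtered lines on their own is also a quick way to check whether the kernel logged anything beyond the two "No human readable MCE decoding support" lines.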
On Friday, April 27, 2012 07:39 PM James D. Parra wrote:
<snip> Check the following links. I seem to recall that there are issues with some bios's. Also given the error msg, it may be that mcelog itself that is the culprit.
~~~~~~~~~~~~~~~~~~~~~~
Thank you. On a similar note, how can I have these errors *not* write to the terminal, but instead to a file? I'd rather not see the following when I am editing a file;
Message from syslogd@fmt at Apr 27 16:05:31 ... kernel:[165960.993509] [Hardware Error]: No human readable MCE decoding support on this CPU type.
Message from syslogd@fmt at Apr 27 16:05:31 ... kernel:[165960.993518] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
Message from syslogd@fmt at Apr 27 16:08:42 ... kernel:[166151.444124] [Hardware Error]: No human readable MCE decoding support on this CPU type.
Message from syslogd@fmt at Apr 27 16:08:42 ... kernel:[166151.444132] [Hardware Error]: Run the message through 'mcelog --ascii' to decode. <snip>
Would starting mcelog fix that problem? Currently it is not running;
# /etc/init.d/mcelog status
Checking for service mcelog... unused
Thanks again,
James
Suggest you read the man page. The MCE error checking is being done by the kernel. The kernel writes MCE error messages to the buffer attached to /dev/mcelog; mcelog just retrieves those errors and interprets/reports them. Consequently, it's possible to get erroneous or misleading messages written by the kernel. This can be due to a problem in the kernel or, more likely, a flaw in the BIOS code. It's also possible for there to be an issue with mcelog interpreting a kernel message, possibly traceable back to the BIOS again (both are described in the links I posted before).

If you determine that there is an issue with a kernel version and you want MCE checking, update the kernel. If you determine that there is an issue with the BIOS, disable MCE checking with the nomce kernel argument. If you want kernel checking and all messages written to the mcelog (i.e., to divert the msgs above), you need to run the mcelog daemon (YaST runlevels).
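Before choosing between those options, it can help to confirm what the running system is doing. Below is a minimal sketch that checks whether the kernel command line already disables MCE checking and whether /dev/mcelog exists. The function names are hypothetical, and the check for "mce=off" is an assumption added here (it is the x86-64 spelling in the kernel's boot-option documentation); only "nomce" comes from the thread.

```shell
#!/bin/sh
# Return 0 if the given kernel command line disables machine-check handling.
# "nomce" is the argument named in the thread; "mce=off" is an added assumption.
mce_disabled_in_cmdline() {
    # Pad with spaces so only whole arguments match (e.g. not "nomcelog").
    case " $1 " in
        *" nomce "*|*" mce=off "*) return 0 ;;
    esac
    return 1
}

report_mce_state() {
    cmdline=$(cat /proc/cmdline 2>/dev/null)
    if mce_disabled_in_cmdline "$cmdline"; then
        echo "MCE checking disabled on the kernel command line"
    else
        echo "MCE checking not disabled on the kernel command line"
    fi
    # /dev/mcelog is the device the mcelog daemon reads from.
    if [ -e /dev/mcelog ]; then
        echo "/dev/mcelog present"
    else
        echo "/dev/mcelog missing"
    fi
}

# Usage: report_mce_state
```

Matching against a space-padded copy of the command line keeps the check to whole arguments without relying on word splitting.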
Participants (2): Dennis Gallien, James D. Parra