[Bug 759820] New: Hardware Error and freezes if power save mode C6 activated
https://bugzilla.novell.com/show_bug.cgi?id=759820
https://bugzilla.novell.com/show_bug.cgi?id=759820#c0

Summary: Hardware Error and freezes if power save mode C6 activated
Classification: openSUSE
Product: openSUSE 12.2
Version: Factory
Platform: x86-64
OS/Version: openSUSE 12.2
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: ath@muffti.de
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0

Two weeks ago I bought a new board, CPU and RAM:
ASUS Crosshair Formula V, BIOS 1301 (current)
AMD FX-8150
32GB DDR3-1333 RAM

Before that, the system ran with the following components:
OSS11.4
MSI K8N2 Diamond
FX940
8GB DDR2 RAM

After the upgrade I had sporadic system freezes (only Reset helped, no kernel panic!). In messages.log I found these errors:

Apr 15 15:12:16 achim-linux kernel: [ 3900.695217] [Hardware Error]: No human readable MCE decoding support on this CPU type.
Apr 15 15:12:16 achim-linux kernel: [ 3900.695232] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
Apr 15 15:12:16 achim-linux kernel: [ 3900.695243] [Hardware Error]: Machine check events logged
Apr 15 15:12:16 achim-linux kernel: [ 3900.698139] [Hardware Error]: No human readable MCE decoding support on this CPU type.
Apr 15 15:12:16 achim-linux kernel: [ 3900.698157] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
Apr 15 15:12:16 achim-linux kernel: [ 3900.698168] [Hardware Error]: Machine check events logged

Now I did a fresh installation of OSS12.2MS3 in an extra partition and found the following errors in messages.log:

Apr 28 17:44:53 achim-linux kernel: [ 300.695084] [Hardware Error]: CPU:3 MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc10c04001040136
Apr 28 17:44:53 achim-linux kernel: [ 300.698425] [Hardware Error]: MC2_ADDR: 0x000000081e7265a0
Apr 28 17:44:53 achim-linux kernel: [ 300.701772] [Hardware Error]: Combined Unit Error: Fill ECC error on data fills.
Apr 28 17:44:53 achim-linux kernel: [ 300.705150] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD

Mostly CPU2 was affected, sometimes CPU3.

I have narrowed the problem down as far as possible. When the vanilla kernel was running, only sporadic error messages appeared, without system freezes. When the desktop kernel was running, error messages appeared often, together with system freezes. Furthermore the problem occurs only if the power save mode C6 was activated in the BIOS. If I activate only C1E, no errors appear.

Reproducible: Sometimes

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
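The flag string and the "cache level / tx / mem-tx" line in the 12.2 log are derived from bit fields of the 64-bit MC2_STATUS value. The following is a minimal Python sketch that rebuilds both from the raw number and counts MCE lines per CPU. It assumes the architectural MCA status bits 57 to 62, the AMD family 15h positions for Deferred (bit 44), Poison (bit 43) and CECC/UECC (bits 46/45), and the memory-hierarchy layout of the low 16-bit error code. The helper names (`status_flags`, `memory_error_code`, `decode_log`) are hypothetical, and this is no substitute for `mcelog` or the kernel's own decoder.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Architectural MCA status bits (MCi_STATUS), shared by Intel and AMD.
MCI_STATUS_OVER = 1 << 62   # an earlier error was overwritten ("Over")
MCI_STATUS_UC = 1 << 61     # uncorrected error; clear means corrected ("CE")
MCI_STATUS_MISCV = 1 << 59  # MCi_MISC holds valid data
MCI_STATUS_ADDRV = 1 << 58  # MCi_ADDR holds valid data (the MC2_ADDR line)
MCI_STATUS_PCC = 1 << 57    # processor context corrupt

# Assumption: AMD family 15h bit positions for the two extra flags.
F15H_DEFERRED = 1 << 44
F15H_POISON = 1 << 43

# Memory-hierarchy error code fields (low 16 bits of the status value).
TT_MSGS = ["INSN", "DATA", "GEN", "RESV"]
LL_MSGS = ["RESV", "L1", "L2", "L3/GEN"]
RRRR_MSGS = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]

LINE_RE = re.compile(r"CPU:(\d+) MC(\d+)_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]+)")


def status_flags(status: int) -> str:
    """Rebuild the [Over|CE|...] flag string from a 64-bit status value."""
    flags = [
        "Over" if status & MCI_STATUS_OVER else "-",
        "UE" if status & MCI_STATUS_UC else "CE",
        "MiscV" if status & MCI_STATUS_MISCV else "-",
        "PCC" if status & MCI_STATUS_PCC else "-",
        "AddrV" if status & MCI_STATUS_ADDRV else "-",
        "Deferred" if status & F15H_DEFERRED else "-",
        "Poison" if status & F15H_POISON else "-",
    ]
    ecc = (status >> 45) & 0x3  # bit 46 = corrected ECC, bit 45 = uncorrected ECC
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    return "|".join(flags)


def memory_error_code(status: int) -> Optional[str]:
    """Decode a memory-hierarchy error code, or return None for other error types."""
    ec = status & 0xFFFF
    if (ec & 0xFF00) != 0x0100:
        return None
    rrrr = (ec >> 4) & 0xF
    mem_tx = RRRR_MSGS[rrrr] if rrrr < len(RRRR_MSGS) else "RESV"
    return "cache level: {}, tx: {}, mem-tx: {}".format(
        LL_MSGS[ec & 0x3], TT_MSGS[(ec >> 2) & 0x3], mem_tx)


def decode_log(lines: Iterable[str]) -> Tuple[Counter, List[tuple]]:
    """Count MCE status lines per CPU and decode each status value found."""
    per_cpu: Counter = Counter()
    rows = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        cpu, bank, status = int(m.group(1)), int(m.group(2)), int(m.group(3), 16)
        per_cpu[cpu] += 1
        rows.append((cpu, bank, status_flags(status), memory_error_code(status)))
    return per_cpu, rows


if __name__ == "__main__":
    sample = ("Apr 28 17:44:53 achim-linux kernel: [ 300.695084] [Hardware Error]: "
              "CPU:3 MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc10c04001040136")
    counts, rows = decode_log([sample])
    print(counts)  # Counter({3: 1})
    for row in rows:
        print(row)
```

For the reported value this reproduces `Over|CE|MiscV|-|AddrV|-|-|CECC` and `cache level: L2, tx: DATA, mem-tx: DRD`, i.e. a corrected ECC error on an L2 data read fill with the overflow bit set. Fed a whole saved log (for example the lines of /var/log/messages), the per-CPU counter offers a quick check of the "mostly CPU2, sometimes CPU3" observation.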
https://bugzilla.novell.com/show_bug.cgi?id=759820
https://bugzilla.novell.com/show_bug.cgi?id=759820#c

Jeff Mahoney <jeffm@suse.com> changed:

What       |Removed                                   |Added
----------------------------------------------------------------------------
CC         |                                          |jeffm@suse.com
AssignedTo |kernel-maintainers@forge.provo.novell.com |trenn@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=759820
https://bugzilla.novell.com/show_bug.cgi?id=759820#c1

Thomas Renninger <trenn@suse.com> changed:

What   |Removed |Added
----------------------------------------------------------------------------
Status |NEW     |ASSIGNED

--- Comment #1 from Thomas Renninger <trenn@suse.com> 2012-09-12 15:32:41 UTC ---
This sounds like a platform issue. I would look for a BIOS update first.

If the problem persists, please make sure the mcelog package is installed and the mcelog daemon is running. Please attach /proc/cpuinfo, /var/log/mcelog (filled with some MCEs, I expect) and the dmesg output.

Are there any EDAC drivers loaded on this CPU (lsmod | grep -i edac)? If yes, please try without them (/etc/modprobe.d/99-local.conf):
blacklist edac_module1

You may have to blacklist more than one module/driver. Check for dependencies by making sure no EDAC driver is loaded:
lsmod | grep -i edac
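The EDAC check in comment #1 can be scripted. Below is a minimal Python sketch, assuming the usual /proc/modules column layout (name, size, refcount, used-by list, state, address) that `lsmod` reads. It lists loaded modules whose name contains "edac", shows which modules depend on each, and renders `blacklist` lines for a file such as /etc/modprobe.d/99-local.conf. The function names and the sample module table are hypothetical and illustrative only, not taken from the reporter's machine.

```python
import os
from typing import Dict, List

# Made-up sample in /proc/modules format (name size refcount used-by state address);
# illustrative only, not taken from the reporter's machine.
SAMPLE_PROC_MODULES = """\
amd64_edac_mod 28246 0 - Live 0xffffffffa0123000
edac_core 56789 1 amd64_edac_mod, Live 0xffffffffa0100000
ext4 500000 2 - Live 0xffffffffa0000000
"""


def loaded_edac_modules(proc_modules: str) -> Dict[str, List[str]]:
    """Map each loaded module whose name contains 'edac' to the modules using it.

    Similar in spirit to `lsmod | grep -i edac`, but matching the name column only.
    """
    result: Dict[str, List[str]] = {}
    for line in proc_modules.splitlines():
        fields = line.split()
        if len(fields) < 4 or "edac" not in fields[0].lower():
            continue
        # The fourth column is "-" or a comma-terminated list of dependent modules.
        result[fields[0]] = [d for d in fields[3].split(",") if d and d != "-"]
    return result


def blacklist_config(modules: List[str]) -> str:
    """Render modprobe.d 'blacklist' lines, one per module name."""
    return "".join("blacklist {}\n".format(name) for name in modules)


if __name__ == "__main__":
    path = "/proc/modules"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE_PROC_MODULES
    mods = loaded_edac_modules(text)
    for name, users in mods.items():
        print(name, "used by:", ", ".join(users) or "-")
    print(blacklist_config(list(mods)), end="")
```

Listing the users of each module matters because, according to modprobe.d(5), `blacklist` only stops a module from being loaded through its aliases. A module can still be loaded explicitly or pulled in as a dependency, so re-running `lsmod | grep -i edac` after a reboot remains the real check.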
https://bugzilla.novell.com/show_bug.cgi?id=759820
https://bugzilla.novell.com/show_bug.cgi?id=759820#c

Thomas Renninger <trenn@suse.com> changed:

What         |Removed  |Added
----------------------------------------------------------------------------
Status       |ASSIGNED |NEEDINFO
InfoProvider |         |ath@muffti.de
https://bugzilla.novell.com/show_bug.cgi?id=759820
https://bugzilla.novell.com/show_bug.cgi?id=759820#c2

Thomas Renninger <trenn@suse.com> changed:

What         |Removed       |Added
----------------------------------------------------------------------------
Status       |NEEDINFO      |RESOLVED
InfoProvider |ath@muffti.de |
Resolution   |              |INVALID

--- Comment #2 from Thomas Renninger <trenn@suse.com> 2012-10-10 09:49:05 UTC ---
I doubt we can do much.

It would be great if you could post the output of one processor of:
cat /proc/cpuinfo

I would like to forward this info together with:
> Furthermore the problem occurs only if the power save mode C6 was activated in the BIOS.
to Intel. If they find out more, it might be necessary to blacklist the CPU or … they might have an idea which CPU setting (wrongly initialized by the BIOS) could be wrong.

I am setting the bug to INVALID, as we (from the OS perspective) cannot do much about this, besides blacklisting bad HW/BIOSes.
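Comment #2 asks for the output of only one processor from /proc/cpuinfo. On x86 each logical CPU gets its own stanza of `key<TAB>: value` lines, and consecutive stanzas are separated by a blank line. The following minimal Python sketch relies on that layout to cut out the first stanza and parse it into a dict. The helper names and the shortened sample text are hypothetical, and the sample values are illustrative, not the reporter's actual output.

```python
import os
from typing import Dict

# Illustrative, shortened /proc/cpuinfo excerpt (two logical CPUs); values are
# examples only, not the reporter's actual output.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "vendor_id\t: AuthenticAMD\n"
    "cpu family\t: 21\n"
    "model\t\t: 1\n"
    "model name\t: AMD FX(tm)-8150 Eight-Core Processor\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: AuthenticAMD\n"
    "cpu family\t: 21\n"
    "\n"
)


def first_processor(cpuinfo: str) -> str:
    """Return the first per-CPU stanza; stanzas are separated by a blank line."""
    return cpuinfo.strip().split("\n\n")[0]


def cpuinfo_fields(stanza: str) -> Dict[str, str]:
    """Split 'key<TAB>: value' lines into a dict (values may themselves contain ':')."""
    out: Dict[str, str] = {}
    for line in stanza.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            out[key.strip()] = value.strip()
    return out


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            text = f.read()
    else:
        text = SAMPLE_CPUINFO
    stanza = first_processor(text)
    print(stanza)
    family = cpuinfo_fields(stanza).get("cpu family", "")
    # "cpu family" is printed in decimal; 21 == 0x15, i.e. AMD family 15h.
    if family.isdigit():
        print("family 0x{:x}".format(int(family)))
```

Printing the family in hex is a convenience. Kernel code and vendor documentation name AMD families in hex ("family 15h"), while /proc/cpuinfo shows the decimal value.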