[opensuse-kernel] Recurring hardware error about cache on 12.1 kernel
Athlon X2 5600+ on nForce 500 chipset, Ubuntu and Win7 run fine, moved to SuSE
12.1 64bit recently, standard desktop-kernel.
Every once in a while it throws an error into messages that looks like this:
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704050] [Hardware Error]:
MC0_STATUS[-|CE|-|-|AddrV|CECC]: 0x944ec00000000136
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704070] [Hardware Error]: Data Cache
Error: during L1 linefill from L2.
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704078] [Hardware Error]: cache
level: L2, tx: DATA, mem-tx: DRD
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704097] [Hardware Error]: Machine
check events logged
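For reference, the status word can be decoded by hand. The following is a minimal Python sketch, assuming the AMD MCi_STATUS bit layout (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, CECC 46, UECC 45) and the memory-error code format 0000 0001 RRRR TTLL in the low 16 bits. The function and table names are made up for illustration and are not the kernel's.

```python
# Sketch: decode an AMD MCi_STATUS value as printed in the log above.
# Bit positions and name tables are assumptions based on the AMD
# family 0Fh layout; decode_mc_status is a hypothetical helper.

FLAG_BITS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57, "CECC": 46, "UECC": 45,
}
LEVELS = ["RESV", "L1", "L2", "L3/GEN"]      # LL field (bits 1:0)
TX_TYPES = ["INSN", "DATA", "GEN", "RESV"]   # TT field (bits 3:2)
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]  # RRRR (bits 7:4)


def decode_mc_status(status):
    """Return the status flags and, for memory-hierarchy errors, the decoded code."""
    result = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    result["error_code"] = code
    # Memory-hierarchy errors have the form 0000 0001 RRRR TTLL.
    if (code & 0xFF00) == 0x0100:
        rrrr = (code >> 4) & 0xF
        result["level"] = LEVELS[code & 0x3]
        result["tx"] = TX_TYPES[(code >> 2) & 0x3]
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "RESV"
    return result


info = decode_mc_status(0x944EC00000000136)
print(info["level"], info["tx"], info["mem_tx"])   # L2 DATA DRD
print("uncorrected:", info["UC"], "CECC:", info["CECC"])
```

Under these assumptions the result agrees with the kernel's own decode: a corrected (CE) error, fixed by ECC, at cache level L2 during a data read.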
What's strange is that it *only* happens at uptimes that are multiples of 50 seconds:
/var/log # grep -i cache\ error messages | cut -c1-68
Jun 20 13:19:35 linux-3gig kernel: [ 1800.701044] [Hardware Error]:
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704070] [Hardware Error]:
Jun 20 13:37:05 linux-3gig kernel: [ 2850.704042] [Hardware Error]:
Jun 24 20:40:34 linux-3gig kernel: [21000.701044] [Hardware Error]:
Jun 25 20:36:40 linux-3gig kernel: [ 8100.704028] [Hardware Error]:
Jun 29 22:53:52 linux-3gig kernel: [ 1500.704022] [Hardware Error]:
Jun 29 23:16:22 linux-3gig kernel: [ 2850.704030] [Hardware Error]:
Jun 29 23:28:52 linux-3gig kernel: [ 3600.704065] [Hardware Error]:
Jun 29 23:46:22 linux-3gig kernel: [ 4650.704023] [Hardware Error]:
Jun 30 00:03:52 linux-3gig kernel: [ 5700.704028] [Hardware Error]:
Jun 30 00:11:22 linux-3gig kernel: [ 6150.704023] [Hardware Error]:
Jun 30 00:43:52 linux-3gig kernel: [ 8100.704023] [Hardware Error]:
Jul 1 16:15:06 linux-3gig kernel: [ 600.701025] [Hardware Error]:
Jul 1 17:55:06 linux-3gig kernel: [ 6600.704023] [Hardware Error]:
Jul 1 18:12:36 linux-3gig kernel: [ 7650.704037] [Hardware Error]:
Jul 2 22:47:25 linux-3gig kernel: [ 300.701022] [Hardware Error]:
Jul 2 22:52:25 linux-3gig kernel: [ 600.704029] [Hardware Error]:
Jul 6 00:41:11 linux-3gig kernel: [ 1500.701028] [Hardware Error]:
(Come to think of it, this is not even strictly regular, as then it would have
to happen on 00 and 30.)
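The pattern can be checked mechanically. The sketch below (Python; the uptime stamps are copied by hand from the grep output above, and the variable names are made up for illustration) takes the whole-second part of each stamp, tests it against 50, and computes the greatest common divisor of all stamps.

```python
# Sketch: test the regularity of the kernel uptime stamps from the grep
# output above. Only the bracketed stamps are pasted in, in log order.
import re
from functools import reduce
from math import gcd

LOG = """\
[ 1800.701044] [ 2100.704070] [ 2850.704042] [21000.701044]
[ 8100.704028] [ 1500.704022] [ 2850.704030] [ 3600.704065]
[ 4650.704023] [ 5700.704028] [ 6150.704023] [ 8100.704023]
[  600.701025] [ 6600.704023] [ 7650.704037] [  300.701022]
[  600.704029] [ 1500.701028]
"""

# Keep only the whole-second part; the fraction is always about 0.70 s.
stamps = [int(sec) for sec in re.findall(r"\[\s*(\d+)\.\d+\]", LOG)]

print(all(s % 50 == 0 for s in stamps))   # True
print(reduce(gcd, stamps))                # 150
```

So the stamps sit on a 150-second grid (offset by about 0.70 s), not merely a 50-second one. One possible reading, which is an assumption and not something established in this thread, is that the errors are logged by a periodic poll rather than at the moment they occur.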
I would have suspected a hardware error, but not with this timing pattern, and
not given that the kernels from various Ubuntu releases never complained.
Plus, the machine is perfectly stable from a user's perspective.
Here's # lspci -vvv
Regards,
Dex
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
Subsystem: ASUSTeK Computer Inc. Device 8239
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
On 06/07/12 19:07, Dexter Filmore wrote:
Athlon X2 5600+ on nForce 500 chipset, Ubuntu and Win7 run fine, moved to SuSE 12.1 64bit recently, standard desktop-kernel.
Every once in a while it throws an error into messages that looks like this:
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704050] [Hardware Error]: MC0_STATUS[-|CE|-|-|AddrV|CECC]: 0x944ec00000000136
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704070] [Hardware Error]: Data Cache Error: during L1 linefill from L2.
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704078] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
Jun 20 13:24:35 linux-3gig kernel: [ 2100.704097] [Hardware Error]: Machine check events logged
Either your machine really has problems or there is a bug somewhere, either in the kernel or in your BIOS/firmware etc.
Try updating your BIOS from your motherboard vendor and running tests for your hardware.
--
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org
On Saturday 07 July 2012 01:22:39, Cristian Rodríguez wrote:
Either your machine really has problems or there is a bug somewhere, either in the kernel or in your BIOS/firmware etc.
Try updating your BIOS from your motherboard vendor and running tests for your hardware.
The machine works flawlessly. The tests run so far are a RAM check and a CPU load check. If you know a test suitable for provoking CPU cache swap errors, let me know. The cpuburn suite runs without fault.
Since none of the other OSes this machine has run so far (from Slackware 11 through Debian 5, Debian 6 and a string of Ubuntu flavors) ever produced such an error, my prime suspect is the SuSE kernel.
--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d--(+)@ s-:+ a C++++ UL++ P+>++ L+++>++++ E-- W++ N o? K- w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ b++(+++) DI+++ D- G++ e* h>++ r* y?
------END GEEK CODE BLOCK------
On 07/07/2012 10:01 AM, Dexter Filmore wrote:
The machine works flawlessly. The tests run so far are a RAM check and a CPU load check. If you know a test suitable for provoking CPU cache swap errors, let me know. The cpuburn suite runs without fault.
Since none of the other OSes this machine has run so far (from Slackware 11 through Debian 5, Debian 6 and a string of Ubuntu flavors) ever produced such an error, my prime suspect is the SuSE kernel.
The problem may be with the kernel; however, we won't really know until you provide the output of 'uname -r' for all the various distros that work OK, and for the one that does not. I have enough trouble keeping track of the kernel versions used in the openSUSE releases - I certainly do not know what Debian, Ubuntu, or Slackware use.
One thing you could try is to boot one of the openSUSE 12.2 Beta 2 Live CDs to see if the problem was fixed between kernel 3.1 (as likely found in your 12.1), and the 3.4 kernel used in 12.2.
Larry
On Saturday 07 July 2012 17:32:11, Larry Finger wrote:
The problem may be with the kernel; however, we won't really know until you provide the output of 'uname -r' for all the various distros that work OK, and for the one that does not. I have enough trouble keeping track of the kernel versions used in the openSUSE releases - I certainly do not know what Debian, Ubuntu, or Slackware use.
One thing you could try is to boot one of the openSUSE 12.2 Beta 2 Live CDs to see if the problem was fixed between kernel 3.1 (as likely found in your 12.1), and the 3.4 kernel used in 12.2.
Larry
Right now SuSE runs 3.1.10-1.13-desktop; it ran 3.1.0 when the installation was fresh (and threw this error then, too). The last kernel I used on Ubuntu was linux-image-3.2.0-25-generic; before that I ran a series of 2.6 kernels and a few 3.0 kernels. These distros are no longer on disk, so I can't give more detail, but one thing is worth mentioning: this is the first time I have ever run the 3.1 kernel series. All other distros ran 3.0 or older *or* at least 3.2.
So we're looking for something that happens at multiples of 50 seconds in the 3.1 SuSE desktop kernel series.
Dex
Right now SuSE runs 3.1.10-1.13-desktop, ran 3.1.0 when the installation was fresh (and threw this error then, too.)
Try the latest official update kernel for 12.1, which is 3.1.10-1.16-desktop.
Check why you didn't get the update automatically.
1.13 was full of failures (power, suspend, etc.) and was quickly removed from the update channel.
--
Bruno Friedmann Ioda-Net Sàrl www.ioda-net.ch
openSUSE Member & Ambassador GPG KEY : D5C9B751C4653227 irc: tigerfoot
On Sunday 08 July 2012 15:56:48, Bruno Friedmann wrote:
Try the latest official update kernel for 12.1, which is 3.1.10-1.16-desktop.
Check why you didn't get the update automatically.
1.13 was full of failures (power, suspend, etc.) and was quickly removed from the update channel.
That update escaped me. I don't know how to make new updates visible here in SuSE's KDE3, so I update manually whenever I happen to remember. But the error also occurs with the new -1.16 kernel:
[ 2700.704020] [Hardware Error]: MC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000017a
[ 2700.704025] [Hardware Error]: Bus Unit Error: EV error during data copyback.
[ 2700.704028] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: EV
[ 2700.704035] [Hardware Error]: Machine check events logged
participants (4)
- Bruno Friedmann
- Cristian Rodríguez
- Dexter Filmore
- Larry Finger