Hi Folks,

This motherboard has dual Opteron CPUs with 4 memory banks per CPU, and the memory banks are populated with a 1GB DIMM and then a 2GB DIMM.

The error messages from syslog are below. Is it possible to tell which DIMM of CPU 1 is bad? I do not have physical access to the machine.

thank you,
regards
suse

Jun 3 21:55:48 localhost kernel: CPU 1: Silent Northbridge MCE
Jun 3 21:55:48 localhost kernel: Northbridge status d447c000e0080a13
Jun 3 21:55:48 localhost kernel: ECC syndrome bits e00f
Jun 3 21:55:48 localhost kernel: extended error chipkill ecc error
Jun 3 21:55:48 localhost kernel: link number 0
Jun 3 21:55:48 localhost kernel: corrected ecc error
Jun 3 21:55:48 localhost kernel: error address valid
Jun 3 21:55:48 localhost kernel: error enable
Jun 3 21:55:48 localhost kernel: error overflow
Jun 3 21:55:48 localhost kernel: previous error lost
Jun 3 21:55:48 localhost kernel: error address 00000002359feb20
Jun 4 00:28:46 localhost kernel: CPU 1: Silent Northbridge MCE
Jun 4 00:28:46 localhost kernel: Northbridge status d417c000ed080a13
Jun 4 00:28:46 localhost kernel: ECC syndrome bits ed2f
Jun 4 00:28:46 localhost kernel: extended error chipkill ecc error
Jun 4 00:28:46 localhost kernel: link number 0
Jun 4 00:28:46 localhost kernel: corrected ecc error
Jun 4 00:28:46 localhost kernel: error address valid
Jun 4 00:28:46 localhost kernel: error enable
Jun 4 00:28:46 localhost kernel: error overflow
Jun 4 00:28:46 localhost kernel: previous error lost
Jun 4 00:28:46 localhost kernel: error address 000000024ef3fd20
Jun 4 01:49:17 localhost kernel: CPU 1: Silent Northbridge MCE
Jun 4 01:49:17 localhost kernel: Northbridge status d447c000e0080a13
Jun 4 01:49:17 localhost kernel: ECC syndrome bits e00f
Jun 4 01:49:17 localhost kernel: extended error chipkill ecc error
Jun 4 01:49:17 localhost kernel: link number 0
Jun 4 01:49:17 localhost kernel: corrected ecc error
Jun 4 01:49:17 localhost kernel: error address valid
Jun 4 01:49:17 localhost kernel: error enable
Jun 4 01:49:17 localhost kernel: error overflow
Jun 4 01:49:17 localhost kernel: previous error lost
Jun 4 01:49:17 localhost kernel: error address 000000027917fba0
Jun 4 10:44:16 localhost kernel: CPU 1: Silent Northbridge MCE
Jun 4 10:44:16 localhost kernel: Northbridge status d447c000e0080a13
Jun 4 10:44:16 localhost kernel: ECC syndrome bits e00f
Jun 4 10:44:16 localhost kernel: extended error chipkill ecc error
Jun 4 10:44:16 localhost kernel: link number 0
Jun 4 10:44:16 localhost kernel: corrected ecc error
Jun 4 10:44:16 localhost kernel: error address valid
Jun 4 10:44:16 localhost kernel: error enable
Jun 4 10:44:16 localhost kernel: error overflow
Jun 4 10:44:16 localhost kernel: previous error lost
Jun 4 10:44:16 localhost kernel: error address 000000020dc7e8e0
Jun 6 10:38:11 localhost syslogd 1.4.1: restart.
On Mon, Jun 06, 2005 at 02:28:33PM -0700, suse amd wrote:
Hi Folks,
This motherboard has dual Opteron CPUs with 4 memory banks per CPU, and the memory banks are populated with a 1GB DIMM and then a 2GB DIMM.
The error messages from syslog are below. Is it possible to tell which DIMM of CPU 1 is bad? I do not have physical access to the machine.
Unfortunately not, the kernel doesn't have the necessary information to tell which DIMM is bad. All you can try is to remove one DIMM and see if the problem goes away.

-Andi
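For anyone who wants to at least track these events over time, here is a minimal Python sketch that pulls the CPU number, ECC syndrome, and error address out of syslog lines in the format quoted above. It does not identify the DIMM (as noted, the kernel does not report that). The function name `summarize` and the dictionary keys are hypothetical, and the sample text is an abbreviated copy of the first two events from the log.

```python
import re

# Abbreviated sample copied from the syslog excerpt in the original message.
LOG = """\
Jun 3 21:55:48 localhost kernel: CPU 1: Silent Northbridge MCE
Jun 3 21:55:48 localhost kernel: Northbridge status d447c000e0080a13
Jun 3 21:55:48 localhost kernel: ECC syndrome bits e00f
Jun 3 21:55:48 localhost kernel: error address valid
Jun 3 21:55:48 localhost kernel: error address 00000002359feb20
Jun 4 00:28:46 localhost kernel: CPU 1: Silent Northbridge MCE
Jun 4 00:28:46 localhost kernel: Northbridge status d417c000ed080a13
Jun 4 00:28:46 localhost kernel: ECC syndrome bits ed2f
Jun 4 00:28:46 localhost kernel: error address 000000024ef3fd20
"""

MCE_START = re.compile(r"kernel: CPU (\d+): Silent Northbridge MCE")
SYNDROME = re.compile(r"kernel: ECC syndrome bits ([0-9a-f]+)\b")
# Require 16 hex digits so "error address valid" is not mistaken for an address.
ADDRESS = re.compile(r"kernel: error address ([0-9a-f]{16})\b")


def summarize(log_text):
    """Return one dict per MCE event (hypothetical helper, not a kernel tool)."""
    events = []
    current = None
    for line in log_text.splitlines():
        m = MCE_START.search(line)
        if m:
            # A new event starts; later lines are attached to it.
            current = {"cpu": int(m.group(1))}
            events.append(current)
            continue
        if current is None:
            continue
        m = SYNDROME.search(line)
        if m:
            current["syndrome"] = int(m.group(1), 16)
        m = ADDRESS.search(line)
        if m:
            current["address"] = int(m.group(1), 16)
    return events


if __name__ == "__main__":
    for ev in summarize(LOG):
        print("cpu %d syndrome %04x address %#x"
              % (ev["cpu"], ev["syndrome"], ev["address"]))
```

Run against the full log, this gives a per-event list that can be compared before and after pulling a DIMM.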
participants (2)
- Andi Kleen
- suse amd