Mailinglist Archive: opensuse-bugs (14006 mails)

[Bug 376165] AMD Multicore - Lockup/ Reboot with MCE errors After nvidia kernel module load/install
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Thu, 3 Apr 2008 11:25:09 -0600 (MDT)
  • Message-id: <20080403172509.D83B924538D@xxxxxxxxxxxxxxxxxxxxxx>
https://bugzilla.novell.com/show_bug.cgi?id=376165

User drankinatty@xxxxxxxxxxxxxxxxxx added comment
https://bugzilla.novell.com/show_bug.cgi?id=376165#c9





--- Comment #9 from David Rankin <drankinatty@xxxxxxxxxxxxxxxxxx> 2008-04-03 11:25:09 MST ---
(In reply to comment #7 from David Rankin)
Guys,

Here are additional MCEs that were captured in syslog but are absent from
/var/log/mcelog. The nvidia driver was *not* loaded when these occurred.
Apparently they were written only to syslog because the crash/reboot occurred
before the next cron run of /usr/sbin/mcelog (currently set to run at 1-minute
intervals).

The following are some of the MCEs caught in /var/log/messages:

Apr 1 01:35:01 nirvana /usr/sbin/cron[3761]: (root) CMD (/usr/sbin/mcelog --k8 --syslog)
Apr 1 01:35:50 nirvana kernel: [ 198.079706] Machine check events logged
Apr 1 01:36:01 nirvana /usr/sbin/cron[3766]: (root) CMD (/usr/sbin/mcelog --k8 --syslog)
Apr 1 01:36:01 nirvana mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Apr 1 01:36:01 nirvana mcelog: Please contact your hardware vendor
Apr 1 01:36:01 nirvana mcelog: CPU 0 1 instruction cache
Apr 1 01:36:01 nirvana mcelog: TSC 6f61bca83a
Apr 1 01:36:01 nirvana mcelog: ADDR 2b66e64040f0
Apr 1 01:36:01 nirvana mcelog:
Apr 1 01:36:01 nirvana mcelog: memory/cache error 'instruction fetch mem transaction, instruction transaction, level 1'
Apr 1 01:36:01 nirvana mcelog: STATUS 9400000000000151 MCGSTATUS 0
Apr 1 01:36:01 nirvana mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Apr 1 01:36:01 nirvana mcelog: Please contact your hardware vendor
Apr 1 01:36:01 nirvana mcelog: CPU 1 1 instruction cache
Apr 1 01:36:01 nirvana mcelog: TSC 6f61bd690e
Apr 1 01:36:01 nirvana mcelog: ADDR ffff804454f0
Apr 1 01:36:01 nirvana mcelog:
Apr 1 01:36:01 nirvana mcelog: bit62 = error overflow (multiple errors)
Apr 1 01:36:01 nirvana mcelog: memory/cache error 'instruction fetch mem transaction, instruction transaction, level 1'
Apr 1 01:36:01 nirvana mcelog: STATUS d400000000000151 MCGSTATUS 0
Apr 1 01:37:01 nirvana /usr/sbin/cron[3805]: (root) CMD (/usr/sbin/mcelog --k8 --syslog)
Apr 1 01:38:01 nirvana /usr/sbin/cron[3812]: (root) CMD (/usr/sbin/mcelog --k8 --syslog)
Apr 1 01:39:01 nirvana /usr/sbin/cron[3820]: (root) CMD (/usr/sbin/mcelog --k8 --syslog)
Apr 1 01:39:01 nirvana mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Apr 1 01:39:01 nirvana mcelog: Please contact your hardware vendor
Apr 1 01:39:01 nirvana mcelog: CPU 0 1 instruction cache
Apr 1 01:39:01 nirvana mcelog: TSC 9c52fbb5cd
Apr 1 01:39:01 nirvana mcelog: ADDR 77a8d270
Apr 1 01:39:01 nirvana mcelog:
Apr 1 01:39:01 nirvana mcelog: bit62 = error overflow (multiple errors)
Apr 1 01:39:01 nirvana mcelog: memory/cache error 'instruction fetch mem transaction, instruction transaction, level 1'
Apr 1 01:39:01 nirvana mcelog: STATUS d400000000000151 MCGSTATUS 0
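
For anyone who wants to cross-check mcelog's text against the raw STATUS
values above, here is a minimal Python sketch. It assumes the standard x86
machine-check MCi_STATUS layout (flag bits 63 down to 57, error code in the
low 16 bits) and the memory/cache error code format 0000 0001 RRRR TTLL. The
function name decode_status and the label strings are made up for
illustration; this is not mcelog's own code.

```python
# Flag bits in the upper part of an x86 machine-check MCi_STATUS register.
FLAGS = {
    63: "VAL (entry valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected)",
    60: "EN (reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# Fields of the memory/cache error code: 0000 0001 RRRR TTLL.
REQUEST = {
    0: "generic", 1: "read", 2: "write", 3: "data read", 4: "data write",
    5: "instruction fetch", 6: "prefetch", 7: "evict", 8: "snoop",
}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def decode_status(hex_status):
    """Decode a hex STATUS string as printed by mcelog (hypothetical helper)."""
    value = int(hex_status, 16)
    flags = [name for bit, name in FLAGS.items() if (value >> bit) & 1]
    code = value & 0xFFFF
    result = {"flags": flags, "error_code": code}
    if (code >> 8) == 0x01:  # memory/cache error class
        result["request"] = REQUEST.get((code >> 4) & 0xF, "reserved")
        result["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "reserved")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    for status in ("9400000000000151", "d400000000000151"):
        print(status, decode_status(status))
```

Under those assumptions, both values decode to an instruction fetch on an
instruction transaction at level 1, and only the d4... value has bit 62 set,
which agrees with the "bit62 = error overflow" lines in the log.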

The complete collection of MCEs from /var/log/messages is contained
in the attachment "mce_syslog" provided along with this post.
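
In case it helps with going through a larger capture like that one, here is a
rough Python sketch that groups mcelog syslog lines into one record per event
and counts events per CPU. It assumes each log entry sits on a single
unwrapped line and that every record starts with a "CPU" line, as in the
excerpt above. The names parse_mce_records and sample are hypothetical, and
the sample lines are copied from the excerpt rather than from the attachment.

```python
import re
from collections import Counter

# Match the mcelog fields of interest; the first token after the key is kept.
FIELD = re.compile(r"mcelog: (CPU|TSC|ADDR|STATUS) (\S+)")


def parse_mce_records(lines):
    """Group mcelog syslog lines into dicts, one per machine check event."""
    records = []
    for line in lines:
        match = FIELD.search(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "CPU":
            records.append({"CPU": value})  # a CPU line opens a new record
        elif records:
            records[-1][key] = value
    return records


# Two events taken from the excerpt, with wrapped lines rejoined.
sample = [
    "Apr 1 01:36:01 nirvana mcelog: CPU 0 1 instruction cache",
    "Apr 1 01:36:01 nirvana mcelog: TSC 6f61bca83a",
    "Apr 1 01:36:01 nirvana mcelog: ADDR 2b66e64040f0",
    "Apr 1 01:36:01 nirvana mcelog: STATUS 9400000000000151 MCGSTATUS 0",
    "Apr 1 01:36:01 nirvana mcelog: CPU 1 1 instruction cache",
    "Apr 1 01:36:01 nirvana mcelog: TSC 6f61bd690e",
    "Apr 1 01:36:01 nirvana mcelog: ADDR ffff804454f0",
    "Apr 1 01:36:01 nirvana mcelog: STATUS d400000000000151 MCGSTATUS 0",
]

records = parse_mce_records(sample)
per_cpu = Counter(record["CPU"] for record in records)

if __name__ == "__main__":
    for record in records:
        print(record)
    print(dict(per_cpu))
```

To run it against a real log, replace sample with the lines read from the
file, for example open("/var/log/messages").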

Let me know what else I can provide and I will respond as soon as I can.
Also, if an account on the box would be helpful, we can arrange that as well.

Thanks!


NOTE: For testing purposes the "acpi_use_timer_override" kernel parameter was
removed after the 4/3 reboot at 0400.


