Re: [opensuse] Novell Bugzilla - At it Again - Bugs Apparently Dismissed Without Sufficient Investigation
  • From: "David C. Rankin" <drankinatty@xxxxxxxxxxxxxxxxxx>
  • Date: Mon, 07 Apr 2008 10:30:43 -0500
  • Message-id: <47FA3E23.5070703@xxxxxxxxxxxxxxxxxx>
Dave Plater wrote:
David C. Rankin wrote:

Thank you Dave,

I'm pulling my hair out on this one. One thing I haven't done is to post the actual MCEs I'm seeing. The mcelog and the syslog containing the MCEs are here:

http://www.3111skyline.com/download/lockup_x86-64/mcelog

http://www.3111skyline.com/download/lockup_x86-64/messages_20080407

Both logs must be read, because MCEs were written to "mcelog" before I configured mcelog to write to /var/log/messages. Also, the data before approximately 3/31/08 only shows that an MCE occurred, without the supporting details, because I didn't have mcelog installed until then (it is included just for completeness). Additionally, some of the 4/7 entries do not have details because I dorked the mcelog cron entry during an edit. It is fixed now. The mcelog is annotated with:

#
#### nvidia 8600GT removed, driver blacklisted, ATI Radeon 1500 installed w/radeon driver
#

to show when the nvidia card was swapped out and the nvidia kernel module removed. (The frequency of hard locks is reduced, but MCEs are still reported and the machine does still occasionally hard lock.)

Grepping the files on ADDR | sort shows that the errors never occur at the same memory address. I really don't know what the ADDR means. I've been looking for some way to correlate "ADDR 2ba96974e8a0", for example, with what it refers to (video, main memory, BIOS, etc.). No luck so far.
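
The "no repeated address" check can be reproduced with something like the sketch below. It assumes each address shows up in the logs as "ADDR <hex>" (as in the example above); the function name addr_counts is made up for illustration.

```shell
#!/bin/bash
# Count how often each ADDR value appears in one or more log files.
# Assumption: every address is logged as "ADDR <hex>" somewhere on a line.
# Requires GNU grep for -o (print only the match) and -h (no file names).
addr_counts() {
    grep -ohE 'ADDR [0-9a-fA-F]+' "$@" |   # keep only the "ADDR <hex>" part
        awk '{print $2}' |                 # drop the "ADDR" keyword
        sort | uniq -c |                   # count identical addresses
        sort -rn                           # most frequent first
}
```

Called as "addr_counts mcelog messages_20080407", any count greater than 1 in the first column would mean an address repeated.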

If somebody smarter than I can tell what memory range we are dealing with, and hopefully what it means, it would be greatly appreciated.
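
One rough way to test a candidate address against the machine's physical memory map is sketched below. This is only a sketch under a big assumption: that the ADDR value is a physical address at all, which may not be the case depending on the bank and error type. The function name addr_in_ranges is hypothetical; the input is expected in the "start-end : label" layout that /proc/iomem uses.

```shell
#!/bin/bash
# Print every "start-end : label" range that contains a given hex address.
# Usage: addr_in_ranges <hex-addr> [map-file]   (map-file defaults to /proc/iomem)
# Assumption: the address is a physical address; an MCE ADDR may not be.
addr_in_ranges() {
    local addr=$((16#$1)) map=${2:-/proc/iomem}
    local range label start end
    while IFS=: read -r range label; do
        range=${range//[[:space:]]/}        # strip indentation and padding
        [[ $range == *-* ]] || continue     # skip lines that are not ranges
        start=${range%-*}
        end=${range#*-}
        if (( addr >= 16#$start && addr <= 16#$end )); then
            echo "$range :$label"
        fi
    done < "$map"
}
```

For example, "addr_in_ranges 2ba96974e8a0" would list any matching ranges from /proc/iomem; no output means the value falls outside every listed range.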

Thanks!

P.S. - The complete collection of the AMD Technical Documentation for x86_64 and Opteron is also included in:

http://www.3111skyline.com/download/lockup_x86-64

if anyone is curious...

Hi David, does your BIOS support disabling video interrupts?
If it does, disable it and try again. It's a pity mcelog is not a bit more specific about the actual instruction executed at the time of the exception.
Regards
Dave


I'll take a look! In the meantime, the mcelog (from syslog) that I posted (http://www.3111skyline.com/download/lockup_x86-64/messages) now updates automatically on the hour with:

10:25 nirvana~> cat linux/scripts/mcesyslog
#!/bin/bash
#
## Retrieve MCE from syslog and write to
## /srv/www/download/lockup_x86-64/messages
#

OUTFILE="/srv/www/download/lockup_x86-64/messages"

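# Pull the mcelog lines out of syslog, drop any line containing "david",
# and append a blank line after each STATUS line to separate the MCE records.
sudo grep 'mcelog:' /var/log/messages | \
sed -e '/david/d' -e '/STATUS/G' > "$OUTFILE"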

exit 0

I'll look at the video interrupt option at lunch.

Thanks for your continued interest as we "endeavor to persevere" or "go forth in our effort of futility", whichever the case may be ;-)

--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com