Mailinglist Archive: opensuse (2348 mails)

Re: [opensuse] Novell Bugzilla - At it Again - Bugs Apparently Dismissed Without Sufficient Investigation
  • From: "John Andersen" <jsamyth@xxxxxxxxx>
  • Date: Fri, 4 Apr 2008 11:22:06 -0700
  • Message-id: <60fb01490804041122k58f29d1ete9b95b08d79179cb@xxxxxxxxxxxxxx>
On Fri, Apr 4, 2008 at 9:20 AM, Anders Johansson <ajh@xxxxxxxxxx> wrote:

> You seem to be misunderstanding what "mce" is. A machine check exception is
> the hardware itself telling you that something has gone badly wrong. There is
> no interpretation involved in the software. The software just logs the
> message
>
> If the mce says it is a hardware problem, you can count on its being a
> hardware problem
>
> Anders


No, you can't count on that, Anders.

Do some research on MCE errors and you will find that they are
often reported when there is absolutely nothing wrong with the
machine. In fact, Dell had a huge thread on their internal blog about
MCE errors reported by Linux users when the Core 2 Duo machines
arrived.
They were more than a little miffed at getting calls because some developer of
the mce package with a swollen head put in language insisting it was a hardware
problem, when others had clearly demonstrated that you could reach that part
of the code with no hardware error at all.

It's quite possible for software bugs to hose things so badly that the
MCE modules think there was an error.

Further, part of the MCE software's job is to filter out the bogus MCE errors
(or so says someone who shall remain nameless but whose email
address is ak@xxxxxxx). Now, if the software's job is to filter out
bogus MC events, that is a de facto assertion that lots of these events
are bogus.

I've seen these in the past as well. Mine had to do with runaway
keys, and the clue was the bit about TSC. The two cores of a dual-core
CPU can get their timers to disagree to the point that it forces a failure.
You would often see this with SpeedStep or PowerNow! enabled, but simply
locking the machine at the high-power setting would avoid the problem.
For me, the nohpet kernel command-line parameter was required under SUSE 10.1.
That solved all my instances. But that was on a Core 2 Duo.
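
If you want to confirm that a boot option such as nohpet actually took
effect, something like the following may help. It is only a rough sketch:
it assumes a Linux system that exposes /proc/cmdline and the sysfs
clocksource file, and the helper names has_kernel_param and
read_text_or_none are made up for this illustration.

```python
from pathlib import Path

# Assumed locations on a typical Linux system; they may be absent elsewhere.
CMDLINE_PATH = "/proc/cmdline"
CLOCKSOURCE_PATH = (
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


def read_text_or_none(path: str):
    """Read a small text file, returning None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none(CMDLINE_PATH) or ""
    print("nohpet on command line:", has_kernel_param(cmdline, "nohpet"))
    print("current clocksource:", read_text_or_none(CLOCKSOURCE_PATH))
```

This only reports what the kernel was booted with and which clocksource it
is using now; it says nothing about whether an MCE was genuine.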




--
----------JSA---------
--