Mailinglist Archive: opensuse-amd64 (470 mails)

Re: [suse-amd64] lost interrupts problem
  • From: rkimber@xxxxxxxxxxxx
  • Date: Thu, 25 Nov 2004 16:50:09 +0000 (UTC)
  • Message-id: <20041125165002.4b8689dd.rkimber@xxxxxxxxxxxx>
On Thu, 25 Nov 2004 14:12:25 +0100
Andi Kleen <ak@xxxxxxx> wrote:

> > Another point I meant to emphasise is that these errors are recent.
> > In the preceding six months they had not occurred at all. This
> > suggests some change in either hardware or, more likely, software.
> > My guess is that there is a bug in some recently produced update.
>
> More likely hardware actually. Lost interrupt means that the hard disk
> didn't reply to a command in time. I would check the SMART statistics
> using smartctl and your cables.

In fact, both disks passed the 'smartctl -a' test when run from the
command line.

The documentation on SMART doesn't always help in the
interpretation of output, though.

For example, for the disk that does NOT lose interrupts smartd has
returned:-

/dev/hde, SMART Prefailure Attribute: 8 Seek_Time_Performance
changed from 249 to 250
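
A message like this can be picked apart mechanically, which makes it easier to see which way a value moved. What follows is only a minimal sketch: the pattern is inferred from the one sample above rather than taken from any smartmontools format definition, and the name parse_smartd_change is hypothetical.

```python
import re

# Pattern inferred from the single smartd message quoted above; this is an
# assumption, not an official description of smartd's log format.
PATTERN = re.compile(
    r"(?P<device>/dev/\w+), SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_smartd_change(message):
    """Return the fields of an attribute-change message, or None if no match."""
    text = " ".join(message.split())  # undo the mail client's line wrapping
    match = PATTERN.search(text)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "kind": match.group("kind"),
        "id": int(match.group("id")),
        "name": match.group("name"),
        "old": int(match.group("old")),
        "new": int(match.group("new")),
    }


sample = """/dev/hde, SMART Prefailure Attribute: 8 Seek_Time_Performance
changed from 249 to 250"""
info = parse_smartd_change(sample)
print(info["name"], info["new"] - info["old"])  # prints: Seek_Time_Performance 1
```

The parsed result only says that the reported value went up by one; it does not by itself say whether the disk is healthy.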

If I've understood the man page correctly, this refers to bit 3:
SMART status check returned "DISK FAILING"
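
For reference, the bit numbers in the smartctl man page describe the exit status of the smartctl command itself. A minimal sketch of decoding that status follows; the bit meanings are paraphrased from the man page as I read it, and the name decode_exit_status is hypothetical.

```python
# Meanings of the smartctl exit-status bits, paraphrased from the smartctl
# man page; the wording here is mine and should be checked against it.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed, or a checksum error occurred",
    3: 'SMART status check returned "DISK FAILING"',
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]


# Hypothetical exit status with only bit 3 set (value 8).
for line in decode_exit_status(8):
    print(line)
```

An exit status of 0 decodes to an empty list, meaning none of these conditions was reported.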

Does such a small performance change really precede a failure? And how
could a failing disk pass the test (above)? Or does it regard anything
that happens as 'pre-failure', which in a sense is logical but useless?

The evidence so far does not support the idea that it's failing
hardware, but we'll see how it develops.

- Richard.
--
Richard Kimber
http://www.psr.keele.ac.uk/
