On Thu, 25 Nov 2004 14:12:25 +0100
Andi Kleen wrote:

> > Another point I meant to emphasise is that these errors are recent. In the preceding six months they had not occurred at all. This suggests some change in either hardware or, more likely, software. My guess is that there is a bug in some recently produced update.
>
> More likely hardware actually. Lost interrupt means that the hard disk didn't reply to a command in time. I would check the SMART statistics using smartctl and your cables.

In fact, both disks passed the 'smartctl -a' test when run from the command line. The documentation on SMART doesn't always help in the interpretation of output, though. For example, for the disk that does NOT lose interrupts, smartd has returned:

  /dev/hde, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 249 to 250

If I've understood the man page correctly, this refers to bit 3:

  SMART status check returned "DISK FAILING"

Does such a small performance change really precede a failure? And how could a failing disk pass the test (above)? Or does it regard anything that happens as 'pre-failure', which in a sense is logical but useless?

The evidence so far does not support the idea that it's failing hardware, but we'll see how it develops.

- Richard.
-- 
Richard Kimber
http://www.psr.keele.ac.uk/
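
For what it's worth, the bit numbers in the smartctl man page describe the exit status of the smartctl command itself, so one way to see whether bit 3 is really set is to decode that status after a run. The following is only a rough sketch: the descriptions are paraphrased from the man page, and the names SMARTCTL_EXIT_BITS, decode_smartctl_status and run_smartctl are made up for illustration, not part of smartmontools.

```python
import subprocess

# Meaning of each bit in smartctl's exit status, paraphrased from the
# smartctl man page. The dictionary name is hypothetical.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or a data structure had a bad checksum",
    3: 'SMART status check returned "DISK FAILING"',
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [
        text
        for bit, text in sorted(SMARTCTL_EXIT_BITS.items())
        if status & (1 << bit)
    ]


def run_smartctl(device):
    """Run 'smartctl -a' on a device (usually needs root) and decode the result."""
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    return decode_smartctl_status(result.returncode)
```

An empty list from decode_smartctl_status(0) means no bits were set; a status of 8 would mean that bit 3 alone was set. Calling run_smartctl("/dev/hde") assumes smartctl is installed and that the device path matches the one in the smartd message above.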