[opensuse] Re: disks... (is it broken or is it something else?)
On 9/1/08, michael <cs@networkingnewsletter.org.uk> wrote:
On 14 Aug I reported (update at end!)
A couple of weeks ago, 'smart' sent me:
The following warning/error was logged by the smartd daemon: Device: /dev/hdb, not capable of SMART self-check
and in /var/log/messages for that time I see (excluding iptables info):

Jul 30 05:13:10 ratty kernel: hda: lost interrupt
Jul 30 05:14:10 ratty kernel: ide-cd: cmd 0x3 timed out
Jul 30 05:14:10 ratty kernel: hda: lost interrupt
Jul 30 05:15:10 ratty kernel: ide-cd: cmd 0x3 timed out
Jul 30 05:15:10 ratty kernel: hda: lost interrupt
Jul 30 05:15:10 ratty kernel: hdb: status error: status=0x00 { }
Jul 30 05:15:10 ratty kernel: ide: failed opcode was: 0xb0
Jul 30 05:15:17 ratty kernel: hda: lost interrupt
{etc}
I also note that on 23 & 25 Jul (i.e. a week earlier) the machine had frozen overnight, so I had to do a hard reboot (pull the kettle lead out).
I didn't reboot on 30 Jul but did earlier tonight, and the machine failed to boot: /dev/hdb was not 'found' by the BIOS (I think). I rebooted, went into the BIOS, and all disks were present, so I continued booting, but just now I got another SMART warning (same as above: "Device: /dev/hdb, not capable of SMART self-check"), and if I try to cd/ls a partition on the device it just hangs.
michael@ratty:~$ sudo smartctl -a /dev/hdb
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     [No Information Found]
Serial Number:    [No Information Found]
Firmware Version: [No Information Found]
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   1
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Aug 14 19:49:28 2008 BST
SMART is only available in ATA Version 3 Revision 3 or greater.
We will try to proceed in spite of this.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
and I'm not sure what that means.
I'll welcome advice on whether it's the HDD about to die or whether the interrupt/timed out messages indicate something else?
This is on a box with:

michael@ratty:~$ uname -a
Linux ratty.xxxxx.ac.uk 2.6.18-6-686 #1 SMP Tue Jun 17 21:31:27 UTC 2008 i686 GNU/Linux
UPDATE: I've, as suggested, put this disk into another machine and so far (about 5 hrs) 'smart' hasn't reported any errors and I have successfully done 'ls -R' and 'find . -type f' on the (only) partition a few times...
So my question is whether the hard drive is actually okay? And if so why was I getting numerous errors before??
All help at solving this most welcome!
Michael
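
A note on the check described in the update: 'ls -R' and 'find . -type f' mostly walk directory metadata and do not read the contents of the files, so they exercise only a small part of the disk. A fuller test would read every file and record any I/O errors. The following is a minimal sketch of that idea in Python, not something from the original thread; the function name read_all_files and the mount point used in the usage note are hypothetical.

```python
import os


def read_all_files(root, chunk_size=1024 * 1024):
    """Read every regular file under root and report read failures.

    Returns (files_read, failures), where failures is a list of
    (path, error message) tuples for anything that raised OSError.
    """
    files_read = 0
    failures = []

    def on_walk_error(exc):
        # Called by os.walk when a directory itself cannot be listed.
        failures.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Discard the data; only the success of the read matters.
                    while handle.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                failures.append((path, str(exc)))
    return files_read, failures
```

Usage would be something like read_all_files("/mnt/hdb1"), where /mnt/hdb1 is an assumed mount point for the partition. Files read recently may be served from the page cache rather than the disk, so a run straight after mounting is more telling.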
Hi

It could be that the controller on that mobo is bad. However, I'd buy a new disk and copy the data (dd overnight should do it in most cases). You cannot have too many disks filled with backups :D.

To check whether the mobo is bad, you could try sticking another disk (one with non-critical data, preferably a different brand) on the same connector. If it produces the same messages, you may have a bad controller and can still be happy with your disk. Just be sure to back the data up as fast as you can. A dying disk may stop giving SMART messages for some time before it dies for good. SMART is not a good way of determining whether a disk will die soon, just the only way.

Neil

--
There are three kinds of people: Those who can count, and those who cannot count
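
To make the "same messages" comparison above less of an eyeball job, the kernel messages can be tallied per device before and after swapping in the test disk. Below is a minimal sketch in Python, assuming the syslog line format quoted earlier in the thread; the names LINE_RE, tally_ide_messages and SAMPLE_LOG are made up for illustration. Note that hda and hdb are the master and slave on the same IDE channel, so errors showing up for both devices would be consistent with a cable or controller problem, though that is not proof.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jul 30 05:13:10 ratty kernel: hda: lost interrupt
# The source is an IDE device (hda, hdb, ...) or one of the ide drivers.
LINE_RE = re.compile(r"kernel: (hd[a-z]|ide-cd|ide): (.+)$")


def tally_ide_messages(lines):
    """Count (source, message) pairs among kernel IDE log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The excerpt from /var/log/messages quoted in the original post.
SAMPLE_LOG = """\
Jul 30 05:13:10 ratty kernel: hda: lost interrupt
Jul 30 05:14:10 ratty kernel: ide-cd: cmd 0x3 timed out
Jul 30 05:14:10 ratty kernel: hda: lost interrupt
Jul 30 05:15:10 ratty kernel: ide-cd: cmd 0x3 timed out
Jul 30 05:15:10 ratty kernel: hda: lost interrupt
Jul 30 05:15:10 ratty kernel: hdb: status error: status=0x00 { }
Jul 30 05:15:10 ratty kernel: ide: failed opcode was: 0xb0
Jul 30 05:15:17 ratty kernel: hda: lost interrupt
"""

for (source, message), count in tally_ide_messages(SAMPLE_LOG.splitlines()).most_common():
    print(f"{count:3d}  {source}: {message}")
```

Running the same tally over the log from a period with the other disk on that connector gives two tables to compare side by side.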