[opensuse] Hard disk problem?
Hello,
For the second time in a week I found my root partition
so damaged that I couldn't boot up anymore.
At boot time I got this:
<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
<3>ata1.00: BMDMA stat 0x64
<3>ata1.00: cmd c8/00:08:6d:65:14/00:00:00:00:00/ec tag 0 cdb 0x1e data 4096 in
<4>         res 51/40:00:6f:65:14/00:00:00:00:00/ec Emask 0x9 (media error)
<6>ata1.00: configured for UDMA/100
<6>ata1.01: configured for UDMA/44
<6>ata1: EH complete
I got this countless times.
At the end I got this:
<6>sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
<6>sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
<4>Descriptor sense data with sense descriptors (in hex):
<6>        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
<6>        0c 14 65 6f
<6>sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
<4>end_request: I/O error, dev sda, sector 202663279
<6>ata1: EH complete
<2>EXT3-fs error (device sda9): ext3_get_inode_loc: unable to read inode block - inode=877217, block=1769563
<5>sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
<5>sd 0:0:0:0: [sda] Write Protect is off
<7>sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
<2>EXT3-fs error (device sda9) in ext3_reserve_inode_write: IO failure
<5>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
<5>sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
<5>sd 0:0:0:0: [sda] Write Protect is off
<7>sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
<5>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Kernel logging (ksyslog) stopped.
Kernel log daemon terminating.
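An aside on reading the log above: the hex dump of the sense data ends with an eight-byte information field (00 00 00 00 0c 14 65 6f), which for a medium error should carry the failing LBA and therefore agree with the sector in the end_request line. Below is a minimal Python sketch of that arithmetic. It assumes the standard descriptor-format sense layout; the function names are made up for illustration, and the byte string is simply copied from the log.

```python
# Sense bytes copied from the two "Descriptor sense data" lines of the log.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 0c 14 65 6f"


def decode_descriptor_sense(hex_dump):
    """Pull sense key, ASC/ASCQ and the failing LBA out of the dump."""
    raw = bytes.fromhex(hex_dump)
    if raw[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    descriptor = raw[8:]  # the first descriptor follows the 8-byte header
    if descriptor[0] != 0x00:
        raise ValueError("first descriptor is not an information descriptor")
    return {
        "sense_key": raw[1] & 0x0F,  # 0x3 = Medium Error
        "asc": raw[2],               # 0x11 with ascq 0x04 = unrecovered read
        "ascq": raw[3],              # error, auto reallocate failed
        "lba": int.from_bytes(descriptor[4:12], "big"),
    }


def capacity_mb(sectors, sector_size=512):
    """Capacity in decimal megabytes, the unit the sd driver prints."""
    return sectors * sector_size // 10**6


decoded = decode_descriptor_sense(SENSE_HEX)
print(decoded["lba"])          # compare with "sector 202663279" in the log
print(capacity_mb(234441648))  # compare with "(120034 MB)" in the log
```

If the decoded LBA and the end_request sector agree, the drive itself is reporting the unreadable sector, which points at the medium rather than at the filesystem.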
A fsck of /dev/sda9 resulted in:
Error reading block 1769563 (Attempt to read block from filesystem resulted in short read) while getting next inode from scan. Ignore error <y>? Yes
Force rewrite <y>? Yes
This was repeated numerous times.
At the end, lots of files went to the lost+found directory.
As a result I couldn't boot anymore.
The first time I re-installed OpenSuse 10.3 (as 11.0 failed to install)
and all went fine for a week. But now the same thing happened again.
Is this a hard disk failure?
If not, can I resolve these problems without doing a complete
re-installation of OpenSuse 10.3?
TIA,
Martin
--
Martin /Nightowl/ Byttebier
On Saturday, 2008-12-20 at 12:15 +0100, Martin /Nightowl/ Byttebier wrote:
Is this a hard disk failure?
Could be. Use the HD test program from the manufacturer, or use smartctl.
--
Cheers, Carlos E. R.
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
On Sat, 20 Dec 2008 15:00:05 +0100 (CET), "Carlos E. R." wrote:
On Saturday, 2008-12-20 at 12:15 +0100, Martin /Nightowl/ Byttebier wrote:
Is this a hard disk failure?
Could be. Use the HD test program from the manufacturer, or use smartctl.
Ok, thanks. I don't think I've a HD test program from the manufacturer.
I'll try out smartctl.
CU,
Martin
--
Martin /Nightowl/ Byttebier
On Saturday, 2008-12-20 at 15:44 +0100, Martin /Nightowl/ Byttebier wrote:
Could be. Use the HD test program from the manufacturer, or use smartctl.
Ok, thanks. I don't think I've a HD test program from the manufacturer.
You can usually get it from their web page. For example, Seagate has a good one. I think it is a small ISO nowadays; previously it was a floppy image (DOS bootable). As for Fujitsu, I had a hard time searching for theirs and failed to find it.
I'll try out smartctl.
It is basically the same thing, but without text feedback while it runs.
--
Cheers, Carlos E. R.
On 2008/12/20 16:49 (GMT+0100) Carlos E. R. composed:
On Saturday, 2008-12-20 at 15:44 +0100, Martin /Nightowl/ Byttebier wrote:
Could be. Use the HD test program from the manufacturer, or use smartctl.
Ok, thanks. I don't think I've a HD test program from the manufacturer.
You can usually get it from their web page. For example, Seagate has a good one. I think it is a small ISO nowadays; previously it was a floppy image (DOS bootable). As for Fujitsu, I had a hard time searching for theirs and failed to find it.
The Ultimate Boot CD ISO has them all, and a lot more useful utilities.
--
"Unless the Lord builds the house, its builders labor in vain." Psalm 127:1 NIV
Team OS/2 ** Reg. Linux User #211409
Felix Miata *** http://fm.no-ip.com/
On Sat, 20 Dec 2008 15:00:05 +0100 (CET), "Carlos E. R." wrote:
On Saturday, 2008-12-20 at 12:15 +0100, Martin /Nightowl/ Byttebier wrote:
Is this a hard disk failure?
Could be. Use the HD test program from the manufacturer, or use smartctl.
OK, I ran smartctl from the SystemRescueCd.
root@sysresccd /root % smartctl -l error /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 889 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 889 occurred at disk power-on lifetime: 15067 hours (627 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 47 bb 1f e1 Error: UNC 8 sectors at LBA = 0x011fbb47 = 18856775
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 47 bb 1f e1 00 02:47:39.609 READ DMA
ca 00 08 47 00 5e e0 00 02:47:39.599 WRITE DMA
ca 00 70 a7 99 5e e0 00 02:47:39.599 WRITE DMA
c8 00 08 17 67 d4 e5 00 02:47:39.580 READ DMA
e7 00 00 00 00 00 a0 00 02:47:38.563 FLUSH CACHE
.......
....
...
====================================================================
Now that looks like Chinese to me.
Any comments?
TIA,
Martin
--
Martin /Nightowl/ Byttebier
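For anyone else puzzled by the register dump in the smartctl error log: with 28-bit LBA addressing, the LBA is assembled from the low four bits of DH followed by CH, CL and SN, and bit 6 of the error register (ER) is the UNC flag for an uncorrectable data error. Here is a small Python sketch that reproduces the "UNC 8 sectors at LBA" line from the raw row. The function and field names are illustrative only, and the row is copied from the output above.

```python
UNC_BIT = 0x40  # bit 6 of the ATA error register: uncorrectable data error


def lba28_from_registers(sn, cl, ch, dh):
    """Combine the 28-bit LBA from the taskfile registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_error_row(row):
    """Decode one 'ER ST SC SN CL CH DH' row of the SMART error log."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    return {
        "uncorrectable": bool(er & UNC_BIT),
        "status": st,
        "sector_count": sc,
        "lba": lba28_from_registers(sn, cl, ch, dh),
    }


# Register values after the failed READ DMA, copied from the error log.
error = decode_error_row("40 51 08 47 bb 1f e1")
print(error["uncorrectable"], error["sector_count"], error["lba"])
```

The decoded LBA can then be compared with the partition boundaries that fdisk shows in sector units.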
On Saturday, 2008-12-20 at 17:07 +0100, Martin /Nightowl/ Byttebier wrote:
Could be. Use the HD test program from the manufacturer, or use smartctl.
Oke, I did run smartctl from the systemrescueCD.
root@sysresccd /root % smartctl -l error /dev/sda
Ok, that prints the log, but you need to run the tests. First use "--health". Then:
smartctl -a /dev/sda | less
and find the "Attributes" section. Then trigger a short test and, when it finishes, a long one (smartctl --test=long /dev/sda). Then find the results (-a).
--
Cheers, Carlos E. R.
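The sequence described above could be scripted roughly as follows. This is only a sketch under a few assumptions: smartctl is installed, the script runs as root, and the helper names are invented for illustration. The self-tests run inside the drive and smartctl returns immediately after starting one, so the script only builds the command list and leaves the waiting to the user.

```python
import subprocess
from typing import List


def smart_check_commands(device: str) -> List[List[str]]:
    """Return the smartctl invocations from the message, in order."""
    return [
        ["smartctl", "--health", device],      # overall health verdict
        ["smartctl", "-a", device],            # full report, incl. attributes
        ["smartctl", "--test=short", device],  # start the short self-test
        ["smartctl", "--test=long", device],   # start the long self-test
        ["smartctl", "-l", "selftest", device],  # read the self-test log
    ]


def run_step(command: List[str]) -> str:
    """Run one step and return its output (needs root and a real disk)."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# Only print the plan here; call run_step() by hand for each command and wait
# for a self-test to finish before starting the next one.
for command in smart_check_commands("/dev/sda"):
    print(" ".join(command))
```

Note that smartctl prints an estimated completion time when it starts a test, which tells you how long to wait before reading the results.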
On Sun, 21 Dec 2008 01:41:32 +0100 (CET), "Carlos E. R." wrote:
Oke, I did run smartctl from the systemrescueCD.
root@sysresccd /root % smartctl -l error /dev/sda
Ok, that prints the log, but you need to run the tests. First use "--health". Then:
smartctl -a /dev/sda | less
and find the "Attributes" section. Then trigger a short test, when it finishes, a long one (smartctl --test=long /dev/sda). And then find the results (-a)
I've done all that.
See:
URL:http://users.telenet.be/tos4ever/downloads/attributes.txt
URL:http://users.telenet.be/tos4ever/downloads/long-test.txt
URL:http://users.telenet.be/tos4ever/downloads/fdisk.txt
The partition which causes problems is /dev/sda9
I really don't understand what's going on. I would appreciate it if
someone could enlighten me.
TIA,
Martin
--
Martin /Nightowl/ Byttebier
Martin /Nightowl/ Byttebier wrote:
The partition which causes problems is /dev/sda9 I really don't understand what's going on. I would appreciate it if someone could enlighten me.
It is not clear (to me), but it seems that at some point (at 15067 hours of usage) the disk had uncorrectable read errors (8 sectors at LBA = 18856775) and another at 202924991, I think. You can see the sector numbers in fdisk using "u" to change the display units, but I'm unsure whether those will be LBA numbers.
However, then I see this:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
and I'm not sure if they have been remapped or not. The disk reports itself as healthy, and it probably is, but... touch wood.
The best thing is to back up everything as soon as possible. Then you could perhaps rewrite the entire disk (with dd) to see if there are write errors. The disk is a bit old, anyway.
Also, being a Seagate disk, there is a good test utility here:
http://www.seagate.com/www/en-us/support/downloads/seatools/
You should use the DOS version, I think.
--
Cheers / Saludos,
Carlos E. R. (from 11.1-ex-factory)
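To make the two attribute rows quoted above easier to read, here is a hedged Python sketch that splits a row of the smartctl attribute table into fields and applies the usual rule of thumb: an attribute counts as failing when its normalized value has dropped to or below a non-zero threshold, and the raw value of Reallocated_Sector_Ct is the number of sectors the drive has remapped so far. The class and function names are illustrative, and the two rows are copied from the message.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    ident: int
    name: str
    value: int   # normalized value, higher is better
    worst: int
    thresh: int
    raw: int


def parse_attribute_line(line: str) -> SmartAttribute:
    """Parse one row of the table printed by 'smartctl -a'."""
    fields = line.split(None, 9)
    return SmartAttribute(
        ident=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=int(fields[9].split()[0]),  # ignore any trailing annotation
    )


def below_threshold(attr: SmartAttribute) -> bool:
    """True when the normalized value is at or below a non-zero threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


rows = [
    "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0",
    "198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0",
]
attributes = [parse_attribute_line(row) for row in rows]
for attr in attributes:
    print(attr.name, attr.raw, below_threshold(attr))
```

With these two rows, neither attribute is below its threshold and both raw counts are zero, which fits the reading that no sectors had been recorded as remapped at the time of the report.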
2008/12/21 Carlos E. R. wrote:
Also, being a Seagate disk, there is a good test utility - here:
http://www.seagate.com/www/en-us/support/downloads/seatools/
You should use the DOS version, I think.
Yep, I had similar errors on a Seagate drive a year ago. Once I had run the utility and the tests it provides, the drive worked flawlessly, and it has ever since! I used a version from there that I could burn to CD-ROM and boot from.
On Sun, 21 Dec 2008 23:35:54 +0000, "Rob OpenSuSE" wrote:
Yep, I had similar errors on a Seagate drive a year ago. Once I used the utility, and the tests provided it's worked flawlessly since!
I used a version that I could burn and boot from CD-ROM from there.
Thanks. I bought a new drive and will use it to install OpenSuse
10.3, or maybe 11.1 if I can install it (11.0 failed to install).
I'll keep the older drive for Windows.
Thanks,
Martin
--
Martin /Nightowl/ Byttebier
On Sun, 21 Dec 2008 23:34:23 +0100, "Carlos E. R." wrote:
Also, being a Seagate disk, there is a good test utility - here:
http://www.seagate.com/www/en-us/support/downloads/seatools/
I've tried the Windows version, which told me the disk is healthy. As
soon as I can I'll try the DOS version.
Anyway, I've bought a new hard disk, so one of these days I'll install
that drive and use it for Suse. The old disk will be good enough to run
Windows when needed.
Thanks,
Martin
--
Martin /Nightowl/ Byttebier
Martin /Nightowl/ Byttebier wrote:
Is this a hard disk failure?
Yes, most likely :( Back up your data immediately and replace the drive before it is too late...
--
"We have art in order not to die of the truth" - Friedrich Nietzsche
Cristian Rodríguez R.
Platform/OpenSUSE - Core Services
SUSE LINUX Products GmbH
Research & Development
http://www.opensuse.org/
participants (5)
- Carlos E. R.
- Cristian Rodríguez
- Felix Miata
- Martin /Nightowl/ Byttebier
- Rob OpenSuSE