SATA not working with 9.3, but works with 9.1 (right format)
(Webmail sent this message twice and destroyed its formatting the first time.)

Hi,

To replace a dying SATA hard disk I bought another one, but the boot process hung at the "Searching for info file" message. Using Alt-F4, the last message was about ata2: ... After trying to boot in rescue mode with that disk alone and on both SATA ports, I assumed the hard disk was bad.

I tried a different SATA disk (Western Digital 120 GB, 8 MB cache) and still got the same problem. This time I tried my old SuSE 9.1 DVD: it sees the new disk and formatted it with ReiserFS without problems. But SuSE 9.3 still shows the same problem.

I will try to type off the screen the messages from Alt-F4:

  [...]
  scsi0 : sata_sil
  irq 11: nobody cared!
   [<...>] __report_bad_irq+0x1c/0x70
   [<...>] note_interrupt+0x5b/0x80
   [<...>] __do_IRQ+0xdb/0xf0
   [<...>] do_IRQ+0x30/0x60
   [<...>] common_interrupt+0x1a/0x20
   [<...>] __do_softirq+0x31/0xa0
   [<...>] do_softirq+0x26/0x30
   [<...>] do_IRQ+0x3d/0x60
   [<...>] common_interrupt+0x1a/0x20
   [<...>] default_idle+0x0/0x30
   [<...>] default_idle+0x24/0x30
   [<...>] cpu_idle+0x1c/0x60
   [<...>] start_kernel+0x167/0x1d0
  handlers:
  [<...>] (usb_hcd_irq+0x0/0x60 [usbcore])
  [<...>] (ata_interrupt+0x0/0x100 [libata])
  Disabling IRQ #11
  ata2: dev 0 ATA-6, max UDMA/133, 234441648 sectors: LBA48

Any idea how I can get this hard disk to work under 9.3? I don't want to switch to SuSE 10 now, because all my other machines are on 9.3 and I usually do a full upgrade in the summer.

Thanks,
Carlos

PS: I forgot to mention that my original Seagate SATA still boots fine, despite the couple of bad blocks.

--
Carlos
--------------------------------------------------------------------
Scientific Programmer: professional who loves to _find_ own errors.
--------------------------------------------------------------------
Carlos Frederico Lange       || Phone:(780)492-6714, Fax:(780)492-2200
Dept. of Mechanical Engineering || e-mail:carlos.lange@ualberta.ca
University of Alberta        || http://www.mece.ualberta.ca/~clange/
Edmonton, AB, Canada T6G 2G8 || CFD, aerosols, Phoenix Mars lander
--------------------------------------------------------------------
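A note for readers of the trace above: "irq 11: nobody cared!" means that none of the handlers registered on that interrupt line claimed the interrupt, and the "handlers:" list names the routines sharing the line. As a minimal sketch (the function name and regular expressions are illustrative assumptions, not part of any kernel tool), the IRQ number and the sharing handlers can be pulled out of such a log like this:

```python
import re

# Excerpt of the log as typed in the message; "[<...>]" is the poster's own elision.
LOG = """\
irq 11: nobody cared!
handlers:
[<...>] (usb_hcd_irq+0x0/0x60 [usbcore])
[<...>] (ata_interrupt+0x0/0x100 [libata])
Disabling IRQ #11
"""


def shared_irq_handlers(log):
    """Return (irq_number, [(handler_function, module), ...]) found in the log text."""
    irq_match = re.search(r"irq (\d+): nobody cared", log)
    irq = int(irq_match.group(1)) if irq_match else None
    # Handler lines look like "(function+0xOFFSET/0xSIZE [module])".
    handlers = re.findall(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)", log)
    return irq, handlers


if __name__ == "__main__":
    irq, handlers = shared_irq_handlers(LOG)
    for function, module in handlers:
        print(f"IRQ {irq} is shared by {function} (module {module})")
```

In this log the USB host controller and libata share IRQ 11, which at least narrows down which drivers are involved.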
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The Sunday 2005-11-06 at 16:18 -0700, Carlos F Lange wrote:
PS: I forgot to mention that my original Seagate SATA still boots fine, despite the couple of bad blocks.
Having bad blocks in a HD is absolutely normal. In fact, it is practically impossible to have defect-free hard disks. Therefore, they have a number of sectors reserved by the manufacturer for replacing bad blocks. When the disk tries to write to a bad block, it automatically writes the data to one of the reserved blocks, and from then on, every request for that sector is instead mapped to the new one. This is done transparently to the OS, but it can be disabled (hdparm).

You could use SMART to check the usage of that reserved space:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

or:

  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       42

(except that, I understand, smartctl does not support SATA yet :-( )

One of the three Seagate disks on this system developed bad blocks some years ago, and is still working, 10000 working hours later. Not a problem.

Simply having some bad blocks is not enough reason to throw away a disk. Just force a write on those two bad sectors.

- --
Cheers,
Carlos Robinson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFDbsIptTMYHG2NR9URAg5/AJ9WagQhiBLnEmIRbTlOS+Zof4fSzgCgl4Qt
+cMerLZkeAQk1ijmE++xz+M=
=RbXD
-----END PGP SIGNATURE-----
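For readers who want to track the reallocated-sector count over time, here is a minimal sketch that reads attribute rows in the layout quoted above and returns the raw value of attribute 5. The function names are illustrative assumptions, and the sketch assumes the ten-column layout shown in the message with a plain integer in the RAW_VALUE column.

```python
# The two sample outputs quoted in the message above.
SAMPLE_CLEAN = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
"""
SAMPLE_REMAPPED = """\
  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       42
"""


def parse_smart_row(line):
    """Split one attribute row into named fields; return None for other lines."""
    fields = line.split()
    if len(fields) < 10 or not fields[0].isdigit():
        return None  # header line or unrelated text
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # normalized value, e.g. "096" -> 96
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": int(fields[9]),     # assumes an integer raw value
    }


def reallocated_sectors(table_text):
    """Return the raw Reallocated_Sector_Ct (ID 5), or None if the row is absent."""
    for line in table_text.splitlines():
        row = parse_smart_row(line)
        if row is not None and row["id"] == 5:
            return row["raw"]
    return None


if __name__ == "__main__":
    print(reallocated_sectors(SAMPLE_CLEAN))
    print(reallocated_sectors(SAMPLE_REMAPPED))
```

The raw value is the number to watch here; the normalized VALUE column only drops slowly towards THRESH as reserved sectors are used up.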
On Sunday 06 November 2005 19:55, Carlos E. R. wrote:
The Sunday 2005-11-06 at 16:18 -0700, Carlos F Lange wrote:
PS: I forgot to mention that my original Seagate SATA still boots fine, despite the couple of bad blocks.
Having bad blocks in a HD is absolutely normal. In fact, it is practically impossible to have defect-free hard disks. Therefore, they have a number of sectors reserved by the manufacturer for replacing bad blocks. When the disk tries to write to a bad block, it automatically writes the data to one of the reserved blocks, and from then on, every request for that sector is instead mapped to the new one. This is done transparently to the OS, but it can be disabled (hdparm).
I thought this was a sign that a hard disk was going bad. Actually, reiserfsck has a comment accompanying the bad block message saying something to the tune of "it is not worth risking your data with this hard disk".
You could use SMART to check the usage of that reserved space:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
or:
5 Reallocated_Sector_Ct 0x0033 096 096 036 Pre-fail Always - 42
(except that, I understand, smartctl does not support SATA yet :-( )
One of the three Seagate disks on this system developed bad blocks some years ago, and is still working, 10000 working hours later. Not a problem.
OK, this gives me a bit of comfort, but it still doesn't help me with the second SATA disk I purchased. Why is 9.3 hanging on that disk, when 9.1 has no problem with it? Anything else I can try to make it work?
Simply having some bad blocks is not enough reason to throw away a disk. Just force a write on those two bad sectors.
I heard this before, but I have no idea how to write on those blocks. If I know block and sector numbers from reiserfsck, how can I direct a write command to those blocks?

Carlos F.L.
--
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The Monday 2005-11-07 at 00:25 -0700, Carlos F Lange wrote:
boots fine, despite the couple of bad blocks.
Having bad blocks in a HD is absolutely normal. In fact, it is practically impossible to have defect-free hard disks. Therefore, they have a number of sectors reserved by the manufacturer for replacing bad blocks. When the disk tries to write to a bad block, it automatically writes the data to one of the reserved blocks, and from then on, every request for that sector is instead mapped to the new one. This is done transparently to the OS, but it can be disabled (hdparm).
I thought this was a sign that a hard disk was going bad. Actually, reiserfsck has a comment accompanying the bad block message saying something to the tune of "it is not worth risking your data with this hard disk".
Yes, it is often a sign of age, but not always. My rule of thumb is to worry when the number of bad sectors keeps increasing, and not when it stabilizes. For example, if I hit the table hard with my hand and the HD then complained of bad sectors, I would change that HD _fast_. :-P
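The rule of thumb above can be written down as a small check over periodic readings of the bad-sector count. This is only a sketch: the function name, the three-reading window, and the labels are assumptions chosen for illustration, not a standard criterion.

```python
def reallocation_trend(readings, window=3):
    """Classify bad-sector counts listed oldest to newest.

    Returns "growing" if the count rose within the last `window` readings,
    "stable" if it did not, and "unknown" with fewer than two readings.
    """
    if len(readings) < 2:
        return "unknown"
    recent = readings[-window:]
    return "growing" if recent[-1] > recent[0] else "stable"


if __name__ == "__main__":
    # A disk that remapped 42 sectors once and then settled down.
    print(reallocation_trend([0, 42, 42, 42]))
    # A disk whose count keeps climbing between checks.
    print(reallocation_trend([42, 50, 61]))
```

A count that jumped once and then stayed flat is classified as stable, which matches the case of the old Seagate disk described in this thread.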
One of the three Seagate disks on this system developed bad blocks some years ago, and is still working, 10000 working hours later. Not a problem.
OK, this gives me a bit of comfort, but it still doesn't help me with the second SATA disk I purchased. Why is 9.3 hanging on that disk, when 9.1 has no problem with it? Anything else I can try to make it work?
I don't know. Who is giving that error message ("Searching for info file"), grub? I just searched the manual, but couldn't find it. But I think you mean it is a kernel message: it has a problem with interrupt request #11, which went unhandled, and then it crashes.

What kernel are you using, the latest one? I had problems with "2.6.11.4-21.9" crashing, and had to revert to 2.6.11.4-21.8. My error was "kernel: hdb dma_timer_expiry: dma status= 0x64", and I'm not the only one having that problem with that particular kernel. Something they changed in the HD handling.
Simply having some bad blocks is not enough reason to throw away a disk. Just force a write on those two bad sectors.
I heard this before, but I have no idea how to write on those blocks. If I know block and sector numbers from reiserfsck, how can I direct a write command to those blocks?
With dd - I'm not sure of the exact command now, I'd have to dig through the manual. Or overwrite the whole disk with zeros, to be on overkill^H^H..^Hsafe mode. Or simply delete the affected files and wait till the space gets overwritten... but I don't like that idea.

Reiserfs is not happy handling bad blocks. Or was, I think they changed things in that respect. Older/traditional filesystems simply marked the sector as bad, and went on working. Reiserfs instead relied on the HD firmware remapping feature (also called "defect management feature"). But as I said, this has changed, although I haven't tested it.

- --
Cheers,
Carlos Robinson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFDb7intTMYHG2NR9URApLzAJ9slWjy6dsWYo18y6Hca5q423cB1gCeI2To
O2z8LNC+Zwn199aLosU93N0=
=054W
-----END PGP SIGNATURE-----
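Since the exact dd command is left open above, here is a hedged sketch of the arithmetic involved, written in Python and exercised on a scratch image file rather than a real disk. Overwriting one sector is roughly what `dd if=/dev/zero of=TARGET bs=512 seek=N count=1` does. Everything specific here is an assumption for illustration: 512-byte sectors, a 4096-byte file system block, block numbers counted from the start of the partition, and all function names. The data in the overwritten sector is destroyed, so the numbers must be checked against the actual reiserfsck output before pointing anything like this at a device.

```python
import os
import tempfile

SECTOR_SIZE = 512      # assumption: 512-byte sectors
FS_BLOCK_SIZE = 4096   # assumption: file system block size


def fs_block_to_sectors(block, fs_block_size=FS_BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """List the sectors (relative to the partition start) covered by one block."""
    per_block = fs_block_size // sector_size
    first = block * per_block
    return list(range(first, first + per_block))


def zero_sector(path, sector, sector_size=SECTOR_SIZE):
    """Overwrite exactly one sector of `path` with zeros. Destroys that data."""
    with open(path, "r+b") as target:
        target.seek(sector * sector_size)
        target.write(b"\x00" * sector_size)
        target.flush()
        os.fsync(target.fileno())  # push the write out of the OS cache


def make_scratch_image(sectors, sector_size=SECTOR_SIZE):
    """Create a temporary file filled with 0xff bytes to practise on."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as image:
        image.write(b"\xff" * (sectors * sector_size))
    return path


if __name__ == "__main__":
    image_path = make_scratch_image(16)
    for sector in fs_block_to_sectors(1):   # block 1 covers sectors 8..15
        zero_sector(image_path, sector)
    print("zeroed block 1 in", image_path)
    os.remove(image_path)
```

Writing to the partition device rather than the whole-disk device keeps the sector numbers relative to the file system, which is the assumption made in `fs_block_to_sectors`.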