Mailinglist Archive: opensuse-support (97 mails)

Re: [opensuse-support] RAID1 disk pending sectors
Carlos E. R. composed on 2018-11-07 10:35 (UTC+0100):

On 07/11/2018 04.48, Felix Miata wrote:

# journalctl -b -e
...
Nov 06 19:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 20:24:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 20:54:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 21:24:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 21:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
# fdisk -l /dev/sdc; smartctl -x /dev/sdc
http://fm.no-ip.com/Tmp/Hardware/Disk/smartctlx-msi85-hgst1000.txt
shows Current_Pending_Sector raw value is 8 after only 1660 power on hours.
:-(

Earlier: 81 hours.

That section went in and out or bounced off my brain or something.

How can I find out which device(s) have the bad sectors before replacing the
disk?

Well, it is clearly sdc, and your fdisk output gives you the serial
number of the disk.

By "device(s)" I meant md or partition but should have written simply
partition(s).

If you do not know which one it is, remove them all, boot rescue
media, connect a single one, read the identifiers (hdparm -i /dev/sda),
and put a sticker with the "SerialNo=" string somewhere you can read it.

Physically speaking it's the lower mounted of the RAID pair.

Hopefully the serial is already printed in the manufacturer label, so no
need to boot.
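
A minimal sketch of pulling the serial number out of the identification data, so it can be compared with the label on the drive. It assumes the `hdparm -i` output contains a line of the form `Model=..., FwRev=..., SerialNo=...`; the helper name `extract_serial` is made up for this example.

```shell
# Print the serial number found in `hdparm -i` output read from stdin.
# Assumption: the identification line looks like
#   Model=..., FwRev=..., SerialNo=...
# extract_serial is a hypothetical helper name, not part of hdparm.
extract_serial() {
    sed -n 's/.*SerialNo=[[:space:]]*\([^,[:space:]]*\).*/\1/p'
}
```

On a live system this would be used as `hdparm -i /dev/sdc | extract_serial` (as root). The symlinks under /dev/disk/by-id/ usually carry the model and serial as well, which avoids pulling any disks.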

Are these likely fixable short of replacing the disk?

Sometimes.

If so, how?

Read the recent thread "[opensuse] Login weirdness".

# smartctl -a /dev/sdc
pending sector count remains 8

Basically, your disk fails consistently on LBA 142446713, read error.
You have to find out what is there and rewrite that sector.

# smartctl -t long /dev/sdc
is now running

I missed the section pointing to the LBA. 142446713 is on sdc8, which happens
to be md3, which is on /home. There remains to identify the file or structure
that uses 142446713.
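
The usual arithmetic for getting from a whole-disk LBA to a filesystem block can be sketched as below. This is only a sketch under assumptions: the partition start comes from `fdisk -l`, the md data offset (in sectors) from `mdadm --examine` (0 if the superblock sits at the end of the member), and the sector and block sizes are whatever the disk and filesystem actually use. The function name `lba_to_fs_block` is made up for this example.

```shell
# Convert a whole-disk LBA into a filesystem block number.
# Usage: lba_to_fs_block LBA PART_START DATA_OFFSET SECTOR_SIZE FS_BLOCK_SIZE
#   LBA            failing sector as reported by smartctl (whole-disk address)
#   PART_START     first sector of the partition, from `fdisk -l`
#   DATA_OFFSET    md data offset in sectors (0 if there is none)
#   SECTOR_SIZE    logical sector size in bytes (commonly 512)
#   FS_BLOCK_SIZE  filesystem block size in bytes (commonly 4096)
# lba_to_fs_block is a hypothetical helper name.
lba_to_fs_block() {
    lba=$1; part_start=$2; data_offset=$3; sector_size=$4; fs_block_size=$5
    if [ "$lba" -lt $((part_start + data_offset)) ]; then
        echo "LBA lies before the start of the filesystem data" >&2
        return 1
    fi
    # Sector offset inside the filesystem, scaled to filesystem blocks
    # (integer division rounds down to the block containing the sector).
    echo $(( (lba - part_start - data_offset) * sector_size / fs_block_size ))
}
```

If the filesystem on md3 is ext2/3/4 (an assumption, the thread does not say), the resulting block number can be passed to `debugfs -R "icheck BLOCK" /dev/md3` to get an inode, and that inode to `debugfs -R "ncheck INODE" /dev/md3` to get a path. A block that belongs to no inode is free space or filesystem metadata.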

That same section has this puzzling line:
#10 Short offline Completed without error 00% 43301 -
That entry shows a lifetime of 43301 hours on a disk that appears to have only
1660 power on hours.

Can an extended offline test force them to be rewritten and reallocated?

No.

A read doesn't remap the sector, only a write.
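
Since the failing sector sits on a RAID1 member, one way to get that write is to let md do it. As far as I know, an md "check" pass reads every sector of every member, and a member sector that fails to read is rewritten from the other mirror's copy. A minimal sketch follows; the function name `md_request_check` and the `SYSFS_ROOT` override are made up for this example, the latter only so the function can be tried against a scratch directory instead of the real /sys.

```shell
# Ask the md layer to read-check a whole array.
# Usage: md_request_check MD_NAME   (e.g. md3; needs root on a real system)
# md_request_check and SYSFS_ROOT are hypothetical names for this sketch.
md_request_check() {
    md=$1
    action_file="${SYSFS_ROOT:-/sys}/block/$md/md/sync_action"
    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file (not an md array, or not root?)" >&2
        return 1
    fi
    # Writing "check" starts a scrub; progress appears in /proc/mdstat.
    echo check > "$action_file"
}
```

After the pass finishes, `smartctl -A /dev/sdc` should show whether the pending count dropped. This assumes the other mirror holds a good copy of the same data.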

Could this be some kind of false alarm on so young a disk?

No. However, an error doesn't need to be fatal; most disks have errors
you do not see. Disks have a spare area to which they remap bad sectors,
out of sight and out of use, and continue working. The problem is if they
continue to develop more errors.
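
To watch whether more errors develop, the raw values of the relevant SMART attributes can be sampled from time to time. A minimal sketch, assuming the usual `smartctl -A` table layout in which the attribute ID is the first column and the raw value starts in the tenth; `smart_raw` is a made-up helper name.

```shell
# Print the raw value of one SMART attribute from `smartctl -A` output on stdin.
# Assumption: columns are
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# smart_raw is a hypothetical helper name, not part of smartmontools.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}
```

For example, `smartctl -A /dev/sdc | smart_raw 197` would print the Current_Pending_Sector count and `smart_raw 5` the Reallocated_Sector_Ct count. A pending count that falls while the reallocated count rises means sectors were remapped; both rising steadily is the worrying case.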

# journalctl | grep pending | wc -l
3175
# journalctl | grep pending | head
Sep 02 01:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 02:28:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 02:58:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 03:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 03:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 04:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 04:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 05:28:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 05:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 06:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
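
With 3175 such lines, it is easier to look only at the points where the count changed. A minimal sketch that does this, assuming each smartd message arrives on a single journal line in the default short format (month, day, time, host, ...); `pending_changes` is a made-up helper name.

```shell
# Read journal lines on stdin and print one line per change in the
# pending-sector count, per device: "Mon DD HH:MM:SS /dev/sdX COUNT".
# pending_changes is a hypothetical helper name for this sketch.
pending_changes() {
    awk '/Currently unreadable \(pending\) sectors/ {
        dev = ""; count = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Device:")   dev = $(i + 1)    # token after "Device:"
            if ($i == "Currently") count = $(i - 1)  # number before "Currently"
        }
        sub(/,$/, "", dev)
        # Report only the first line and lines where the count differs
        # from the previous one seen for the same device.
        if (!(dev in last) || last[dev] != count) {
            print $1, $2, $3, dev, count
            last[dev] = count
        }
    }'
}
```

Usage would be `journalctl | pending_changes`, which shows when the count went from 1 toward 8 and whether it grew in steps or all at once.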

IIRC, Sep 02 is when the device was placed in service. It would be nice to be
able to figure out which file(s) those sectors belong to, so as to force them
to be rewritten.
--
Evolution as taught in public schools is religion, not science.

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata *** http://fm.no-ip.com/