Re: [opensuse-support] RAID1 disk pending sectors

7 Nov 2018

      On 07/11/2018 12.39, Felix Miata wrote:
...
Carlos E. R. composed on 2018-11-07 10:35 (UTC+0100):
...
On 07/11/2018 04.48, Felix Miata wrote:
...
...
# journalctl -b -e
...
Nov 06 19:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 20:24:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 20:54:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 21:24:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 21:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
# fdisk -l /dev/sdc; smartctl -x /dev/sdc
http://fm.no-ip.com/Tmp/Hardware/Disk/smartctlx-msi85-hgst1000.txt
shows Current_Pending_Sector raw value is 8 after only 1660 power on hours. :-(
...
Earlier: 81 hours.
That section went in and out or bounced off my brain or something.
...
...
How can I find out which device(s) have the bad sectors before replacing the disk?
...
Well, it is clearly sdc, and the smartctl part of your output gives you the
serial number of the disk.
By "device(s)" I meant md or partition but should have written simply partition(s).
Compare the LBA number with the partition table to find out in which
partition it falls. I.e., maths.
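
As a rough sketch of that arithmetic: a partition contains the LBA when
start <= LBA < start + size. The function below is only an illustration.
The name find_partition is made up, and it assumes the LBA reported by
smartctl and the partition table are both in 512-byte sectors, which is
worth checking against the "Sector size" line of the fdisk output.

```shell
# Print the partition whose sector range contains the given LBA,
# plus the LBA's offset from the start of that partition.
# stdin: one "name start_sector sector_count" line per partition.
# find_partition is a hypothetical helper, not part of any tool.
find_partition() {
    lba=$1
    while read -r name start count; do
        # Skip blank or incomplete lines.
        [ -n "$count" ] || continue
        end=$((start + count - 1))
        if [ "$lba" -ge "$start" ] && [ "$lba" -le "$end" ]; then
            echo "$name offset $((lba - start))"
            return 0
        fi
    done
    # No partition matched: the LBA is in a gap or past the last one.
    return 1
}

# On the live system the table could be fed from sysfs, where "start"
# and "size" are given in 512-byte units:
#
#   for p in /sys/block/sdc/sdc[0-9]*; do
#       echo "${p##*/} $(cat "$p/start") $(cat "$p/size")"
#   done | find_partition 142446713
```

The offset it prints is the sector number relative to the start of the
partition, which is the figure needed for any later filesystem lookup.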
...
...
Hopefully the serial is also printed on the manufacturer's label, so no
need to boot.
...
...
Are these likely fixable short of replacing the disk?
...
Sometimes.
...
...
If so, how?
...
Read the recent thread "[opensuse] Login weirdness".
# smartctl -a /dev/sdc
pending sector count remains 8
...
Basically, your disk consistently fails with a read error at LBA 142446713.
You have to find out what is there and rewrite that sector.
# smartctl -t long /dev/sdc
is now running
I missed the section pointing to the LBA. 142446713 is on sdc8, which is a
member of md3, which is mounted on /home. What remains is to identify the file
or structure that uses 142446713.
Right. That part depends on the filesystem, and is not trivial to find
out. The howto link referred to in the other thread explains it a bit.
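
For what it is worth, the usual arithmetic for this step (as given in the
smartmontools bad block HOWTO, which may or may not be the link meant
above) turns the disk LBA into a filesystem block number:
block = (LBA - partition start) * 512 / filesystem block size, rounded
down. The sketch below adds one term that is not in that formula: the md
data offset, on the assumption that the partition is an md member with
1.x metadata (it would be 0 for 0.90 metadata). The function name and
the example figures are hypothetical, and the thread does not say which
filesystem is on /home.

```shell
# lba_to_fs_block LBA PART_START MD_DATA_OFFSET FS_BLOCK_SIZE
# All sector arguments are in 512-byte sectors; the block size is in bytes.
# Shell arithmetic is integer-only, so the division rounds down.
# lba_to_fs_block is a hypothetical helper name.
lba_to_fs_block() {
    echo $(( ($1 - $2 - $3) * 512 / $4 ))
}

# Where the inputs might come from (assumptions, check on the real box):
#   PART_START      cat /sys/block/sdc/sdc8/start
#   MD_DATA_OFFSET  mdadm --examine /dev/sdc8   ("Data Offset" line, 1.x metadata)
#   FS_BLOCK_SIZE   tune2fs -l /dev/md3         ("Block size" line, ext2/3/4 only)
#
# On ext2/3/4 the block can then be traced to an inode and a file name:
#   debugfs -R "icheck BLOCK" /dev/md3
#   debugfs -R "ncheck INODE" /dev/md3
```

For example, with a made-up partition start of 2000, a data offset of
2048 and 4096-byte blocks, `lba_to_fs_block 10000 2000 2048 4096`
prints 744.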
...
That same section has this puzzling line:
#10 Short offline Completed without error 00% 43301 -
That's a test logged at a lifetime of 43301 hours on a disk that appears to have only
1660 power-on hours.
I don't know.
That line at the bottom may be from a test that has not finished. Wait
until it is at the top, as #1. If not, then ignore it.
...
...
...
Can an extended
offline test force them to be rewritten and reallocated?
...
No.
...
A read doesn't remap the sector, only a write.
...
...
Could this be some kind
of false alarm on so young a disk?
...
No. However, an error doesn't need to be fatal; most disks have errors
you do not see. Disks keep a spare area to which bad sectors are remapped,
out of sight, and they continue working. The problem is if they keep
developing more errors.
# journalctl | grep pending | wc -l
3175
# journalctl | grep pending | head
Sep 02 01:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 02:28:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 02:58:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 03:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 03:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 04:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 04:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 05:28:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 05:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 06:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
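
With over 3000 such lines, it may be easier to look only at the points
where the count changed. The filter below is a small sketch of that,
assuming the smartd message format shown above (device name in field 7,
count four fields before the end). pending_changes is a made-up name.

```shell
# Print one line per change in the pending-sector count of each device:
# date, time, device, new count.
# pending_changes is a hypothetical helper; it relies on the smartd
# message layout seen in this journal.
pending_changes() {
    awk '/Currently unreadable \(pending\) sectors/ {
        dev = $7
        n = $(NF - 4)
        # Report the first sighting of a device and every later change.
        if (!(dev in last) || last[dev] != n) {
            print $1, $2, $3, dev, n
            last[dev] = n
        }
    }'
}

# Usage on the live system:
#   journalctl | pending_changes
```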
IIRC, Sep 02 is when the device was placed in service. It would be nice to be able to
figure out which file(s) those sectors belong to, so as to force them to be rewritten.
You can force it anyway.

Read that LBA sector to a file using ddrescue (i.e., read multiple times,
direct, no cache), then write it back with dd.

And cross fingers.
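
A hedged sketch of what those two steps could look like. The function
only prints the commands so the offsets can be checked before anything
touches the disk. The name sector_rewrite_cmds and the files sector.bin
and sector.map are invented for the example, and a 512-byte logical
sector is assumed. The LBA is the whole-disk one, so the target is
/dev/sdc, not the partition.

```shell
# sector_rewrite_cmds DEVICE LBA
# Print (do not run) the commands to salvage one 512-byte sector with
# ddrescue and write it back in place with dd.
# sector_rewrite_cmds, sector.bin and sector.map are hypothetical names.
sector_rewrite_cmds() {
    dev=$1
    lba=$2
    # ddrescue takes positions and sizes in bytes.
    bytes=$((lba * 512))
    # Read: direct disk access (-d), up to 3 retry passes (-r3), one
    # sector at the given input position, stored at offset 0 of sector.bin.
    echo "ddrescue -d -r3 -i $bytes -o 0 -s 512 $dev sector.bin sector.map"
    # Write back: dd counts seek= in units of bs, so seek is the LBA itself.
    echo "dd if=sector.bin of=$dev bs=512 count=1 seek=$lba oflag=direct conv=notrunc"
}

# Usage:
#   sector_rewrite_cmds /dev/sdc 142446713
```

If ddrescue never manages to read the sector, sector.bin holds no data
from it, so writing it back replaces whatever was there. Since sdc8 is
one half of a RAID1, the intact copy of that block presumably lives on
the other member.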

-- 
Cheers / Saludos,

		Carlos E. R.

  (from openSUSE 15.0 (Legolas))