Re: [opensuse-support] RAID1 disk pending sectors

10 Nov 2018

On 10/11/2018 02.01, Felix Miata wrote:
...
Felix Miata composed on 2018-11-08 00:07 (UTC-0500):
...
Felix Miata composed on 2018-11-07 06:39 (UTC-0500):
...
...
Carlos E. R. composed on 2018-11-07 10:35 (UTC+0100):
...
...
...
On 07/11/2018 04.48, Felix Miata wrote:
...
...
...
...
# journalctl -b -e
...
Nov 06 21:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
# fdisk -l /dev/sdc; smartctl -x /dev/sdc
http://fm.no-ip.com/Tmp/Hardware/Disk/smartctlx-msi85-hgst1000.txt
shows Current_Pending_Sector raw value is 8 after only 1660 power on hours. :-(
...
Basically, your disk fails consistently on LBA 142446713, read error.
You have to find out what is there and rewrite that sector.
...
...
# smartctl -t long /dev/sdc
is now running
...
...
I missed the section pointing to the LBA. 142446713 is on sdc8, which happens
to be md3, which is on /home. There remains to identify the file or structure
that uses 142446713.
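
As a rough cross-check of the sdc8 claim, the offset of the failing LBA inside the partition can be computed from the fdisk start sector quoted in this thread. This is only a sketch: the function name is made up for illustration, and it assumes that smartctl and fdisk both count in the same logical sector size. Turning the result into a filesystem block still requires the block size of the filesystem on md3, which is not shown here.

```shell
#!/bin/sh
# Values copied from the fdisk and smartctl output quoted in this thread.
part_start=61171712    # first sector of /dev/sdc8 (fdisk "Start" column)
bad_lba=142446713      # LBA_of_first_error from the self-test log

# partition_offset LBA START
# Prints how many sectors LBA lies past the first sector of the partition.
# (Hypothetical helper name, not part of any tool.)
partition_offset() {
    echo $(( $1 - $2 ))
}

echo "LBA $bad_lba is sector $(partition_offset "$bad_lba" "$part_start") of sdc8"
# prints: LBA 142446713 is sector 81275001 of sdc8
```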
...
...
That same section has this puzzling line:
#10 Short offline Completed without error 00% 43301 -
That's a test logged at a lifetime of 43301 hours on a disk that appears to have only
1660 power-on hours.
...
# cat /proc/mdstat
...
md3 : active raid1 sdb8[0] sdc8[1]
      73727872 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
...
# fdisk -l /dev/sdc
...
Device         Start        End    Sectors  Size Type
...
/dev/sdc8   61171712  208627711  147456000 70.3G Linux RAID
...
...
Shouldn't the following process force LBA 142446713 to be reallocated?
...
fail sdc8 from md3
remove sdc8 from md3
dd if=/dev/zero of=/dev/sdc8 bs=32768
add sdc8 to md3
...
It seems it would be simpler than trying to figure out which file or inode
uses it.
I didn't see a direct response, so I tried:
The howto I linked was the response to that. It is *difficult* to find
out which file or inode uses a certain sector, and the procedure is
filesystem dependent.
...
smartctl -a /dev/sdc | less
cat /proc/mdstat
Mnt
mdadm --manage /dev/md3 --fail /dev/sdc8
mdadm --manage /dev/md3 --remove /dev/sdc8
cat /proc/mdstat
smartctl -a /dev/sdb | less
cat /proc/mdstat
dd if=/dev/zero of=/dev/sdc8 bs=32768
Why 32768 bytes?
...
smartctl -a /dev/sdc | less
At this point you haven't run a test of the disk, so the results
are not complete.
...
mdadm --manage /dev/md3 --add /dev/sdc8
cat /proc/mdstat
smartctl -a /dev/sdc | less
cat /proc/mdstat
Subsequently:
smartctl -t long /dev/sdc
(5 hours later)
smartctl -a /dev/sdc | less
to find:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
...
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       5
Previously there were 8. Strange. Only 3 remapped? Why? :-?

Ah, I see.
Hypothesis: Only 3 sectors were in sdc8, the rest are outside.
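
A quick way to see that the numbers are at least consistent with this hypothesis is to pull the raw values of attributes 5 and 197 out of the smartctl table and add them up. The sketch below runs on the two attribute lines pasted from this thread rather than on a live disk. The helper name is invented for illustration, and it assumes the usual ten-column layout of `smartctl -A` with the raw value in the last column.

```shell
#!/bin/sh
# smart_raw ID
# Reads a smartctl attribute table on stdin and prints the RAW_VALUE
# (10th column) of the attribute whose ID# matches.
# (Hypothetical helper name, not part of smartmontools.)
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# The two attribute lines quoted above, after the dd and the long test.
sample=$(cat <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       5
EOF
)

realloc=$(printf '%s\n' "$sample" | smart_raw 5)
pending=$(printf '%s\n' "$sample" | smart_raw 197)
echo "reallocated=$realloc pending=$pending total=$((realloc + pending))"
# prints: reallocated=3 pending=5 total=8
```

The total of 8 matches the earlier pending count. That fits the idea that three of the pending sectors were remapped by the zero fill, but it does not prove it. On a live system the function could be fed from `smartctl -A /dev/sdc` instead of the pasted sample.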
...
SMART Error Log Version: 0
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%      1724         605782533
# 2  Extended offline    Completed: read failure       70%      1719         605782533
# 3  Extended offline    Completed: read failure       90%      1668         142446713
# 4  Extended offline    Completed: read failure       90%      1591         142446713
# 5  Extended offline    Completed: read failure       90%      1590         142446713
# 6  Extended offline    Completed: read failure       90%      1175         142446713
# 7  Extended offline    Completed: read failure       90%       919         142446713
# 8  Extended offline    Completed: read failure       90%       249         142446713
# 9  Extended offline    Completed: read failure       90%        81         142446713
#10  Short offline       Completed without error       00%        18         -
#11  Short offline       Completed without error       00%        12         -
#12  Short offline       Completed without error       00%         8         -
#13  Short offline       Completed without error       00%     43301         -
So a different LBA is logged now.
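
To illustrate why zeroing sdc8 could not have touched the newly logged LBA, here is a small range check against the sdc8 boundaries from the fdisk output quoted earlier. It is a sketch with an invented helper name, and it again assumes that fdisk and smartctl use the same sector numbering. Which partition actually holds 605782533 cannot be read from this message, because the other fdisk lines are elided.

```shell
#!/bin/sh
# sdc8 boundaries from the fdisk output quoted in this thread.
sdc8_start=61171712
sdc8_end=208627711

# in_range LBA START END
# Succeeds if START <= LBA <= END.
# (Hypothetical helper name, for illustration only.)
in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Old and new LBA_of_first_error values from the self-test log.
for lba in 142446713 605782533; do
    if in_range "$lba" "$sdc8_start" "$sdc8_end"; then
        echo "LBA $lba: inside sdc8"
    else
        echo "LBA $lba: outside sdc8"
    fi
done
# prints:
# LBA 142446713: inside sdc8
# LBA 605782533: outside sdc8
```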
...
...
journalctl | tail -n(#)
Nov 09 14:26:00 00srv kernel: CIFS VFS: bogus file nlink value 0
Nov 09 14:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 15:10:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 15:40:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 15:40:03 00srv smartd[962]: Device: /dev/sdc [SAT], previous self-test completed with error (read test element)
Nov 09 15:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], Self-Test Log error count increased from 8 to 9
Nov 09 16:10:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 16:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 17:10:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 17:40:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 18:10:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 18:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 19:10:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 19:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
:-(
I guess now I need to fail the entirety of sdc, zero fill with ddrescue instead of dd,
and smartctl -x again before either replacing or trying to add back to RAID?
I don't think filling with ddrescue would behave any differently.

You can also force a write to the single sector at LBA 605782533. First
try to read it, then rewrite it...
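
One possible way to do that read-then-rewrite is with hdparm, sketched below as a dry run that only prints the commands. The function name is made up. `--read-sector` and `--write-sector` are real hdparm options, and the write refuses to run without `--yes-i-know-what-i-am-doing`. The write zeroes the sector, so if that LBA belongs to another RAID member partition, the partition should be failed and removed from its array first, as was done for sdc8.

```shell
#!/bin/sh
# rewrite_sector_cmds DEVICE LBA
# Dry run: prints the commands instead of executing them.
# (Hypothetical helper name, for illustration only.)
rewrite_sector_cmds() {
    dev=$1
    lba=$2
    # Step 1: try to read the sector; for a pending sector this is
    # expected to end in an I/O error.
    echo "hdparm --read-sector $lba $dev"
    # Step 2: overwrite the sector with zeros so the drive can either
    # reuse it or remap it. This destroys the data in that sector.
    echo "hdparm --yes-i-know-what-i-am-doing --write-sector $lba $dev"
    # Step 3: look at attributes 5 and 197 again.
    echo "smartctl -A $dev"
}

rewrite_sector_cmds /dev/sdc 605782533
```

After the write, another `smartctl -t long /dev/sdc` run would show whether the test gets past that LBA or stops at yet another one.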

-- 
Cheers / Saludos,

		Carlos E. R.
		(from 42.3 x86_64 "Malachite" at Telcontar)