On 10/11/2018 02.01, Felix Miata wrote:
Felix Miata composed on 2018-11-08 00:07 (UTC-0500):
Felix Miata composed on 2018-11-07 06:39 (UTC-0500):
Carlos E. R. composed on 2018-11-07 10:35 (UTC+0100):
On 07/11/2018 04.48, Felix Miata wrote:
# journalctl -b -e
...
Nov 06 21:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
# fdisk -l /dev/sdc; smartctl -x /dev/sdc
http://fm.no-ip.com/Tmp/Hardware/Disk/smartctlx-msi85-hgst1000.txt
shows Current_Pending_Sector raw value is 8 after only 1660 power on hours. :-(
...
Basically, your disk fails consistently on LBA 142446713, read error. You have to find out what is there and rewrite that sector.
# smartctl -t long /dev/sdc is now running
I missed the section pointing to the LBA. 142446713 is on sdc8, which happens to be md3, which is on /home. There remains to identify the file or structure that uses 142446713.
That same section has this puzzling line:
#10 Short offline Completed without error 00% 43301 -
That's an entry at a lifetime of 43301 hours on a disk that appears to have only 1660 power on hours.
# cat /proc/mdstat
...
md3 : active raid1 sdb8[0] sdc8[1]
      73727872 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
...
# fdisk -l /dev/sdc
...
Device         Start       End   Sectors  Size Type
...
/dev/sdc8   61171712 208627711 147456000 70.3G Linux RAID
...
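The claim that LBA 142446713 sits on sdc8 can be checked against the fdisk bounds quoted above. The following is only a sketch: the helper name lba_in_range is made up for illustration, and the offset it prints is in logical sectors (512 bytes each, assuming the disk reports 512-byte logical sectors).

```shell
#!/bin/sh
# Partition bounds for /dev/sdc8, copied from the fdisk output above.
part_start=61171712
part_end=208627711
bad_lba=142446713

# lba_in_range LBA START END: succeeds when START <= LBA <= END.
# (Hypothetical helper, not part of any tool.)
lba_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

if lba_in_range "$bad_lba" "$part_start" "$part_end"; then
    # Distance of the bad sector from the first sector of the partition.
    offset=$((bad_lba - part_start))
    echo "LBA $bad_lba is inside sdc8, $offset sectors from its start"
else
    echo "LBA $bad_lba is outside sdc8"
fi
```

Since md3 uses a 1.0 superblock, which as far as I know is stored at the end of the member device, the same offset should also apply within md3 itself, but treat that as an assumption to verify.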
Shouldn't the following process force LBA 142446713 to be reallocated?
fail sdc8 from md3
remove sdc8 from md3
dd if=/dev/zero of=/dev/sdc8 bs=32768
add sdc8 to md3
It seems it would be simpler than trying to figure out which file or inode uses it.
I didn't see a direct response, so I tried:
The howto I linked was the response to that. It is *difficult* to find out the file or inode that uses a certain sector, and it is filesystem dependent.
smartctl -a /dev/sdc | less
cat /proc/mdstat
Mnt
mdadm --manage /dev/md3 --fail /dev/sdc8
mdadm --manage /dev/md3 --remove /dev/sdc8
cat /proc/mdstat
smartctl -a /dev/sdb | less
cat /proc/mdstat
dd if=/dev/zero of=/dev/sdc8 bs=32768
Why 32768 bytes?
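On the block size question, a rough sketch of the arithmetic: the bs value does not change what ends up on the partition, only how many write calls dd issues to cover it. The sector count is taken from the fdisk output; the 512-byte logical sector size is an assumption.

```shell
#!/bin/sh
# Size of /dev/sdc8 from the fdisk output: 147456000 sectors.
# Assumption: 512-byte logical sectors.
part_bytes=$((147456000 * 512))

# Number of write calls dd would need for a few block sizes.
for bs in 512 32768 1048576; do
    echo "bs=$bs -> $((part_bytes / bs)) writes"
done

# The value used in the thread, kept for reference.
writes_32k=$((part_bytes / 32768))
```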
smartctl -a /dev/sdc | less
At this point you haven't run a test of the disk, so the results are not complete.
mdadm --manage /dev/md3 --add /dev/sdc8
cat /proc/mdstat
smartctl -a /dev/sdc | less
cat /proc/mdstat
Subsequently:
smartctl -t long /dev/sdc
(5 hours later)
smartctl -a /dev/sdc | less
to find:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
...
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       5
Previously there were 8. Strange. Only 3 remapped? Why? :-? Ah, I see. Hypothesis: Only 3 sectors were in sdc8, the rest are outside.
SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%      1724         605782533
# 2  Extended offline    Completed: read failure       70%      1719         605782533
# 3  Extended offline    Completed: read failure       90%      1668         142446713
# 4  Extended offline    Completed: read failure       90%      1591         142446713
# 5  Extended offline    Completed: read failure       90%      1590         142446713
# 6  Extended offline    Completed: read failure       90%      1175         142446713
# 7  Extended offline    Completed: read failure       90%       919         142446713
# 8  Extended offline    Completed: read failure       90%       249         142446713
# 9  Extended offline    Completed: read failure       90%        81         142446713
#10  Short offline       Completed without error       00%        18         -
#11  Short offline       Completed without error       00%        12         -
#12  Short offline       Completed without error       00%         8         -
#13  Short offline       Completed without error       00%     43301         -
So different LBA logged.
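To see at a glance which first-error LBAs the log contains and whether each falls inside sdc8, something along these lines might help. It is only a sketch: the three sample rows are copied from the self-test log above, and the sdc8 bounds come from the earlier fdisk output.

```shell
#!/bin/sh
# Sample rows copied from the self-test log above (abbreviated to three).
log='# 1  Extended offline    Completed: read failure       70%      1724         605782533
# 3  Extended offline    Completed: read failure       90%      1668         142446713
#10  Short offline       Completed without error       00%        18         -'

# Print each distinct numeric LBA (last field) once, with its position
# relative to the sdc8 bounds (s = first sector, e = last sector).
report=$(printf '%s\n' "$log" | awk -v s=61171712 -v e=208627711 '
    $NF ~ /^[0-9]+$/ && !seen[$NF]++ {
        where = ($NF + 0 >= s && $NF + 0 <= e) ? "inside sdc8" : "outside sdc8"
        print $NF, where
    }')
printf '%s\n' "$report"
```

On the real system the sample text could presumably be replaced by the output of smartctl -l selftest /dev/sdc piped into the same awk program.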
...
journalctl | tail -n(#)
Nov 09 14:26:00 00srv kernel: CIFS VFS: bogus file nlink value 0
Nov 09 14:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 15:10:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 15:40:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 15:40:03 00srv smartd[962]: Device: /dev/sdc [SAT], previous self-test completed with error (read test element)
Nov 09 15:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], Self-Test Log error count increased from 8 to 9
Nov 09 16:10:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 16:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 17:10:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 17:40:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 18:10:03 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 18:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 19:10:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Nov 09 19:40:04 00srv smartd[962]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
:-(
I guess now I need to fail the entirety of sdc, zero fill with ddrescue instead of dd, and smartctl -x again before either replacing or trying to add back to RAID?
I don't think filling with ddrescue would behave any differently. You can also force a write to the single LBA sector 605782533. First try to read it, then rewrite it...

-- 
Cheers / Saludos,
Carlos E. R.
(from 42.3 x86_64 "Malachite" at Telcontar)
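The "read first, then rewrite" suggestion for LBA 605782533 could look roughly like the sketch below. It only builds and prints the dd command lines; it does not touch the disk. The assumptions are loud ones: 512-byte logical sectors, and the LBA counted from the start of the whole device /dev/sdc rather than from a partition.

```shell
#!/bin/sh
# Assumptions: whole-disk device, 512-byte logical sectors.
dev=/dev/sdc
lba=605782533
sector_size=512

# Build the commands as strings only; nothing is executed against the disk.
# Step 1 reads the single sector and discards the data (an I/O error is expected
# if the sector is still pending).
read_cmd="dd if=$dev of=/dev/null bs=$sector_size skip=$lba count=1 iflag=direct"
# Step 2 overwrites that one sector with zeros, destroying whatever it held.
write_cmd="dd if=/dev/zero of=$dev bs=$sector_size seek=$lba count=1 oflag=direct"

echo "1. try to read:  $read_cmd"
echo "2. then rewrite: $write_cmd"
```

The write step is destructive for that one sector, so it should only be run once it is known what lives there (605782533 is beyond the end of sdc8, so it belongs to some other partition). hdparm also offers --read-sector and --write-sector for the same purpose, the latter guarded by --yes-i-know-what-i-am-doing.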