On 07/11/2018 12.39, Felix Miata wrote:
Carlos E. R. composed on 2018-11-07 10:35 (UTC+0100):
On 07/11/2018 04.48, Felix Miata wrote:
# journalctl -b -e
...
Nov 06 19:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 20:24:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 20:54:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 21:24:56 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Nov 06 21:54:55 00srv smartd[937]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
# fdisk -l /dev/sdc; smartctl -x /dev/sdc
http://fm.no-ip.com/Tmp/Hardware/Disk/smartctlx-msi85-hgst1000.txt
shows Current_Pending_Sector raw value is 8 after only 1660 power on hours. :-(
Earlier: 81 hours.
That section went in and out or bounced off my brain or something.
How can I find out which device(s) have the bad sectors before replacing the disk?
Well, it is clearly sdc, and your smartctl output gives you the serial number of the disk.
By "device(s)" I meant md or partition but should have written simply partition(s).
Compare the LBA number with the partition table to find out which partition it falls in. I.e., maths.
Hopefully the serial is already printed on the manufacturer label, so no need to boot.
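The comparison against the partition table can be sketched as a small shell helper. This is only an illustration: `find_partition` is a made-up name, and the start/end sectors in the example are hypothetical placeholders, not the real layout of this disk; substitute the Start and End columns from `fdisk -l /dev/sdc`.

```shell
# find_partition LBA
# Reads "name start end" triples (in sectors) on stdin and prints the
# name of the partition whose range contains LBA. Exits non-zero if none does.
find_partition() {
    lba=$1
    awk -v lba="$lba" '
        $2 + 0 <= lba + 0 && lba + 0 <= $3 + 0 { print $1; found = 1 }
        END { exit !found }
    '
}

# HYPOTHETICAL layout, for illustration only.
find_partition 142446713 <<'EOF'
/dev/sdc7 100000000 139999999
/dev/sdc8 140000000 149999999
/dev/sdc9 150000000 159999999
EOF
```

With this made-up table the helper prints `/dev/sdc8`; the real answer depends entirely on the actual fdisk output.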
Are these likely fixable short of replacing the disk?
Sometimes.
If so, how?
Read the recent thread "[opensuse] Login weirdness".
# smartctl -a /dev/sdc
pending sector count remains 8
Basically, your disk fails consistently on LBA 142446713, read error. You have to find out what is there and rewrite that sector.
# smartctl -t long /dev/sdc
is now running
I missed the section pointing to the LBA. 142446713 is on sdc8, which happens to be md3, which is on /home. There remains to identify the file or structure that uses 142446713.
Right. That part depends on the filesystem, and is not trivial to find out. The howto link referred to in the other thread explains it a bit.
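The first step of that lookup is plain arithmetic: turn the disk LBA into a block number relative to the start of the filesystem. A rough sketch follows, where `fs_block` is a made-up helper, the partition start of 140000000 is a hypothetical value, and the 4096-byte filesystem block size is an assumption. Because sdc8 is an md member, the array's data offset (reported by `mdadm --examine` for 1.x metadata) would also have to be subtracted, and for anything other than a simple mirror the mapping is more involved than this.

```shell
# fs_block LBA PART_START [FS_BLOCK_SIZE] [SECTOR_SIZE] [DATA_OFFSET_SECTORS]
# Prints the filesystem block holding the given disk LBA:
#   block = (LBA - PART_START - DATA_OFFSET) * SECTOR_SIZE / FS_BLOCK_SIZE
# Shell integer division truncates, which is what is wanted here.
fs_block() {
    lba=$1
    start=$2
    bsize=${3:-4096}    # assumed filesystem block size
    ssize=${4:-512}     # assumed logical sector size
    offset=${5:-0}      # md data offset in sectors, 0 if not applicable
    echo $(( (lba - start - offset) * ssize / bsize ))
}

# HYPOTHETICAL partition start; take the real one from fdisk.
fs_block 142446713 140000000
```

On an ext2/3/4 filesystem the resulting block number could then be fed to debugfs (`icheck` to get the inode, `ncheck` to get the path); other filesystems need their own tools.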
That same section has this puzzling line:
#10 Short offline Completed without error 00% 43301 -
That's an entry at a lifetime of 43301 hours on a disk that appears to have only 1660 power on hours.
I don't know. That line at the bottom is maybe a line of a non-finished test. Wait till it is at the top, #1. If not, then ignore it.
Can an extended offline test force them to be rewritten and reallocated?
No.
A read doesn't remap the sector, only a write.
Could this be some kind of false alarm on so young a disk?
No. However, an error doesn't need to be fatal; most disks have errors you never see. Disks have a spare area to which bad sectors are remapped, out of sight and out of use, so the disk continues working. The problem is if it keeps developing more errors.
# journalctl | grep pending | wc -l
3175
# journalctl | grep pending | head
Sep 02 01:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 02:28:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 02:58:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 03:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 03:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 04:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 04:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 05:28:52 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 05:58:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Sep 02 06:28:53 00srv smartd[1056]: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
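With thousands of identical entries, it may be more telling to print only the moments when the pending count changed. A minimal sketch, assuming the smartd message format shown above and a single affected device (`pending_changes` is a made-up name):

```shell
# pending_changes
# Reads journal lines on stdin and prints "Mon DD HH:MM:SS count" each time
# the pending-sector count reported by smartd differs from the previous one.
pending_changes() {
    awk '/Currently unreadable \(pending\) sectors/ {
        # The count is the field right before the word "Currently".
        for (i = 2; i <= NF; i++) if ($i == "Currently") n = $(i - 1)
        if (!seen || n != last) { print $1, $2, $3, n; last = n; seen = 1 }
    }'
}

# Usage:
#   journalctl | pending_changes
```

That would show when the count went from 1 to 8, and whether it got there in one jump or gradually.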
IIRC, Sep 02 is when the device was placed in service. It would be nice to be able to figure out which file(s) that/those sectors belong to, so as to force them to be rewritten.
You can force it anyway. Read that LBA sector to a file using ddrescue (i.e., read multiple times, direct, no cache), then rewrite it with dd. And cross fingers.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.0 (Legolas))
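The read-then-rewrite idea might look roughly like the following. This is a cautious sketch, not a tested recipe: `rewrite_sector_plan` is a made-up helper that only prints the commands instead of running them, the file names `sector.img` and `sector.map` are arbitrary, and the 512-byte logical sector size is an assumption. Since sdc8 is an md member, the array state should be considered before writing to the raw disk at all.

```shell
# rewrite_sector_plan DEVICE LBA [SECTOR_SIZE]
# Prints (does NOT execute) a two-step plan for one sector:
#   1. ddrescue reads the sector with direct I/O (-d) and 3 retry passes (-r3),
#      starting at the byte offset of the LBA (-i), for one sector (-s),
#      storing it at offset 0 of sector.img (-o 0).
#   2. dd writes that sector back to the same LBA, bypassing the cache.
rewrite_sector_plan() {
    dev=$1
    lba=$2
    ssize=${3:-512}             # assumed logical sector size
    pos=$(( lba * ssize ))      # ddrescue positions are in bytes
    echo "ddrescue -d -r3 -i $pos -o 0 -s $ssize $dev sector.img sector.map"
    echo "dd if=sector.img of=$dev bs=$ssize count=1 seek=$lba oflag=direct conv=notrunc"
}

rewrite_sector_plan /dev/sdc 142446713
```

If ddrescue never manages to read the sector, sector.img will not contain it, and the only thing left to write is zeros (`if=/dev/zero`), which loses whatever was stored there. Double-check the `of=` and `seek=` values before running anything like this as root.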