Re: [opensuse] Login weirdness
  • From: Michael Fischer <michael@xxxxxxxx>
  • Date: Fri, 2 Nov 2018 13:57:25 -0400
  • Message-id: <20181102175724.seolqozqes5nejoz@blinkenlights>
On Fri, Nov 02, Carlos E. R. wrote:
On 02/11/2018 15.27, Michael Fischer wrote:

Ah, yes. Better run the test on all disks.

The ssd produced much less output from `smartctl -a` but also
nothing which suggested errors (good, as that is /)

I've got 2 external (usb-attached) drives which are my backups.

smartctl needs `-d sat` to produce output from one of them (happy)
and `-d scsi` for the other, which insisted that

SMART support is: Available - device has SMART capability.
SMART support is: Disabled

I did `$ sudo smartctl -d scsi -s on /dev/sdb` but to no effect in the
output of `$ sudo smartctl -i -d scsi /dev/sdb`

Go figure. AFAIK, both those external disks are fine, but I'm running badblocks
on them now for "grins".

  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0

Ok, but watch this parameter.

  7 Seek_Error_Rate 0x000f 077 060 045 Pre-fail Always - 57728258
  9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 7752

Not an old disk.

197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 16
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 16

Ah. Yes, this is important.
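
For watching attributes 5, 197 and 198 over time, a small helper along these lines could pull the raw value out of `smartctl -A` output. This is only a sketch: the function name `smart_raw_value` is made up here, and it assumes the usual attribute table layout where each attribute is on one line and the raw value is the tenth field.

```shell
# smart_raw_value ID
# Reads `smartctl -A` output on stdin and prints the raw value
# (10th column) of the attribute whose ID matches.
# Hypothetical helper, not part of smartmontools.
smart_raw_value() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Possible usage (device name /dev/sdX is a placeholder):
#   sudo smartctl -A /dev/sdX | smart_raw_value 197
```

A non-zero value for 197 or 198, as in the output above, means the disk has sectors it could not read and has not yet remapped.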

Error 1 occurred at disk power-on lifetime: 7752 hours (323 days + 0 hours)
When the command that caused the error occurred, the device was in an
unknown state.

This section exceeds my skills, sorry. They are internal errors (to the
disk firmware). And it is very recent, at 7752 hours.

Had a couple of "push button" forced restarts, and one complete power outage
recently.


SMART Self-test log structure revision number 1
Num  Test_Description    Status                      Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Interrupted (host reset)    60%        7752             -
# 2  Short offline       Completed without error     00%        7752             -
# 3  Extended offline    Completed: read failure     50%        7136             1001593016
# 4  Extended offline    Completed: read failure     50%        6296             1001593016
# 5  Extended offline    Completed: read failure     50%        5624             1001593016
# 6  Extended offline    Completed: read failure     50%        4784             1001593016
# 7  Extended offline    Completed: read failure     50%        4112             1001593016
# 8  Extended offline    Completed: read failure     50%        3440             1001593016
# 9  Extended offline    Completed: read failure     50%        2600             1001593016
#10  Extended offline    Completed: read failure     50%        1929             1001593016
#11  Extended offline    Completed: read failure     50%        1257             1021004240
#12  Extended offline    Completed: read failure     50%        585              1001593008
#13  Short offline       Completed without error     00%        0                -
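
To see at a glance which LBAs the failed tests stopped at, the self-test log can be summarised with a short filter. This is a sketch only: `failed_lbas` is a made-up name, and it assumes each log entry is on a single line with the LBA as the last field, as `smartctl -l selftest` normally prints it.

```shell
# failed_lbas
# Reads a SMART self-test log on stdin and prints "COUNT LBA" for every
# LBA at which a test ended with a read failure, most frequent first.
# Hypothetical helper, not part of smartmontools.
failed_lbas() {
    awk '/read failure/ { count[$NF]++ }
         END { for (lba in count) print count[lba], lba }' | sort -rn
}

# Possible usage (device name /dev/sdX is a placeholder):
#   sudo smartctl -l selftest /dev/sdX | failed_lbas
```

For the log above, eight of the ten failed extended tests stopped at LBA 1001593016, with one each at 1021004240 and 1001593008.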


Well, you have to do the long test to be sure. Note that you can run
the test while you use the computer: it may just become sluggish or
stop responding. Do not power it off if that happens. Of course, the test will
take longer if the computer is busy.

Parameter 197.

[snip]

Separately, notice that there are several "extended offline"
tests that did not complete, most of them at the same LBA. I would rewrite that LBA.

You could try to find out which file that LBA belongs to, recover the
file if possible or replace it with a backup copy, and then write to that LBA. Not
trivial. The write operation should trigger the remap.

Google-fu failing me as to how to go from LBA -> fs file(s). Suggestions?
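
One approach, for ext2/3/4 filesystems only, is the arithmetic from the smartmontools bad block HOWTO: convert the disk LBA to a filesystem block number, then ask debugfs which inode and path own that block. A rough sketch follows. The function name, the partition start sector of 2048 and the 4096-byte block size are assumptions for illustration; the real values would come from `fdisk -l` and `tune2fs -l` on the affected partition.

```shell
# lba_to_fs_block LBA PART_START_SECTOR FS_BLOCK_SIZE [SECTOR_SIZE]
# Prints the filesystem block number containing the given disk LBA:
#   block = (LBA - partition start) * sector size / filesystem block size
# SECTOR_SIZE defaults to 512 bytes. Hypothetical helper.
lba_to_fs_block() {
    echo $(( ($1 - $2) * ${4:-512} / $3 ))
}

# Example with the LBA from the self-test log and ASSUMED partition
# start (2048) and block size (4096):
lba_to_fs_block 1001593016 2048 4096    # prints 125198871

# Then, on the unmounted (or read-only) ext filesystem, with /dev/sdXN
# as a placeholder for the partition:
#   debugfs -R "icheck 125198871" /dev/sdXN   # block -> inode
#   debugfs -R "ncheck INODE" /dev/sdXN       # inode -> path
```

If icheck reports no inode for the block, the sector is in free space or filesystem metadata rather than in a file. Other filesystems (btrfs, xfs) need their own tools.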

Then run the long test again to see if it stops at another LBA, and
repeat until none appears.


You can also run "badblocks" on that disk. This test takes many hours
(even days) and has to be done while unmounted, thus from rescue media.
Sometimes this is enough to clear those bad sectors, sometimes they
appear again days later. If the command produces a list of bad sectors,
then write to them to force a remap.

One method is to overwrite the entire partition with zeros or whatever,
then restore the data from backup.

Thanks much Carlos for the detailed response. Much appreciated.

Will try the --test=long tonight and report back.



Michael
--
Michael Fischer
michael@xxxxxxxx
