Re: [SLE] little problem with reiserfs and bad blocks

31 Jan 2004

      On 2004-01-30 at 23:54 -0700, c_nelson77 wrote:
...
Any chance you can help me down this path a little more? What is it I am
supposed to do? I found this:
http://linux.about.com/library/cmd/blcmdl8_hdparm.htm
Simply run "man hdparm"; that site sets too many cookies.
...
The best thing I see is "-D Enable/disable the on-drive defect management
feature"
That's right, I was thinking about that. I think it is enabled by default,
but you can enable it if in doubt.
...
Could you elaborate on what I should do?
Not right now, I have to go out in a few minutes. We discussed this
same issue on the list no more than a few months ago.

Simply writing to a bad block triggers its relocation. I did that by
copying the partition with errors elsewhere, reformatting twice (as ext3,
checking with badblocks and finding nothing, then back to reiserfs) and
restoring everything. Too convoluted, I know: I was testing. If the sector
is known, write to it, or simply move the file somewhere else. Or run a
badblocks test in write mode; I don't know if it is destructive.
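If the bad sector's LBA is known (for example from the SMART self-test log further down), the next step is to find which filesystem block it falls in. This is a minimal Python sketch, assuming 512-byte sectors, a ReiserFS block size of 4096 bytes, and a partition start sector taken from `fdisk -lu`. The function name and the start sector of 63 are hypothetical. Finding which file owns that block is filesystem-specific and is not shown.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed; usual for drives of this era)


def lba_to_fs_block(lba, partition_start, block_size=4096):
    """Map an absolute disk LBA to a block number within a partition.

    partition_start: first sector of the partition (e.g. from `fdisk -lu`).
    block_size: filesystem block size in bytes (4096 assumed for ReiserFS).
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of this partition")
    # Byte offset inside the partition, then whole filesystem blocks.
    return (lba - partition_start) * SECTOR_SIZE // block_size


# Hypothetical example: the failing LBA from the self-test log, on a
# partition assumed to start at sector 63.
print(lba_to_fs_block(0x0170169c, partition_start=63))  # 3015371
```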

Issue this command:

 smartctl --all /dev/hda|less

You will see, amongst other things, a log of your hard disk's errors as
seen by SMART (if enabled). For example:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   059   054   025    Pre-fail     -       176948510
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail     -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age      -       1184
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail     -       23
....
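An attribute is considered failing when its normalized VALUE drops to or below THRESH. A threshold of 000, as for Spin_Up_Time in the table above, means the attribute never trips. This short Python sketch parses rows in the layout shown and applies that rule. The regular expression is an assumption fitted to this smartctl output format and may need adjusting for other versions.

```python
import re

# One attribute row as printed by smartctl in the table above:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Parse attribute rows into dicts, skipping headers and other lines."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "type": m.group(6),
                "raw": int(m.group(7)),
            })
    return attrs


def failing(attrs):
    """Names of attributes whose normalized value is at or below threshold.

    A threshold of 0 means the attribute can never fail, so it is skipped.
    """
    return [a["name"] for a in attrs
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   059   054   025    Pre-fail     -       176948510
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail     -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age      -       1184
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail     -       23
"""

attrs = parse_attributes(SAMPLE)
print(failing(attrs))  # []: every VALUE is still above its THRESH
```

Nothing is failing here, although on most drives the raw value of attribute 5 (23) counts sectors that have already been remapped to spares.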

That table shows parameters that help predict whether the disk is near
failure. For a disk with errors that have since been resolved, I see:

Error 325 occurred at disk power-on lifetime: 4275 hours
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER:40 SC:64 SN:9c CL:16 CH:70 D/H:51 ST:51
Sequence of commands leading to the command that caused the error were:
DCR   FR   SC   SN   CL   CH   D/H   CR   Timestamp
 00   d0   00   00   15   70    51   40     3.514
 00   d0   00   00   14   70    51   40     3.476
 00   d0   00   00   13   70    51   40     7.384
 00   d0   00   00   12   70    51   40     3.537
 00   d0   00   00   11   70    51   40     3.499
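The register values in the error report encode the failing sector. In 28-bit LBA mode (bit 6 of D/H set, as in 0x51), SN carries LBA bits 0-7, CL bits 8-15, CH bits 16-23 and the low nibble of D/H bits 24-27. This is a small Python sketch of that decoding. The function name is illustrative.

```python
def decode_lba28(sn, cl, ch, dh):
    """Combine ATA task-file register values into a 28-bit LBA."""
    if not dh & 0x40:
        # Bit 6 clear means the registers hold a CHS address, not an LBA.
        raise ValueError("D/H bit 6 clear: registers are CHS, not LBA")
    # D/H low nibble = bits 24-27, CH = 16-23, CL = 8-15, SN = 0-7.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers from the error report above: SN:9c CL:16 CH:70 D/H:51
lba = decode_lba28(0x9C, 0x16, 0x70, 0x51)
print(hex(lba))  # 0x170169c
```

The result, 0x0170169c, is the same address that the failed short self-test in the log below reports as LBA_of_first_error. Applied to the SN/CL/CH/D/H columns of the command history, the same function gives each preceding command's starting sector, for example 0x1701500 for the first row.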

You can also run self-tests on the disk (long and short) while it is in
use, without stopping the OS. The results are logged as well:

SMART Self-test log, version number 1
Num  Test_Description    Status           Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line      Completed                00%      5613         -
# 2  Extended off-line   Completed                00%      4918         -
# 3  Short off-line      Completed                00%      4915         -
# 4  Short off-line      Completed: read failure  90%      4272         0x0170169c
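The self-test log can also be scanned for tests that stopped at a bad sector. This Python sketch assumes the columns are separated by runs of two or more spaces, as in the output shown above. A test is treated as failed when it reports an LBA_of_first_error.

```python
import re


def failed_self_tests(log_text):
    """Return (num, description, status, hours, lba) for each self-test
    that reported an LBA_of_first_error, i.e. one that hit a bad sector."""
    failures = []
    for line in log_text.splitlines():
        if not line.startswith("#"):
            continue  # skip the title and column-header lines
        # Columns in this log are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 6:
            continue
        num, desc, status, _remaining, hours, lba = fields
        if lba != "-":
            failures.append(
                (int(num.lstrip("# ")), desc, status, int(hours), int(lba, 16))
            )
    return failures


LOG = """\
Num  Test_Description    Status           Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line      Completed                00%      5613         -
# 2  Extended off-line   Completed                00%      4918         -
# 3  Short off-line      Completed                00%      4915         -
# 4  Short off-line      Completed: read failure  90%      4272         0x0170169c
"""

print(failed_self_tests(LOG))
# [(4, 'Short off-line', 'Completed: read failure', 4272, 24123036)]
```

For this log, the function returns only test #4, the short test at 4272 hours that stopped at 0x0170169c. The later tests completed without error, which is consistent with the errors having since been solved.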

As you can see, the information is very complete. I don't know whether it
is stored in the disk's EPROM memory or on a track.

-- 
Regards
       Carlos Robinson

Carlos E. R.
