Linda Walsh <suse@tlinx.org> wrote:
Greg Freemyer wrote:
Linda Walsh <suse@tlinx.org> wrote:
Somehow this topic seems to have migrated from how to do a disk-to-disk copy without using the command line (and so many of us tried to tell him the command line is by far the best tool for something this simple), to dealing with bad sectors on a source disk... which, fortunately for me, is a rare situation.
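(For reference, the sort of command-line copy being talked about looks something like the sketch below; the device names are placeholders, and conv=noerror,sync lets dd keep going past unreadable blocks, padding them with zeros:

    # copy an entire source disk onto a same-size-or-larger target disk
    # /dev/sdX = source, /dev/sdY = target -- double-check with "lsblk" first!
    dd if=/dev/sdX of=/dev/sdY bs=64K conv=noerror,sync
)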
Linda,
I do this procedure as part of my day job.
My condolences! ;-)
(That's why I packaged ewfacquire; I use it routinely.)
I am sure it is better than 'dd' for dealing with bad disks.
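(For anyone who hasn't used it, a minimal ewfacquire run is just pointed at the source device and then you answer its interactive prompts; the exact prompts and options vary by libewf version, so treat this as a sketch:

    # image /dev/sdX into an EWF (E01) evidence file set,
    # answering the interactive prompts for case/evidence details
    ewfacquire /dev/sdX
)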
The "subject drives" which I read from are a random collection of customer drives. They can be a almost new drive in a new machine, all the way to a 10 year old drive in a computer sitting out in a shed that was almost forgotten about. Most are from desktop/laptop PCs a few years old and routinely in use.
I don't keep stats, but I would guess between 5 and 10% of them have at least one bad sector. Having a significant number of bad sectors I agree is rare, but having one or two I would say is almost routine.
---- Um, are you saying a typical user should expect to see 1-2 disk errors on a disk->disk copy?
I'm saying 90% of the time they should see 0 bad sectors, but 1 or 2 the other 10% of the time is expected and normal.
Isn't it fair to say that many consumer-level drives not only have 1-2 disk errors, but already have such sectors remapped to per-track spares when new? For that matter, if a drive isn't nearing its "end of useful life", would you expect users to actually see or notice such an error -- or wouldn't it be handled by the drive's internal firmware, with recovery via internal ECC and remapping all handled on the fly? Isn't, by 'SMART' standards, a drive at the end of its useful lifespan when it can no longer relocate such data automatically?
You don't seem to understand the bad sector life cycle. Here you go:
- The sector's magnetism becomes degraded.
- Time passes (hours, days, years).
- A read of that specific sector occurs.
- A read error is returned to the userspace app and SMART marks the sector bad (a "pending" sector).
- All subsequent reads keep attempting the same bad sector; they typically fail as well.
- Time passes (seconds, days, years).
- A write to the bad sector occurs.
- The drive controller notes this sector is bad and decides it is time to remap it.
- The sector is remapped and the new data is written.
- Reads now succeed.
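You can watch both halves of that life cycle from the command line: SMART attribute 197 (Current_Pending_Sector) counts sectors stuck in the "read fails" stage, and attribute 5 (Reallocated_Sector_Ct) counts the ones that have been remapped after a write. A rough sketch, with a placeholder device and sector number (and note the write destroys whatever was in that sector):

    # show the pending vs. reallocated counters
    smartctl -A /dev/sdX | grep -E 'Reallocated_Sector|Current_Pending'

    # try to read a suspect sector directly (fails while it is pending)
    hdparm --read-sector 123456789 /dev/sdX

    # overwrite that one sector; the drive remaps it if it is bad
    hdparm --write-sector 123456789 --yes-i-know-what-i-am-doing /dev/sdX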
Isn't it *normally* the case that a user will only see disk errors on a drive that can no longer remap sectors?
No, see the life cycle and the two different places an error can sit for years.

Most RAID1/5/6 solutions run scrubbers once a month or so to detect and correct these failures without degrading the RAID array. I.e., they force a read of every sector of the array. When an error is found they use the RAID correction logic to calculate what the data should have been and write the valid data back out. It is the write that triggers the remap. Thus a freshly scrubbed RAID array should have 0 bad sectors, but a failure may occur before the next scrub. That is why you want to run a scrub every month or so.

Note that for the RAID scenario you don't want the drive to automatically retry failed reads. Instead you want it to try exactly once and fail immediately if there is a problem. That lets the RAID solution handle it. Most will immediately write out the calculated data and thus fix the bad sector as soon as it is detected. Therefore a RAID-edition SATA drive is actually less reliable on its own than a consumer-edition drive; the only difference is the firmware. It is the RAID solution that makes the overall system more robust. That means if you are buying drives for a RAID, then definitely buy a RAID-edition drive, but if it is for standalone use avoid RAID-edition drives: they don't have the auto-retry logic built in.

Back to consumer systems. If a sector holds changing data, then the error will be found and corrected quickly, but assume it is part of unallocated space near the end of the drive. With a modern 500 GB disk, many of the sectors at the end of the drive will NEVER be read or written in the life of the drive under normal conditions. Bad sectors there just sit in a failed state forever.
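With Linux md RAID, that kind of scrub can also be kicked off by hand; a minimal sketch, assuming an array named md0:

    # scrub the array: force a read of every sector
    echo check > /sys/block/md0/md/sync_action

    # watch progress; unreadable sectors found during the scrub are
    # rebuilt from the redundant copy and rewritten (forcing a remap)
    cat /proc/mdstat

    # count of parity/mirror mismatches seen by the last check
    cat /sys/block/md0/md/mismatch_cnt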
In fact, I think the spec for new drives is no more than one bad sector in 10^10. 500 GB drives have about a billion sectors, so even brand new drives having a bad sector 1 time in 10 would be in spec.
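Spelling that arithmetic out (assuming 512-byte sectors and reading the spec as one bad sector per 10^10 sectors):

    500 GB / 512 bytes per sector            ~= 1e9 sectors per drive
    1e9 sectors * (1 bad sector / 1e10)       = 0.1 expected bad sectors per drive
    => roughly 1 drive in 10 could ship with a bad sector and still meet the spec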
---- But you are talking raw sectors -- not formatted capacity, no? Wouldn't the MTBF of, say, a new 5-year-warranty Hitachi 4 TB drive rated at 1-2 million hours (for the DeskStar vs. Ultrastar models) sort of imply that most users will never see a disk error during the useful life of that disk?
No. Most new drives today do have zero bad sectors, but run SMART on a several-year-old drive and you will rarely find one with zero remapped sectors. For a sector to be remapped, it had to report bad at some point. That means a media error was reported at least once for every remapped sector reported by SMART. (Remember the drive and kernel have retry logic, so the error may not propagate to user space; even a tool like dd won't see all the bad sectors.)

Greg