[opensuse] Badblock scanning of Terabyte drive

Anton Aylward

12 Apr 2017

13:28

My desktop's 1T drive is approaching its supposed EOL. I realise that, time was, the bathtub curve meant that if a drive worked it probably would keep on working, but modern economics seems to mean that a component dies the day after its warranty expires. So I'm apprehensive.

I see 2T and 3T drives now on sale for less than I paid for my 1T drive. Why I should consider them when I haven't filled my 1T drive I'm not clear. Having a set of mirrored 1T drives makes more sense than a 2T on a single spindle.

I'm using LVM, so migrating LVs from one drive to another is not a problem.

HOWEVER ....

What I am concerned about is initial reliability.

My experience with new drives has never been good. It is one reason why I adopted LVM. The 1T drive I'm using is a replacement under warranty for a drive that threw an enormous number of uncorrected bad blocks about a week into use. I still had the LVs on the older drive and had simply been mirroring. Upgrading from a 750G to a 1T meant that such problems showed up quickly.

But a step from the 1T to a 3T might leave a lot of space never considered.

Yes, I know about the 'badblocks' program. There are a few variants. But it isn't fast. I've used it on an 8G USB drive and it took a couple of days. OK, that's USB, not SATA, speeds. But how long would it take to scan a 3T drive using the 'badblocks' program?

TOO-OO-OO-OO-OO long!

A drive's SMART capability should include some kind of scanning, but I'm unclear as to how comprehensive it is. Can anyone fill me in on that?

Googling, I see a few suggested ideas, but they don't seem to reassure me about this question of a 'new drive'.

'smartctl -H /dev/sda' just says "PASSED", which is nice but not detailed. 'smartctl --all /dev/sda' tells me a lot but doesn't seem to tell me what I'm looking for.

I'm reluctant to try forcing a scan/test on my live drive. Is there a SMARTCTL command that does the equivalent of a 'badblocks' scan?

--
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org


jdd

12 Apr

13:37

On 12/04/2017 at 15:28, Anton Aylward wrote:

...

My desktop's 1T drive is approaching its supposed EOL.

There is no evidence of that; just make more backups. Professional servers are going SSD (even cheap providers like o2switch). I don't know if this is going to be a good solution.

jdd

Greg Freemyer

13:51

On Wed, Apr 12, 2017 at 9:28 AM, Anton Aylward <opensuse@antonaylward.com> wrote:

...

I'm reluctant to try forcing a scan/test on my live drive. Is there a SMARTCTL command that does the equivalent of a 'badblocks' scan?

Sorry,

But smartctl is the best tool out there, and it will only report issues detected in the normal course of use.

You need to access every sector if you want to know whether the media is starting to fail. I would:

    smartctl --all /dev/sda > pre-scan.out
    dd if=/dev/sda of=/dev/null bs=10M conv=noerror
    smartctl --all /dev/sda > post-scan.out

Then compare your pre- and post-scan results to see if any serious concerns were found. Concerns would include correctable media errors, etc., not just total failures.

As to speed, if you can get 4GB/minute read speeds, you can read a TB in about 4 hours. I get as fast as 8GB/minute on modern large 3 1/2 inch 7200 RPM desktop drives, so that's a TB of reads in 2 hours.

Greg
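The three steps above can be wrapped in a small bash function. This is a sketch, assuming smartmontools and GNU coreutils `dd` are installed and that it is run as root; the name `surface_scan` and its output-directory argument are illustrative, not part of either tool. Both `smartctl --all` and the `dd` read pass leave the disk contents untouched.

```shell
# Hypothetical wrapper for the pre-scan / full read / post-scan check.
# Usage: surface_scan DEVICE [OUTPUT_DIR]
surface_scan() {
    local dev="$1" outdir="${2:-.}"
    # SMART report before the scan (reporting only, no changes to the disk).
    smartctl --all "$dev" > "$outdir/pre-scan.out"
    # Read every sector and throw the data away; conv=noerror keeps dd
    # going past read errors instead of stopping at the first one.
    dd if="$dev" of=/dev/null bs=10M conv=noerror
    # SMART report after the scan.
    smartctl --all "$dev" > "$outdir/post-scan.out"
    # Print what changed. diff exits 1 when the files differ, which is
    # expected here, so do not treat that as a failure.
    diff "$outdir/pre-scan.out" "$outdir/post-scan.out" || true
}
```

Called as `surface_scan /dev/sda /root`, it leaves pre-scan.out and post-scan.out in /root and prints the lines that changed. Some differences, such as the "Local Time is:" line and the power-on hours, are expected; the error-related attributes and logs are the ones worth reading.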

Anton Aylward

14:03

On 12/04/17 09:51 AM, Greg Freemyer wrote:

...

Sorry,

But smartctl is the best tool out there and it will only report issues detected in the normal course of use.

You need to access every sector if you want to know if the media is starting to fail. I would:

    smartctl --all /dev/sda > pre-scan.out
    dd if=/dev/sda of=/dev/null bs=10M conv=noerror
    smartctl --all /dev/sda > post-scan.out

I take it that the "--all" is purely reporting, and does not alter the disk in any way?

...

Then compare your pre- and post- results to see if any serious concerns were found.

Concerns would include correctable media errors, etc. Not just total failures.

ACK. IIRC there's a limit to correctable errors, set by the size of the "look-aside" table and the allocated reserved sectors. Or is it tracks?

...

As to speed, if you can get 4GB/minute read speeds, you can read a TB in about 4 hours.

I get as fast as 8GB/minute on 3 1/2 inch 7200 RPM modern large desktop drives. So that's a TB of reads in 2 hours.

WOW! I like that!
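Greg's figures are straightforward division: 1000 GB at 4 GB/minute is 250 minutes (just over 4 hours), and at 8 GB/minute it is 125 minutes. A hypothetical helper, `scan_time`, for trying other sizes and rates; the rate is an input you assume or measure, not something it detects:

```shell
# Hypothetical helper: estimate a full read pass.
# Usage: scan_time SIZE_GB RATE_GB_PER_MINUTE
scan_time() {
    awk -v size="$1" -v rate="$2" 'BEGIN {
        minutes = size / rate
        printf "%.0f minutes (%.1f hours)\n", minutes, minutes / 60
    }'
}
```

For the 3T drive under consideration, 3000 GB at 8 GB/minute comes to 375 minutes, a little over six hours.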

Wols Lists

14:31

On 12/04/17 14:51, Greg Freemyer wrote:

...

Then compare your pre- and post- results to see if any serious concerns were found.

Concerns would include correctable media errors, etc. Not just total failures.

Bear in mind that a drive that is IN SPEC is merely warranted to throw, on average, less than one UNcorrectable (soft) read error per 10TB read. That means (crude maths) that you have a 10% chance of an uncorrectable error when reading the whole of an allegedly healthy 1TB drive. (I'm ignoring that it's statistics, and that a decent drive should perform way better than spec ... :-)

What you do about that is up to you - that's why mirroring is a good idea ALONG WITH REGULAR SCRUBS.

Cheers,
Wol
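Wol's 10% is the linear approximation (1 TB read / 10 TB per error). If you instead assume errors occur independently at that average rate (a Poisson model, which is an assumption, not something the drive spec promises), the chance of at least one error when reading N TB is 1 - exp(-N/10), slightly below the linear figure. A sketch with a hypothetical helper, `ure_chance`:

```shell
# Hypothetical helper: probability of at least one unrecoverable read
# error when reading TB terabytes, assuming independent errors at an
# average rate of one per PER_TB terabytes (default 10, Wol's figure).
# Usage: ure_chance TB [PER_TB]
ure_chance() {
    awk -v tb="$1" -v per_tb="${2:-10}" 'BEGIN {
        printf "%.1f%%\n", 100 * (1 - exp(-tb / per_tb))
    }'
}
```

Under that model a full read of a 1TB drive comes out at about 9.5%, close to Wol's crude 10%, and reading 10TB gives about 63% rather than 100%.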

Carlos E. R.

13 Apr

18:19

On 2017-04-12 15:28, Anton Aylward wrote:

...

Yes, I know about the 'badblocks' program. There are a few variants. But it isn't fast. I've used it on an 8G USB drive and it took a couple of days. OK, that's USB, not SATA, speeds. But how long would it take to scan a 3T drive using the 'badblocks' program?

TOO-OO-OO-OO-OO long!

I just ran it on a 1.8 TiB (2 TB) disk, and it took 223m5.110s, without unmounting it (i.e., nominally in use).
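Carlos's run works out to roughly 9 GB/minute (2000 GB in about 223 minutes), in line with Greg's figures. A read-only badblocks pass of that kind might look like the following; the options Carlos actually used aren't stated, so these are assumptions, and `readonly_badblocks` is a hypothetical name. In its default mode badblocks only reads, while -n (non-destructive read-write) and -w (destructive write) both write to the disk.

```shell
# Hypothetical wrapper for a read-only badblocks pass.
#   -b 4096  test in 4 KiB blocks
#   -s       show progress
#   -v       verbose output
#   -o FILE  write the list of bad blocks found to FILE
# No -n or -w is given, so badblocks stays in its read-only mode.
# Usage: readonly_badblocks DEVICE [LIST_FILE]
readonly_badblocks() {
    local dev="$1" list="${2:-badblocks.txt}"
    badblocks -b 4096 -s -v -o "$list" "$dev"
}
```

Prefixing the call with `time` gives a figure comparable to Carlos's. Block numbers in the output list are in units of the -b block size, not 512-byte LBAs.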

...

I'm reluctant to try forcing a scan/test on my live drive. Is there a SMARTCTL command that does the equivalent of a 'badblocks' scan?

The long test. But it usually stops at the first error.

Before running the badblocks program, I had this report in the log:

    <3.2> 2017-04-13 15:53:23 Telcontar smartd 1509 - - Device: /dev/sdd [SAT], 8 Currently unreadable (pending) sectors
    <3.2> 2017-04-13 15:53:23 Telcontar smartd 1509 - - Device: /dev/sdd [SAT], 8 Offline uncorrectable sectors

Now I have:

    197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
    198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0

I was trying to find out the location (the partition) with errors and overwrite them, but they have disappeared from view. Not for the first time, I suspect.

--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)
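To watch the same counters Carlos quotes without reading the whole `smartctl --all` report, one option is to filter `smartctl -A` output. This sketch assumes the usual ATA attribute table layout, where the 10th column is RAW_VALUE, and the conventional IDs 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable); vendors are not obliged to use those IDs, and `bad_sector_attrs` is a hypothetical name.

```shell
# Hypothetical filter: read `smartctl -A` output on stdin and print the
# name and raw value of the attributes that track bad sectors.
# Columns assumed: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
bad_sector_attrs() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2, $10 }'
}
```

Typical use is `smartctl -A /dev/sdd | bad_sector_attrs`. When pending sectors drop back to 0, as in Carlos's case, a change in the Reallocated_Sector_Ct raw value generally indicates the drive remapped them rather than reading them successfully on retry.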

Dave Plater

14 Apr

13:52

On 12/04/2017 15:28, Anton Aylward wrote:

...

'smartctl -H /dev/sda' just says "PASSED" which is nice but not detailed. 'smartctl --all /dev/sda' tells me a lot but doesn't seem to tell me what I'm looking for.

smartctl -t long /dev/sdX is the command that tests all sectors; if it reports an error, send the drive back. I've successfully extended the life of drives with uncorrectable sectors by using the hdparm --write-sector command on the sector reported by smartctl -l selftest.

Best regards
Dave P
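Dave's workflow can be sketched as three bash functions (hypothetical names). The sketch assumes smartmontools and hdparm, root access, and that the sector passed to `rewrite_sector` is the LBA_of_first_error from the self-test log. `smartctl -c` reports the drive's recommended polling time for the extended test, i.e. how long to wait before checking the log. Rewriting a sector destroys whatever it held, so find out what occupies it first.

```shell
# Start the drive's extended ("long") self-test. It runs inside the
# drive, and the drive can normally stay in use while it runs.
start_long_test() {
    smartctl -t long "$1"
}

# Show the self-test log; a failed test lists LBA_of_first_error.
show_selftest_log() {
    smartctl -l selftest "$1"
}

# DESTRUCTIVE: write zeros to one sector so the drive can remap it.
# Any data in that sector is lost. hdparm refuses --write-sector
# unless --yes-i-know-what-i-am-doing is also given.
# Usage: rewrite_sector DEVICE LBA
rewrite_sector() {
    local dev="$1" lba="$2"
    hdparm --yes-i-know-what-i-am-doing --write-sector "$lba" "$dev"
}
```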

Carlos E. R.

13:59

On 2017-04-14 15:52, Dave Plater wrote:

...

smartctl -t long /dev/sdX is the command that tests all sectors, if it reports an error send it back. I've successfully extended the life of drives with uncorrectable sectors by using the hdparm --write-sector command on the sector output in smartctl -l selftest.

How do you get the list of bad sectors from smartctl?

Dave Plater

15:15

On 14/04/2017 15:59, Carlos E. R. wrote:

...

How do you get the list of bad sectors from smartctl?

Using "smartctl -l selftest" gives you the first failed sector. It's a long process if you have many bad sectors. In my case I only had four, which I suspect came from a defective SATA cable. You don't get a list.

Dave P
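Since the log only gives the first failing LBA per test, the loop is: run the long test, read the LBA, rewrite that sector, and repeat. Extracting the LBA can be scripted; this sketch assumes the usual `smartctl -l selftest` table, where entry rows begin with `#` and the last column is LBA_of_first_error (or `-` when there was none). `first_error_lbas` is a hypothetical name.

```shell
# Hypothetical filter: read `smartctl -l selftest` output on stdin and
# print LBA_of_first_error for every logged test that recorded one.
# Entry rows start with "# 1", "# 2", ... or "#10"; "-" means no error.
first_error_lbas() {
    awk '$1 ~ /^#/ && $NF != "-" { print $NF }'
}
```

Typical use is `smartctl -l selftest /dev/sdX | first_error_lbas`, which prints one LBA per failed test, most recent first.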

Greg Freemyer

15:33

On Fri, Apr 14, 2017 at 11:15 AM, Dave Plater <dplater.list@gmail.com> wrote:

...

using "smartctl -l selftest" it gives you the first failed sector. It's a long process if you have many bad sectors. In my case I only had four that I suspect came from a defective sata cable. You don't get a list.

?? How does a bad cable cause a bad sector?

The drive electronics themselves take 512 bytes (or whatever the sector size is) of data and generate an appropriate header/footer including the ECC info.

A correctable read error means the data was only slightly corrupt and the ECC data allowed the data to be corrected.

A bad sector means the contents of the sector are corrupted such that the ECC info isn't sufficient to correct the sector data.

The SATA cable is not involved in the ECC generation process, nor the low-level read/verify process.

Greg

--
Greg Freemyer

Dave Plater

16 Apr

08:54

On 14/04/2017 17:33, Greg Freemyer wrote:

...

?? How does a bad cable cause a bad sector?

Correction: it was the 5V pin on the SATA power cable connector that was loose. The SATA cable was also defective.

Dave P

Greg Freemyer

17 Apr

16:25

On Sun, Apr 16, 2017 at 4:54 AM, Dave Plater <dplater.list@gmail.com> wrote:

...

Correction, it was the 5V pin on the connector to the sata power cable that was loose. The sata cable was also defective. Dave P

Thanks Dave,

A bad power cable could definitely cause corrupted sectors. I was thinking of the data cable.

Greg

Carlos E. R.

14 Apr

17:14

On 2017-04-14 17:15, Dave Plater wrote:

...

using "smartctl -l selftest" it gives you the first failed sector. It's a long process if you have many bad sectors. In my case I only had four that I suspect came from a defective sata cable. You don't get a list.

Ah. But not always. I have one disk that doesn't, so I ran badblocks on it, which found none, and then smartctl also found none. I thought that perhaps there was another place where smartctl listed them.
