[opensuse] Hard Disk Failing
Hi all, Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do? Thanks, -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
On Nov 6, 2007 8:15 PM, Fernando Costa
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do?
Back up and replace it ASAP. If you want to confirm what SMART is reporting, use the manufacturer's utility to check the drive. You can download it from the manufacturer's website, or find it on the Ultimate Boot CD. Cheers -- Svetoslav Milenov (Sunny) Even the most advanced equipment in the hands of the ignorant is just a pile of scrap.
Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do?
Thanks,
Don't know if this helps, but a few years ago I had a computer that was saying the same thing. I couldn't find anything wrong with the drive, so I turned off SMART and used it for another couple of years before it finally failed.
At 08:53 PM 11/6/2007 -0600, Billie Walsh wrote:
Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do?
Don't know if this helps, but a few years ago I had a computer that was saying the same thing. I couldn't find anything wrong with the drive, so I turned off SMART and used it for another couple of years before it finally failed.
Just because you got lucky, doesn't mean he will. Why risk someone else's data? If SMART says he has sectors that can't be read and are pending relocation, he should dig into the drive and find out what's happening.

SMART runs _on_the_drive_itself_ and is closest to its function at the lowest level. No external program can get a better read on the health of the drive than SMART, which runs on the drive's own processor. What most people think of as SMART (smartmontools, smartctl) is just a program that sends commands to the SMART process on the drive and reads back the results. SMART watches a dozen parameters, such as how long the drive took to spin up, what its temperature is, and whether bad sectors are being relocated and at what rate. Together, these spell out exactly how healthy the drive is.

The BIOS configuration option that says "Enable SMART" really only does two things: it usually enables the drive to respond to SMART, and it runs the most basic SMART health check as a pass/fail. This is the equivalent of "smartctl -s on /dev/sda" followed by "smartctl -H /dev/sda" (if your drive is /dev/sda). He should run the -H option for a quick go/no-go look at the drive before digging deeper.

This guy's most urgent need is to back up his drive and look for a good deal on a replacement. But before actually placing that order, get and run the smartmontools package. Sectors pending relocation aren't fatal, but we don't know whether he has run any other SMART tests. He should run a long test:

    smartctl -t long /dev/sda

This doesn't impact performance (much) and doesn't take the drive offline. Read the test log before you start the test; better yet, print it. Then read it again afterwards and compare. The drive doesn't know what time or date it is; it only logs in power-on hours. Read the logs like this:

    smartctl -l error /dev/sda      # read the error log
    smartctl -l selftest /dev/sda   # read the results of tests

Last, use smartctl -h for the other options.
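A minimal shell sketch of the before/after routine described above, assuming smartmontools is installed, the commands run as root, and the drive is /dev/sda. The function name smart_snapshot is hypothetical; it only bundles the smartctl invocations quoted in this message (-s on, -H, -l error, -l selftest) so that two snapshots can be compared.

```shell
# Hypothetical helper: capture the SMART state of one drive in one go.
# Assumes smartmontools is installed and the caller is root.
smart_snapshot() (
    dev=${1:?usage: smart_snapshot DEVICE}
    smartctl -s on "$dev"        # make sure the drive answers SMART commands
    smartctl -H "$dev"           # overall pass/fail health check
    smartctl -l error "$dev"     # the drive's error log
    smartctl -l selftest "$dev"  # results of previous self-tests
)

# Usage idea (placeholder file names):
# smart_snapshot /dev/sda > before.txt
# smartctl -t long /dev/sda
# ...wait for the test to finish...
# smart_snapshot /dev/sda > after.txt
# diff before.txt after.txt
```

Since the drive stamps its logs in power-on hours rather than wall-clock time, diffing the two snapshots is a simple way to see which self-test entries and errors are new.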
The true strength of the smartmontools package is that smartd will report all changes in performance to your messages file, which you can analyze later. I even run it on my Winders boxes. Don't ask, not my idea to have them. HTH -Tom ----- 356. [Philosophy] The darkness does die, as the curtain is drawn, and somebody's eyes must meet the dawn. --Bob Dylan --... ...-- -.. . -. ----. --.- --.- -... tpeters@nospam.mixcom.com (remove "nospam") N9QQB (amateur radio) "HEY YOU" (loud shouting) WEB: http://www.mixweb.com/tpeters 43° 7' 17.2" N by 88° 6' 28.9" W, Elevation 815', Grid Square EN53wc WAN/LAN/Telcom Analyst, Tech Writer, MCP, CCNA, Registered Linux User 385531
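To go back over smartd's reports later, something like the sketch below pulls its entries out of a syslog file. The default path /var/log/messages is an assumption based on the "messages file" mentioned above; other setups may log elsewhere. The function name smartd_events and its optional device filter are hypothetical.

```shell
# Hypothetical helper: print smartd's syslog entries, optionally only the
# ones that mention a given device. The default log path is an assumption.
smartd_events() (
    log=${1:-/var/log/messages}
    dev=${2:-}
    # smartd tags its lines as "smartd[PID]: ...". grep -F with an empty
    # pattern matches every line, so no device argument means all devices.
    grep 'smartd\[' "$log" | grep -F -- "$dev"
)

# Example: smartd_events /var/log/messages /dev/sda
```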
On 11/06/2007 Tom Peters wrote: Billie wrote:
Don't know if this helps, but a few years ago I had a computer that was saying the same thing. I couldn't find anything wrong with the drive, so I turned off SMART and used it for another couple of years before it finally failed.
Just because you got lucky, doesn't mean he will. Why risk someone else's data? If SMART says he has sectors that can't be read and are pending relocation, he should dig into the drive and find out what's happening.
I never said he should just ignore it. I checked my drive before I did anything. I think I said that in my post above. However, sometimes [ SOMETIMES ] SMART isn't quite as smart as it thinks it is. AFTER I checked my drive and found nothing wrong I turned it off because I got tired of the nagging at start up. In my experience, somewhat limited I admit, I've had more drives just quit without warning than with. I used to burn up drives in about two to three years because I VERY seldom turn off my computer. Some of them made a bit of noise as a warning before they went tits-up. Some didn't, and just died on the spot. I think they have made improvements in recent years, as drives seem to be lasting longer.
Billie Walsh wrote:
On 11/06/2007 Tom Peters wrote: Billie wrote:
Don't know if this helps, but a few years ago I had a computer that was saying the same thing. I couldn't find anything wrong with the drive, so I turned off SMART and used it for another couple of years before it finally failed.
Just because you got lucky, doesn't mean he will. Why risk someone else's data? If SMART says he has sectors that can't be read and are pending relocation, he should dig into the drive and find out what's happening.
I never said he should just ignore it. I checked my drive before I did anything. I think I said that in my post above. However, sometimes [ SOMETIMES ] SMART isn't quite as smart as it thinks it is. AFTER I checked my drive and found nothing wrong I turned it off because I got tired of the nagging at start up.
In a very unfortunate incident I used a drive for which SMART was reporting errors. I low-level reformatted the drive, ran SMART's long test and got no errors. I installed the OS and had the drive fail shortly afterwards. The bottom line is that if the self-diagnostic tool in the drive says it is failing, replace it. If you don't, you might get lucky or you might not. The safety of your data is up to you. -- Rafael
Billie Walsh wrote:
Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do?
Thanks,
Don't know if this helps, but a few years ago I had a computer that was saying the same thing. I couldn't find anything wrong with the drive, so I turned off SMART and used it for another couple of years before it finally failed.
One thing to bear in mind is that drives have spare sectors, which get used as others fail. The warning is to tell you that the drive is well on its way to failing and should be replaced ASAP. You were lucky that it didn't fail sooner. What you did is comparable to disabling the engine light on a car, rather than fixing what's causing it to turn on. -- Use OpenOffice.org http://www.openoffice.org
On 11/07/2007 James Knott wrote:
One thing to bear in mind is that drives have spare sectors, which get used as others fail. The warning is to tell you that the drive is well on its way to failing and should be replaced ASAP. You were lucky that it didn't fail sooner. What you did is comparable to disabling the engine light on a car, rather than fixing what's causing it to turn on.
There was nothing wrong with the drive. There was something in SMART that was wrong. I've seen SMART say that a brand new drive is failing. SMART is nothing like the engine light on a car.
Billie Walsh wrote:
On 11/07/2007 James Knott wrote:
One thing to bear in mind is that drives have spare sectors, which get used as others fail. The warning is to tell you that the drive is well on its way to failing and should be replaced ASAP. You were lucky that it didn't fail sooner. What you did is comparable to disabling the engine light on a car, rather than fixing what's causing it to turn on.
There was nothing wrong with the drive. There was something in SMART that was wrong. I've seen SMART say that a brand new drive is failing. SMART is nothing like the engine light on a car.
On the basis that SMART was wrong *once*, are you willing to bet YOUR DATA that his old drive has NOT run out of internal spare sectors?
On Wednesday 07 November 2007 16:55:13 Aaron Kulkis wrote:
Billie Walsh wrote:
On 11/07/2007 James Knott wrote:
One thing to bear in mind is that drives have spare sectors, which get used as others fail. The warning is to tell you that the drive is well on its way to failing and should be replaced ASAP. You were lucky that it didn't fail sooner. What you did is comparable to disabling the engine light on a car, rather than fixing what's causing it to turn on.
There was nothing wrong with the drive. There was something in SMART that was wrong. I've seen SMART say that a brand new drive is failing. SMART is nothing like the engine light on a car.
On the basis that SMART was wrong *once*, are you willing to bet YOUR DATA that his old drive has NOT run out of internal spare sectors?
I thought I'd go back and look up a discussion on Security Now about this. SMART, or Self-Monitoring, Analysis and Reporting Technology, is not actually all that smart. Not only does it occasionally report drives that are not failing, it often fails to report a drive that IS failing. When drives went to the IDE interface standard, the controller was moved onto the drive; the drive became smart and had its own microprocessor on it. And Compaq, who was the big leader in the clone market then, got Seagate, Quantum, and Conner Peripherals to give us a way to know what’s going on behind the scenes in the drive. In order to get the manufacturers to agree, the SMART specification had to be left very loose. So what we’ve ended up with, even today, 15 years later, is a specification which is sorrowfully weak. And a Google paper that looked at over a hundred thousand drives said they often saw no uniform meaning behind many of these SMART parameters across a large installed base of hard drives. They also say a lot of the failed drives had no SMART errors at all. The drive can die spontaneously while SMART is completely happy and sees nothing going wrong. At the same time, there are drives which look like they’re in their last throes from the standpoint of the SMART data that just keep on going for years. The study says, "After our initial attempts to derive such models yielded relatively unimpressive results, we turn to the question of what might be the upper bound of the accuracy of any model based solely on SMART parameters. Our results are surprising, if not somewhat disappointing. Out of all failed drives, over 50 percent of them have no count in any of the four strongest SMART signals, namely scan errors, relocation count, offline relocation, and probational count.
In other words, models based only on those signals can never predict more than half of the failed drives." So essentially what Google found is that many drives were failing (more than half of the failures they saw) with nothing at all showing up in the SMART subsystem, and also that the reverse was happening: SMART was flagging drives that never failed. For example, they found that when they asked the SMART system to scan the drive and an error was found during that scan, the drive was 39 times more likely to fail in the next 60 days than all other drives. But 39 times more likely wasn't predictive enough to say "okay, we should replace the drive", because there were lots of drives with scan errors that never failed. In other words, don't count on SMART to tell you whether or not to replace a drive. Personally I use SpinRite for that, but it's not free, and there is no demo version. I find it works, and I have no connection to GRC other than as a satisfied user. There's a whole discussion on the Security Now podcast about this, which you can read at http://www.grc.com/sn/SN-081.htm. -- Bob Smits bob@rsmits.ca A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?
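Since the Google figures above hinge on a handful of sector-related counters, here is a small sketch that picks such counters out of `smartctl -A` output. The thread does not say which vendor attributes correspond exactly to Google's four signals; as an assumption, the sketch watches IDs 5, 197 and 198, which smartctl usually labels Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable (the last two appear to be what smartd's "pending" and "offline uncorrectable" warnings in this thread refer to). It also assumes the raw value sits in the tenth column, as in smartctl's usual attribute table. The function name sector_counts is hypothetical.

```shell
# Hypothetical helper: read `smartctl -A /dev/sdX` output on stdin and print
# the sector-related attributes (IDs 5, 197, 198) whose raw value is non-zero.
# Assumes the usual table layout with RAW_VALUE as the 10th field.
sector_counts() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && ($10 + 0) > 0 {
        print $1, $2, $10
    }'
}

# Example: smartctl -A /dev/sda | sector_counts
```

Empty output does not mean the drive is healthy: per the Google results quoted above, more than half of the failed drives showed nothing in this kind of counter.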
On 11/07/2007 Aaron Kulkis wrote:
On the basis that SMART was wrong *once*, are you willing to bet YOUR DATA that his old drive has NOT run out of internal spare sectors?
I don't think that I EVER suggested that he should just ignore it. I just suggested that there was a possibility that SMART could be wrong. It has been several times in my own experience. I don't think I EVER suggested that he shouldn't test his drive before making any decision. I don't have super important data to lose but I sure would test a drive BEFORE I just discarded it. Obviously I'm not as rich as most folks. In my ORIGINAL mail I said that SMART said that I had a drive that was failing. There was nothing wrong with the drive so I turned off SMART and used the drive for another couple years. In other words I tested the drive. It was fine. SMART was wrong. I turned off SMART. SMART CAN be wrong. Maybe not often. Maybe not every time. But it CAN be wrong. SO, JUST GET OFF MY ASS!
* Billie Walsh
On 11/07/2007 Aaron Kulkis wrote:
On the basis that SMART was wrong *once*, are you willing to bet YOUR DATA that his old drive has NOT run out of internal spare sectors?
[...]
SMART CAN be wrong. Maybe not often. Maybe not every time. But it CAN be wrong.
SO, JUST GET OFF MY ASS!
BE above it, don't let 'm get under yer skin :^) you made *his* day and ruined yours :^( -- Patrick Shanahan Plainfield, Indiana, USA HOG # US1244711 http://wahoo.no-ip.org Photo Album: http://wahoo.no-ip.org/gallery2 Registered Linux User #207535 @ http://counter.li.org
On 11/08/2007 Patrick Shanahan wrote:
you made *his* day and ruined yours :^(
He couldn't ruin my day if he tried. I woke up breathing this morning. Everything after that is gravy. Hope to do the same thing tomorrow morning.
Billie Walsh wrote:
On 11/08/2007 Patrick Shanahan wrote:
you made *his* day and ruined yours :^(
He couldn't ruin my day if he tried. I woke up breathing this morning. Everything after that is gravy. Hope to do the same thing tomorrow morning.
After a year in Baghdad, and having part of my barracks blown up in a rocket attack, I feel the same way. Every day that I'm alive is a good day.
Billie Walsh wrote:
On 11/07/2007 Aaron Kulkis wrote:
On the basis that SMART was wrong *once*, are you willing to bet YOUR DATA that his old drive has NOT run out of internal spare sectors?
I don't think that I EVER suggested that he should just ignore it. I just suggested that there was a possibility that SMART could be wrong. It has been several times in my own experience.
I don't think I EVER suggested that he shouldn't test his drive before making any decision. I don't have super important data to lose but I sure would test a drive BEFORE I just discarded it. Obviously I'm not as rich as most folks.
In my ORIGINAL mail I said that SMART said that I had a drive that was failing. There was nothing wrong with the drive so I turned off SMART and used the drive for another couple years.
In other words I tested the drive. It was fine. SMART was wrong. I turned off SMART.
SMART CAN be wrong. Maybe not often. Maybe not every time. But it CAN be wrong.
I'll quote from your original reply:
Billie Walsh wrote:
< On 11/07/2007 James Knott wrote:
<< One thing to bear in mind is that drives have spare sectors, which get used as others fail. The warning is to tell you that the drive is well on its way to failing and should be replaced ASAP. You were lucky that it didn't fail sooner. What you did is comparable to disabling the engine light on a car, rather than fixing what's causing it to turn on.
< There was nothing wrong with the drive. There was something in SMART that was wrong. I've seen SMART say that a brand new drive is failing. SMART is nothing like the engine light on a car.
Doesn't sound like you told him that SMART *MIGHT* be giving a bogus report... you told him in no uncertain terms that, without a doubt, his drive is 100% good and SMART is 100% wrong.
SO, JUST GET OFF MY ASS!
When you stop being a jerk, I gladly will.
Billie Walsh wrote:
On 11/07/2007 James Knott wrote:
One thing to bear in mind is that drives have spare sectors, which get used as others fail. The warning is to tell you that the drive is well on its way to failing and should be replaced ASAP. You were lucky that it didn't fail sooner. What you did is comparable to disabling the engine light on a car, rather than fixing what's causing it to turn on.
There was nothing wrong with the drive. There was something in SMART that was wrong. I've seen SMART say that a brand new drive is failing. SMART is nothing like the engine light on a car.
No, it's more like one of these: http://www.jcwhitney.com/webapp/wcs/stores/servlet/Product?PID=493706&Pr=p_Product.CATENTRY_ID%3A2011215&catalogId=10101&productId=2011215&langId=-1&AID=10273849&TID=101&storeId=10101 Or perhaps they are like SMART. I'm not sure which came first.
On Tuesday 06 November 2007 18:15:20 Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do?
Thanks,
Go to www.grc.com and get SpinRite, a low-level formatting, repair and maintenance utility. It works with any kind of file system, including Linux ones. I've used it to recover drives that were no longer readable. -- Bob Smits bob@rsmits.ca
On Tuesday 06 November 2007 22:05, Robert Smits wrote:
...
Go to www.grc.com and get SpinRite, a low-level formatting, repair and maintenance utility. It works with any kind of file system, including Linux ones. I've used it to recover drives that were no longer readable.
From http://www.grc.com/cs/prepurch.htm: -==--==--==--==--==--==--==--==--==--==--==- SpinRite 6.0 is NTFS, LINUX & TIVO Compatible! -==--==--==--==--==--==--==--==--==--==--==- TiVo?? I know it's Linux inside, and I've been "concerned" about what will happen when that drive (which is never idle) gives out, but I'm not sure I really want to know how bad it is. Mine's been running almost non-stop for almost 2 1/2 years now... Every time I see some bizarre anomalies in the video or audio stream, I have to wonder whether it's the TiVo drive or corrupt data coming through the digital cable network. By the way, $89 is a bit steep for a single-purpose utility, don't you think?
Randall Schulz
On Thursday 08 November 2007 15:38:40 Randall R Schulz wrote:
On Tuesday 06 November 2007 22:05, Robert Smits wrote:
...
Go to www.grc.com and get SpinRite, a low-level formatting, repair and maintenance utility. It works with any kind of file system, including Linux ones. I've used it to recover drives that were no longer readable.
-==--==--==--==--==--==--==--==--==--==--==- SpinRite 6.0 is NTFS, LINUX & TIVO Compatible! -==--==--==--==--==--==--==--==--==--==--==-
TiVo?? I know it's Linux inside, and I've been "concerned" about what will happen when that drive (which is never idle) gives out, but I'm not sure I really want to know how bad it is. Mine's been running almost non-stop for almost 2 1/2 years now... Every time I see some bizarre anomalies in the video or audio stream, I have to wonder whether it's the TiVo drive or corrupt data coming through the digital cable network.
By the way, $89 is a bit steep for a single-purpose utility, don't you think?
I guess it depends on what recovering your data is worth. I don't remember what I paid, but I think it was less, and I got a previous version, which was updated to 6 at no charge. I haven't had a drive fail since, but I do keep all my fingers crossed ;-) ;-) -- Bob Smits bob@rsmits.ca
Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector". I don't know what it refers to, but I have run a GParted Live CD to check the entire hard disk (200 GB), and some inodes were repaired, but the message still shows up when I restart the computer. Any idea what to do?
Thanks,
Go ahead and get another drive, copy everything to it, and be ready for the failure. Use your current drive as long as you like, but be aware that anything not backed up will be lost eventually. -ED-
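If the goal is to copy everything off a drive that already has unreadable sectors, one common approach (not mentioned in the thread, so treat it as a suggestion) is a raw copy with dd that keeps going past read errors. This is a minimal sketch: the function name clone_disk is hypothetical, SOURCE and TARGET are placeholders, and TARGET is overwritten completely. Dedicated tools such as GNU ddrescue handle failing media more gracefully.

```shell
# Hypothetical helper: raw-copy SOURCE to TARGET, continuing past read errors.
# WARNING: everything on TARGET is overwritten.
clone_disk() (
    src=${1:?usage: clone_disk SOURCE TARGET}
    dst=${2:?usage: clone_disk SOURCE TARGET}
    # conv=noerror keeps dd going after a read error; conv=sync pads the
    # unreadable block with zeros so later data keeps its offset.
    # bs=512 limits that padding to one sector, at the cost of speed.
    dd if="$src" of="$dst" bs=512 conv=noerror,sync
)

# Example (double-check the device names first!):
# clone_disk /dev/sda /dev/sdb
```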
On Tuesday 06 November 2007 18:15, Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector" ...
I would like to get some authoritative and definitive information about the specific meaning of this warning. I'm a bit skeptical about the dire warnings that these messages mean the drive is near the end of its useful life.

I have a drive that has only been in operation a few months (and it's been a few months of very light use, at that) that is giving me a couple of similar error messages:

Device: /dev/sda, 2 Currently unreadable (pending) sectors
Device: /dev/sda, 2 Offline uncorrectable sectors

All the other SMART output is routine bookkeeping or temperature change notifications. I have no other overt indications of problems with this drive. If it's relevant, this is an excerpt from the drive's "hwinfo" output:

  Model: "WDC WD1500ADFD-0"
  Vendor: "WDC"
  Device: "WD1500ADFD-0"
  Revision: "20.0"
  Serial ID: "WD-WMAP41210940"
  Driver: "ata_piix", "sd"
  Driver Modules: "ata_piix"
  Device File: /dev/sda
  Geometry (Logical): CHS 18241/255/63
  Size: 293046768 sectors a 512 bytes
  Geometry (BIOS EDD): CHS 290721/16/63
  Size (BIOS EDD): 293046768 sectors
  Geometry (BIOS Legacy): CHS 1024/255/63
  Config Status: cfg=no, avail=yes, need=no, active=unknown
  Attached to: #17 (IDE interface)

What precisely do these SMART diagnostics mean? How can I get a more detailed diagnosis or possibly remap the defective sectors?

Randall Schulz
On Nov 7, 2007 5:15 PM, Randall R Schulz
I would like to get some authoritative and definitive information about the specific meaning of this warning. I'm a bit skeptical about the dire warnings that these messages mean the drive is near the end of its useful life.
I have a drive that has only been in operation a few months (and it's been a few months of very light use, at that) that is giving me a couple of similar error messages:
Device: /dev/sda, 2 Currently unreadable (pending) sectors
Device: /dev/sda, 2 Offline uncorrectable sectors
All the other SMART output is routine bookkeeping or temperature change notifications. I have no other overt indications of problems with this drive. If it's relevant, this is an excerpt from the drive's "hwinfo" output:
What precisely do these SMART diagnostics mean? How can I get a more detailed diagnosis or possibly remap the defective sectors?
Randall, try

smartctl -t long /dev/sda

This will start a complete test of the drive (it may take hours). You can check on it from time to time with smartctl -l selftest /dev/sda, which displays the percentage of the test finished or, once it is done, the test results. Note that it shows the results for the last 21 tests. Check man smartctl for more info. Cheers -- Svetoslav Milenov (Sunny)
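Sunny's start-then-check routine can be scripted. The sketch below is not part of smartmontools: run_long_test is a hypothetical name, it needs root, and it assumes that `smartctl -c` prints a self-test execution status containing the words "in progress" while a test is running (worth checking against your smartctl version). The polling interval defaults to 10 minutes.

```shell
# Hypothetical helper: start an extended self-test, wait for it to finish,
# then print the self-test log. Assumes root and smartmontools.
run_long_test() (
    dev=${1:?usage: run_long_test DEVICE POLL_SECONDS}
    poll=${2:-600}
    smartctl -t long "$dev" >/dev/null || exit 1
    # Poll until the capabilities output no longer reports a running test
    # (assumes the status text contains "in progress" while one runs).
    while smartctl -c "$dev" | grep -q 'in progress'; do
        sleep "$poll"
    done
    # The newest entry in the log is the test that just ended.
    smartctl -l selftest "$dev"
)

# Example: run_long_test /dev/sda 300
```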
On Wednesday 07 November 2007 15:34, Sunny wrote:
...
Randall, try smartctl -t long /dev/sda
This will start a complete test of the drive (it may take hours). You can check on it from time to time with smartctl -l selftest /dev/sda, which displays the percentage of the test finished or, once it is done, the test results. Note that it shows the results for the last 21 tests.
Would that it were so. The output from "smartctl -l selftest /dev/sda" hasn't changed since I initiated the test, which was now about 3:45 hours ago:
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
% smartctl -t long /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 72 minutes for test to complete.
Test will complete after Wed Nov 7 17:35:44 2007
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Immediately thereafter, I did this:
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
% smartctl -l selftest /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      4762         261202
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
The output hasn't changed since then:
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
% date
Wed Nov 7 19:57:50 PST 2007
% smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      4762         261202
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
If SMART itself doesn't work, how can I trust the diagnostics it produces?
Randall Schulz
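Reading the self-test log row column by column may help here. In the row above, Status is "Completed: read failure", Remaining is 90% and LBA_of_first_error is 261202; read together, that suggests the extended test stopped after roughly the first 10% of the surface because it hit an unreadable sector at that LBA, rather than SMART not working. That is an interpretation, not something smartctl states outright. The hypothetical helper last_selftest below pulls those fields out of the newest row, assuming a two-word test description and the column order shown in the thread.

```shell
# Hypothetical helper: summarise the newest entry ("# 1") of
# `smartctl -l selftest /dev/sdX` output read on stdin.
# Assumes a two-word Test_Description and the column order
# Num, Test_Description, Status, Remaining, LifeTime(hours), LBA_of_first_error.
last_selftest() {
    awk '$1 == "#" && $2 == 1 {
        status = $5
        for (i = 6; i <= NF - 3; i++) status = status " " $i
        print "status: " status
        print "remaining: " $(NF - 2)
        print "hours: " $(NF - 1)
        print "first_error_lba: " $NF
    }'
}

# Example: smartctl -l selftest /dev/sda | last_selftest
```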
On Nov 7, 2007 9:59 PM, Randall R Schulz
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
% date
Wed Nov 7 19:57:50 PST 2007
% smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      4762         261202
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
If SMART itself doesn't work, how can I trust the diagnostics it produces?
From what I understand, the test failed, and no previous tests have ever been run. You should be able to get the test results with smartctl -l error. Also, you may use smartctl -t long again, and try -l selftest again, just to see if it updates the test count. -- Svetoslav Milenov (Sunny)
On Wednesday 07 November 2007 20:40, Sunny wrote:
...
From what I understand, the test failed, and no previous tests have ever been run. You should be able to get the test results with smartctl -l error.
Also, you may use smartctl -t long again, and try -l selftest again, just to see if it updates the test count.
After a second test (this time with the drive unmounted from the file system), the results are now:
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
% smartctl -t long /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 72 minutes for test to complete.
Test will complete after Wed Nov 7 21:27:44 2007

% smartctl -l selftest /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%             4766              261202
# 2  Extended offline    Completed: read failure       90%             4762              261202

% smartctl -l error /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
I'm not finding this very clear...
Randall Schulz
On Nov 8, 2007 12:02 AM, Randall R Schulz
I'm not finding this very clear...
Neither am I. Maybe it's time to consult the proper mailing list :) Sorry I could not help more ...
Cheers
-- Svetoslav Milenov (Sunny)
On Wednesday 07 November 2007 22:57, Sunny wrote:
On Nov 8, 2007 12:02 AM, Randall R Schulz
wrote: ...
I'm not finding this very clear...
Neither am I. Maybe it's time to consult the proper mailing list :)
Sorry I could not help more ...
No need to apologize. I'll admit I've been a bit lazy about this. When I first saw the messages a week or so ago, I went looking for information, but what I found was kind of obscure, too, and I have been inclined to think that, 'cause the drive is so new (in terms of hours of operation and lightness of use over those hours), the errors must be either minor or spurious. I'll check it out further.
Cheers
-- Svetoslav Milenov
Randall Schulz
Randall R Schulz wrote:
On Wednesday 07 November 2007 20:40, Sunny wrote:
...
From what I understand, the test failed, and no previous tests have ever been run. You should be able to get the test results with smartctl -l error.
Also, you may use smartctl -t long again, and try -l selftest again, just to see if it updates the test count.
I'm not finding this very clear...
Randall Schulz
You could try the -a option, which will give a description of all messages currently in the log (and a lot of other info, so it is wise to put the output through less). I would interpret the above as saying the test has completed and has detected the first read failure at LBA 261202... The -i option will give you a better idea of the S.M.A.R.T. capabilities of the drive (it is possible that it has no capability to retain a log)....
-- 
==============================================================================
I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone. Bjarne Stroustrup
==============================================================================
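To put "the first read failure at LBA 261202" in context, the LBA can be turned into a byte offset and, if the start sector of the partition containing it is known, into a filesystem block number, which is the usual first step in working out which file (if any) sits on the bad sector. A minimal sketch, assuming 512-byte logical sectors and a 4096-byte filesystem block; the partition start of 63 in the example is a hypothetical value, and the real one would come from the partition table (for instance `fdisk -lu /dev/sda`, which lists start sectors). The function names are hypothetical.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def lba_to_byte_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of an LBA measured from the very start of the disk."""
    return lba * sector_size


def lba_to_fs_block(lba: int, partition_start: int, fs_block_size: int = 4096,
                    sector_size: int = SECTOR_SIZE) -> int:
    """Filesystem block number holding `lba`, for a partition that starts at
    sector `partition_start` (fs_block_size is an assumed 4096 bytes)."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of this partition")
    # Sectors into the partition, converted to bytes, then to whole blocks.
    return (lba - partition_start) * sector_size // fs_block_size
```

Under those assumptions, `lba_to_byte_offset(261202)` is 133735424 bytes (about 127.5 MiB into the disk), and with the hypothetical partition start of sector 63, `lba_to_fs_block(261202, 63)` is block 32642.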
On Thursday 08 November 2007 01:46, G T Smith wrote:
Randall R Schulz wrote:
...
I'm not finding this very clear...
Randall Schulz
You could try the -a option, which will give a description of all messages currently in the log (and a lot of other info, so it is wise to put the output through less). I would interpret the above as saying the test has completed and has detected the first read failure at LBA 261202... The -i option will give you a better idea of the S.M.A.R.T. capabilities of the drive (it is possible that it has no capability to retain a log)....
It's only 86 lines, but it confuses me, too, at this point. I've attached the output of "smartctl -a /dev/sda". If you or anybody can tell me something meaningful, feel free. Randall Schulz
Randall R Schulz wrote:
On Thursday 08 November 2007 01:46, G T Smith wrote:
...
I'm not finding this very clear...
Randall Schulz
You could try the -a option, which will give a description of all messages currently in the log (and a lot of other info, so it is wise to put the output through less). I would interpret the above as saying the test has completed and has detected the first read failure at LBA 261202... The -i option will give you a better idea of the S.M.A.R.T. capabilities of the drive (it is possible that it has no capability to retain a log)....
It's only 86 lines, but it confuses me, too, at this point.
I've attached the output of "smartctl -a /dev/sda". If you or anybody can tell me something meaningful, feel free.
Randall Schulz
What I am seeing here would not worry me greatly, though there are a couple of things that puzzle me... But I would check whether WD have any technical documents on this unit. I assume you have smartd set up to do regular tests on the drive and forward mail as well as log results... I have noticed that you will get transient reports from time to time, but most indicators are notifying of temperature changes and the like. Robert Smits' information rather confirms what I have suspected for some time about how one should assess a S.M.A.R.T. report; unfortunately Robert did not give a link for the paper he referred to. I would be interested to have a look at it ...
On Friday 09 November 2007 01:12, G T Smith wrote:
Randall R Schulz wrote:
... It's only 86 lines, but it confuses me, too, at this point.
I've attached the output of "smartctl -a /dev/sda". If you or anybody can tell me something meaningful, feel free.
Randall Schulz
What I am seeing here would not worry me greatly, though there are a couple of things that puzzle me... But I would check whether WD have any technical documents on this unit.
OK. I'll see what I can find. By the way, this is the drive: http://www.wdc.com/en/products/products.asp?DriveID=189
I assume you have smartd set up to do regular tests on the drive and forward mail as well as log results... I have noticed that you will get transient reports from time to time, but most indicators are notifying of temperature changes and the like.
Yes. There seems to be some other snafu that's causing the mail delivery to sometimes fail (according to other diagnostics in /var/log/messages), but something's getting through, 'cause I get alerts in KDE. In fact, that's what first brought it to my attention. Since then, I've been monitoring the related entries in /var/log/messages.
Robert Smits' information rather confirms what I have suspected for some time about how one should assess a S.M.A.R.T. report; unfortunately Robert did not give a link for the paper he referred to. I would be interested to have a look at it ...
Yes, Robert, if you're still reading this thread, if you could point us to that paper, we'd appreciate it.
Randall Schulz
On Friday 09 November 2007 01:12:31 G T Smith wrote:
Robert Smits' information rather confirms what I have suspected for some time about how one should assess a S.M.A.R.T. report; unfortunately Robert did not give a link for the paper he referred to. I would be interested to have a look at it ...
Happy to oblige..... http://209.85.163.132/papers/disk_failures.pdf
-- 
Bob Smits bob@rsmits.ca
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?
On Friday 09 November 2007 09:43, Robert Smits wrote:
On Friday 09 November 2007 01:12:31 G T Smith wrote:
Robert Smits' information rather confirms what I have suspected for some time about how one should assess a S.M.A.R.T. report; unfortunately Robert did not give a link for the paper he referred to. I would be interested to have a look at it ...
Happy to oblige.....
Thanks. Oddly enough, when I went to download that into my publications directory, I discovered I already had a copy that I downloaded back in February and which is byte-for-byte identical.
-- Bob Smits bob@rsmits.ca
Randall Schulz
Randall R Schulz wrote:
On Friday 09 November 2007 09:43, Robert Smits wrote:
On Friday 09 November 2007 01:12:31 G T Smith wrote:
Robert Smits' information rather confirms what I have suspected for some time about how one should assess a S.M.A.R.T. report; unfortunately Robert did not give a link for the paper he referred to. I would be interested to have a look at it ... Happy to oblige.....
Thanks.
Oddly enough, when I went to download that into my publications directory, I discovered I already had a copy that I downloaded back in February and which is byte-for-byte identical.
While this study is great, one should not forget that the Google usage environment of hundreds of thousands of disks is not directly comparable to what most people do at work or at home. I.e. most people do not work in air-conditioned data centers and most desktops do not run 24x7. So while the Google paper is certainly informative and a rare beast in regard to the observation of a very large population of commodity hard disks, I would not dare to use any of its conclusions lightly for my home usage pattern.
regards Eberhard
On Friday 09 November 2007 10:24, Eberhard Roloff wrote:
Randall R Schulz wrote:
On Friday 09 November 2007 09:43, Robert Smits wrote:
...
...
While this study is great, one should not forget that the Google usage environment of hundreds of thousands of disks is not directly comparable to what most people do at work or at home.
I.e. most people do not work in air-conditioned data centers and most desktops do not run 24x7.
My systems run 24x7 and are, technically, servers (which is why I run them continuously), though they do not generally experience the kind of continuous use that the Google servers do. As far as environmental controls go, it's also true I do not have air conditioning, but most of the time (outside a few summer days, when I sometimes will shut down the systems), the temperature where these machines are located is only a little warmer than the typical air-conditioned machine room.
So while the Google paper is certainly informative and a rare beast in regard to the observation of a very large population of commodity hard disks, I would not dare to use any of its conclusions lightly for my home usage pattern.
Well, it is empirical information. And I don't even know what it means to "use its conclusions lightly." How would you suggest the results be biased to better reflect office use? Are you just saying that if SMART gives a warning I should "be afraid, be very afraid?"
regards Eberhard
Randall Schulz
Randall R Schulz wrote:
While this study is great, one should not forget that the Google usage environment of hundreds of thousands of disks is not directly comparable to what most people do at work or at home.
I.e. most people do not work in air-conditioned data centers and most desktops do not run 24x7.
My systems run 24x7 and are, technically, servers (which is why I run them continuously), though they do not generally experience the kind of continuous use that the Google servers do.
As far as environmental controls go, it's also true I do not have air conditioning, but most of the time (outside a few summer days, when I sometimes will shut down the systems), the temperature where these machines are located is only a little warmer than the typical air-conditioned machine room.
So your usage conditions are very close to Google's, except perhaps for the load on your servers compared with theirs. That means that as soon as you have used up a few (ten) thousand disks, your results will most probably be very similar to Google's. ;-))
So while the Google paper is certainly informative and a rare beast in regard to the observation of a very large population of commodity hard disks, I would not dare to use any of its conclusions lightly for my home usage pattern.
Well, it is empirical information.
It is, indeed, and one of only two AFAIK, so it is really a gem.
And I don't even know what it means to "use its conclusions lightly." How would you suggest the results be biased to better reflect office use?
No way. For this you had better use data from the desktop computer departments of Procter & Gamble or GM, if you can get hold of it. ;-)
Are you just saying that if SMART gives a warning I should "be afraid, be very afraid?"
I did not say anything about SMART, and maybe the Google analysis is OK in showing that SMART is sometimes correct, sometimes not. However, this is nothing new. Common sense tells us that sometimes disks fail suddenly, "out of the blue". OTOH, experience tells us that sometimes they last longer than they should, according to some "smart" tests, at least.
From my personal point of view, I would value the security of my data above the (potentially) needless purchase of a new disk.
Maybe you could also use your old one as a mirror or RAID disk, thus putting it to good use as long as it delivers.
regards Eberhard
On November 9, 2007 10:24:20 am Eberhard Roloff wrote:
While this study is great, one should not forget that the Google usage environment of hundreds of thousands of disks is not directly comparable to what most people do at work or at home.
I.e. most people do not work in air-conditioned data centers and most desktops do not run 24x7.
It is for me. Two of my three desktops at home run 24-7, while the other two do not. And air conditioning, while not universal, is quite normal for many people.
So while the Google paper is certainly informative and a rare beast in regard to the observation of a very large population of commodity hard disks, I would not dare to use any of its conclusions lightly for my home usage pattern.
How so? Its conclusion was basically that you couldn't predict when drive failure was going to occur with SMART. Since that's the case, doesn't it behoove us to have some kind of backup plan, when failure is unpredictable except within very broad parameters? And not to panic when SMART says your drive is failing, because it may not be?
Bob
-- 
Bob Smits Ph 250-245-2553 Fax 250-245-5531 E-mail bob@rsmits.ca
Eberhard Roloff wrote:
Randall R Schulz wrote:
On Friday 09 November 2007 09:43, Robert Smits wrote:
On Friday 09 November 2007 01:12:31 G T Smith wrote:
Robert Smits' information rather confirms what I have suspected for some time about how one should assess a S.M.A.R.T. report; unfortunately Robert did not give a link for the paper he referred to. I would be interested to have a look at it ... Happy to oblige.....
Oddly enough, when I went to download that into my publications directory, I discovered I already had a copy that I downloaded back in February and which is byte-for-byte identical.
While this study is great, one should not forget that the Google usage environment of hundreds of thousands of disks is not directly comparable to what most people do at work or at home.
I.e. most people do not work in air-conditioned data centers and most desktops do not run 24x7.
So while the Google paper is certainly informative and a rare beast in regard to the observation of a very large population of commodity hard disks, I would not dare to use any of its conclusions lightly for my home usage pattern.
regards Eberhard
Thanks Rob for the link... The paper is extremely useful but possibly flawed.

Eberhard may have missed a couple of points that are probably relevant to home usage. The most important are that there seems to be a slight increase in failure rate if the drive has light usage, and the failure pattern of the quaintly labelled 'infant mortality', in which drives are more likely to fail early in use or when the drive is getting on a bit (though the latter is more a confirmation of what I expect most of us know already).

The most difficult problem with the paper is the definition of failure. S.M.A.R.T. mainly reports on the media access status, not so much on the reliability of the electronics controlling that media. As one of the most significant events in recent times was a recall of a large number of a particular manufacturer's drives due to poor quality of the latter, the failure to distinguish between media failure and electronic failure is problematic. As this is a difficult problem to handle, one cannot fault the authors for it, but one does need to take it into consideration when considering their results.

Four parameters are identified as being critical, but the concentration on annualized failure rate without analysis of mean time to failure weakens the analysis somewhat. There is also an issue in that they report on survival rates after the first event but do not report on secondary events. Survival rates for drives with no subsequent failure reports would have been useful. I would make a similar observation on the various sector error counts that they examine. The mean time to failure statistics are also possibly more useful to those dealing with a small quantity of drives.

I think the most interesting part is the conclusion that the S.M.A.R.T. indicators on their own are probably nearly useless in predicting the failure or survival of an individual drive, and that there are really only four values one should take notice of.
(This does not mean do not use S.M.A.R.T.; it means take S.M.A.R.T. for what it is, a useful tool for flagging a potential problem.) If you are seeing a S.M.A.R.T. error but the file systems on the drive pass all integrity tests, there is a fairly good chance this is a false (or non-critical) positive, but one should monitor the situation and, if the values change adversely, take appropriate action. (In other words, DON'T PANIC!)

My conclusion is that this only emphasises the need for a good backup strategy, preferably with two independent approaches if one feels that paranoid. Also, for a good guarantee of the integrity of the data you wish to back up, invest in at least a dual-drive RAID 1 to ensure that what you back up is not affected by hardware issues. (No guarantee against software SNAFUs, of course.)

Of the S.M.A.R.T. reports, a scan error is probably the one of most concern. Drives with sector-allocation-related errors could probably, if the values do not change, still be used for non-critical testing or in configurations such as RAID where there is some redundancy.
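The advice above is to watch whether the values change adversely rather than react to a single warning. A minimal Python sketch of that comparison, assuming two saved copies of the attribute table printed by `smartctl -A /dev/sda` (the columns ending in RAW_VALUE). Which attributes to watch is an assumption here: the sector counters behind the warnings in this thread (Current_Pending_Sector, Offline_Uncorrectable) plus Reallocated_Sector_Ct; this is not claimed to be the exact set of four the paper identifies. `raw_values` and `worsened` are hypothetical names.

```python
from typing import Dict, Tuple

# Assumption: the counters worth tracking for the warnings discussed here.
WATCHED: Tuple[str, ...] = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def raw_values(attr_table: str) -> Dict[str, int]:
    """Map attribute name -> leading integer of RAW_VALUE from `smartctl -A` text."""
    values: Dict[str, int] = {}
    for line in attr_table.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE (10th column) may contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue                  # skip the header and non-table lines
        raw = fields[9].split()[0]    # e.g. "37 (Min/Max 20/45)" -> "37"
        if raw.isdigit():
            values[fields[1]] = int(raw)
    return values


def worsened(before: str, after: str,
             watched: Tuple[str, ...] = WATCHED) -> Dict[str, Tuple[int, int]]:
    """Watched attributes whose raw count went up between two snapshots."""
    old, new = raw_values(before), raw_values(after)
    return {name: (old[name], new[name])
            for name in watched
            if name in old and name in new and new[name] > old[name]}
```

An empty result means none of the watched counters moved between snapshots; a non-empty one corresponds to the adverse change that, on this reading, would call for action.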
On 11/08/2007 07:15 AM, Randall R Schulz wrote:
On Tuesday 06 November 2007 18:15, Fernando Costa wrote:
Hi all,
Some days ago I began to receive the following message: "Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, 1 currently unreadable (pending) sector" ...
I would like to get some authoritative and definitive information about the specific meaning of this warning. I'm a bit skeptical about the dire warnings that these messages mean the drive is near the end of its useful life.
AFAIK, what it means is that your hard drive's internal auto-correction has already used up all the space allocated for relocating bad sectors. That drive presently has a bad sector but is out of space to relocate it. If it were possible, the bad sector would automatically be remapped to a different sector. This one cannot be, hence the warning. It could be that no more sectors will go bad, and only the one will be pending relocation or remapping. If so, it could last a while longer. It could also be some platter defect that will only get worse.
I have a drive that has only been in operation a few months (and it's been a few months of very light use, at that) that is giving me a couple of similar error messages:
Device: /dev/sda, 2 Currently unreadable (pending) sectors
Device: /dev/sda, 2 Offline uncorrectable sectors
Above it was 1; is it now 2? Uncorrectable = not able to be remapped. Unreadable = bad. Pending = the drive's auto-correction would relocate it if it could.
What precisely do these SMART diagnostics mean? How can I get a more detailed diagnosis or possibly remap the defective sectors?
IIUC, your drive's remap region is exhausted. It automatically remaps on its own if possible. All you could do is reformat with a bad blocks check, so that the file system marks the bad sectors as bad and does not try to write to or read from them. Again, though, if the remap region is already exhausted, chances of failure are very high.
-- 
Joe Morris
Registered Linux user 231871 running openSUSE 10.3 x86_64
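The bad blocks check mentioned above is normally done by tools such as badblocks, which `mke2fs -c` and `e2fsck -c` run for you. Purely to illustrate what a read-only surface check does, here is a minimal Python sketch that reads a range of sectors and records those whose read fails. Assumptions: 512-byte sectors; the path may be a device such as /dev/sda (which needs root) or any ordinary file; reads go through the OS page cache, so recently read sectors may not touch the platter at all; sectors past the end of the device are simply not reported. It writes and remaps nothing and is not a substitute for badblocks. `unreadable_sectors` is a hypothetical name.

```python
import os
from typing import List

SECTOR = 512  # assumption: 512-byte sectors


def unreadable_sectors(path: str, first_lba: int, count: int,
                       sector_size: int = SECTOR) -> List[int]:
    """Try to read `count` sectors starting at `first_lba` and return the LBAs
    whose read raised OSError (typically EIO on a bad sector). Read-only."""
    bad: List[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for lba in range(first_lba, first_lba + count):
            try:
                os.pread(fd, sector_size, lba * sector_size)
            except OSError:
                bad.append(lba)
    finally:
        os.close(fd)
    return bad
```

For the sector in this thread, something like `unreadable_sectors("/dev/sda", 261202 - 8, 16)` would probe a small window around the LBA from the self-test log; an empty list there would not prove the sector healthy, given the caching caveat above.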
Hey,
This is the 3rd story about SMART and drives failing I've heard in 3 days, and I'm one of them. The questions I have are:
1) Are you running 10.3?
2) Was the drive a primary master or a secondary, AKA system drive vs. storage drive?
3) Is it possible that some process in Linux, or an error in the implementation of a FS, could be changing the formatting of sectors so that they read as bad under the SMART specs?
-- 
James Tremblay Director of Technology Newmarket School District 213 S. Main st Newmarket NH, 03857 603-659-3271 *318 CNE 3,4,5 MCSE w2k CLE in training Registered Linux user #440182 http://en.opensuse.org/Education Good things to read! http://en.opensuse.org/OpenSUSE_mailing_list_netiquette http://www.catb.org/~esr/faqs/smart-questions.html
participants (16)
- Aaron Kulkis
- Billie Walsh
- Eberhard Roloff
- Ed McCanless
- Fernando Costa
- G T Smith
- James Knott
- James Tremblay
- Joe Morris (NTM)
- Patrick Shanahan
- Rafael E. Herrera
- Randall R Schulz
- Robert Smits
- Russell Jones
- Sunny
- Tom Peters