[opensuse] SMART Prefailure
Hello Mates,
today I found the following in my logdigest:
Messages matching keywords in the "alarming" list:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- (1 lines)
Mar 7 12:01:52 linux-eh47 smartd[3228]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 10 Spin_Retry_Count changed from 231 to 232
What does this tell me?
--
Sincerely yours
Sascha Manns
openSUSE Marketing Team (Weekly News)
openSUSE Build Service
Web: http://saschamanns.gulli.to
Blog: http://lizards.opensuse.org/author/saigkill
DISCLAIMER: Please note that in accordance with the German law on data retention, information on every electronic information exchange with me is retained for a period of six months.
http://www.vorratsdatenspeicherung.de
http://www.ccc.de
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
On Saturday, 2009-03-07 at 20:35 +0100, Sascha 'saigkill' Manns wrote:
Hello Mates,
today I found the following in my logdigest:
Messages matching keywords in the "alarming" list:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- (1 lines)
Mar 7 12:01:52 linux-eh47 smartd[3228]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 10 Spin_Retry_Count changed from 231 to 232
What does this tell me?
A priori, only that a value has changed.
Make sure that smartd is configured to send you email on alarms. The configuration file is "/etc/smartd.conf". I'm not sure it is set up to do so by default. Once it is, you can heed those warnings and ignore logdigest.
Now, to see the full SMART information for the drive, run as root:
  smartctl -a /dev/sdb
I think you have to examine the "WHEN_FAILED" column. You will see that one of the lines is similar to the one you saw in the logdigest. Some of the attributes are of type "Pre-fail", which means they warn about impending failure, and others are "Old_age". There was a thread recently about this with links to more info.
The command "smartctl -H /dev/sdb" will quickly tell you how things are. You could also run the short disk self-test:
  smartctl --test=short /dev/sdb
and when it finishes, the long one:
  smartctl --test=long /dev/sdb
You can view the test results with "smartctl -a /dev/sdb" (and other combinations, see the manual - it's good, for a change :-) )
--
Cheers, Carlos E. R.
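The quick health check mentioned above can also be scripted. The following is only a minimal Python sketch, not part of smartmontools: the function names `smartctl_command`, `health_passed` and `check_health` are made up for illustration, and the parser assumes the wording "SMART overall-health self-assessment test result: PASSED" that `smartctl -H` normally prints for ATA disks. Other device types may word the result differently.

```python
import subprocess


def smartctl_command(device: str, *options: str) -> list[str]:
    """Build the argument list for a smartctl call, e.g. smartctl -H /dev/sdb."""
    return ["smartctl", *options, device]


def health_passed(report: str) -> bool:
    """Return True only if the report has a health line that ends in PASSED.

    Assumption: the report uses the ATA wording
    "SMART overall-health self-assessment test result: PASSED".
    A missing health line is treated as "not passed".
    """
    for line in report.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.strip().endswith("PASSED")
    return False


def check_health(device: str) -> bool:
    """Run `smartctl -H <device>` (needs root) and interpret its output."""
    result = subprocess.run(
        smartctl_command(device, "-H"),
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to flag problems
    )
    return health_passed(result.stdout)
```

Calling `check_health("/dev/sdb")` as root would run the same command as above. The parsing is kept in its own function so it can be tried on saved output without touching a disk.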
Your harddisk is about to fail. Make a backup and replace it ASAP.
--
Khawar Nehal
CEO
Applied Technology Research Center (ATRC)
C-55 Block A KDA Officers Karachi 75260 Pakistan
Mobile : 92-333-2486216
Office : 92-21-8180991
Home : 92-21-4974781
Email : khawar.nehal@atrc.net.pk
Website : http://atrc.net.pk
Facebook : http://www.facebook.com/home.php#/profile.php?id=557397515
Linked In : http://www.linkedin.com/pub/6/46/a88
Gtalk : khawar.nehal
Skype : khawar.nehal
Well, it may not be that simple. I keep meaning to file a bug report about this, but haven't gotten to it. I regularly get a warning message on logging into KDE that my hard disk is about to fail, with the following info in the logs:
Mar 16 21:44:26 myrosia-home2 smartd[3641]: Device: /dev/sda, 230905559 Currently unreadable (pending) sectors
Mar 16 21:44:26 myrosia-home2 smartd[3641]: Device: /dev/sda, 46300938 Offline uncorrectable sectors
I have used this machine for 8 months now, getting the same messages (the numbers change), with no crash as yet. My co-worker has the exact same hardware, but with 10.3 instead of 11.0, and he is getting the same messages. His machine has run without any trouble for 18 months now. So I am not sure what's going on with SMART for us, but by now I believe that the disk is not going to crash anytime soon (even though I back up as a matter of good practice).
Myrosia
On Sat, Mar 14, 2009 at 4:55 PM, Khawar Nehal <khawar.nehal@gmail.com> wrote:
Your harddisk is about to fail. Make a backup and replace it ASAP.
On 03/17/2009 06:47 AM, Myrosia Dzikovska wrote:
Well, it may not be that simple. I keep meaning to file a bug report about this, but haven't gotten to it. I regularly get a warning message on logging into KDE that my hard disk is about to fail, with the following info in the logs:
Mar 16 21:44:26 myrosia-home2 smartd[3641]: Device: /dev/sda, 230905559 Currently unreadable (pending) sectors
Mar 16 21:44:26 myrosia-home2 smartd[3641]: Device: /dev/sda, 46300938 Offline uncorrectable sectors
I have used this machine for 8 months now, getting the same messages (the numbers change), with no crash as yet. My co-worker has the exact same hardware, but with 10.3 instead of 11.0, and he is getting the same messages. His machine has run without any trouble for 18 months now. So I am not sure what's going on with SMART for us, but by now I believe that the disk is not going to crash anytime soon (even though I back up as a matter of good practice).
Myrosia
Remember, smartd is only a monitoring program. It does NOT generate the info, it just reports it. It is your hard drive that generates it. Given how long the drive has kept running, it is understandable that you doubt the errors are serious, but they really are. Most modern hard drives reserve a certain amount of space for remapping bad spots. Once that is exhausted, you will start experiencing data loss. An uncorrectable error is one that could not be remapped, which AFAIK means the reserved space is exhausted. Where the data loss happens may not adversely affect you for a while, but it is still data loss.
Backups are good for recovery, but if the manufacturer's data from the drive indicates impending failure, I know from experience that cloning to a replacement is so much easier before the drive is that far gone. I would rather clone a drive that is failing but isn't too far gone yet (with the GParted live CD) than wait until it is a data recovery/reinstall/whatever situation.
--
Joe Morris
Registered Linux user 231871 running openSUSE 11.1 x86_64
Khawar Nehal wrote:
Your harddisk is about to fail. Make a backup and replace it ASAP.
Well, you can take some of what smartd says with a grain of salt, because many of the errors are not signs of impending doom. I have an old Dell box with a 100G drive that SMART believes *has been* dying for the past 6 years. For example, if you read part of the output, it looks bad:
  SMART Error Log Version: 1
  ATA Error Count: 79 (device log contains only the most recent five errors)
          CR = Command Register [HEX]
          FR = Features Register [HEX]
          SC = Sector Count Register [HEX]
          SN = Sector Number Register [HEX]
          CL = Cylinder Low Register [HEX]
          CH = Cylinder High Register [HEX]
          DH = Device/Head Register [HEX]
          DC = Device Command Register [HEX]
          ER = Error register [HEX]
          ST = Status register [HEX]
  Powered_Up_Time is measured from power on, and printed as
  DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
  SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Looking at the 79th error, I see:
  Error 79 occurred at disk power-on lifetime: 14727 hours (613 days + 15 hours)
    When the command that caused the error occurred, the device was in an unknown state.
    After command completion occurred, registers were:
    ER ST SC SN CL CH DH
    -- -- -- -- -- -- --
    57 4a 00 00 00 00 a0
    Commands leading to the command that caused the error were:
    CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
    -- -- -- -- -- -- -- --  ----------------  --------------------
    a0 00 00 00 00 00 a0 00      05:00:29.671  PACKET
    a0 00 04 42 ed e0 ea 00      05:00:24.914  PACKET
    a0 00 00 00 00 00 e0 00      05:00:24.912  PACKET
    a0 00 00 00 00 00 a0 00      05:00:24.904  PACKET
    a0 03 08 00 00 00 a0 00      05:00:24.898  PACKET
The next logical question is: "Is this a failure of recent onset that I should be concerned about?" We know it happened at 14727 hours (613 days + 15 hours) into the drive's lifetime, so where are we now? The current output gives us the answer:
    9 Power_On_Hours          0x0032   200   200   000    Old_age   Always       -       58133
We are now at 58133 hours. So the last SMART error occurred 58133 - 14727 = 43406 hours ago (roughly 1809 days, or 4 years and 348.6 days). Regardless, the smartd error keeps appearing in the logs every day. I think I'm safe with this one. What does the time correlation show on your drive?
--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
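The time correlation described in this message can be reproduced with a few lines of Python. This is only a sketch: the `Attribute` class and the helpers `parse_attribute_row` and `hours_since` are hypothetical names, and the parser assumes the ten-column attribute row layout of the Power_On_Hours line quoted above. The numbers come from that message (error at 14727 hours, drive now at 58133 hours).

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One row of the SMART attribute table, in the column order of the quoted row."""
    ident: int
    name: str
    flag: str
    value: int
    worst: int
    thresh: int
    kind: str          # "Pre-fail" or "Old_age"
    updated: str
    when_failed: str   # "-" when the attribute has never failed
    raw: str


def parse_attribute_row(row: str) -> Attribute:
    """Split one attribute row on whitespace; the raw value keeps any trailing text."""
    parts = row.split(None, 9)
    if len(parts) != 10:
        raise ValueError(f"unexpected attribute row: {row!r}")
    return Attribute(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9].strip(),
    )


def hours_since(current_hours: int, error_hours: int) -> int:
    """Power-on hours elapsed between a logged error and the current reading."""
    return current_hours - error_hours


# Row and error time as quoted in the message above.
row = "  9 Power_On_Hours          0x0032   200   200   000    Old_age   Always       -       58133"
attr = parse_attribute_row(row)
elapsed = hours_since(int(attr.raw), 14727)
days, hours = divmod(elapsed, 24)
print(f"{elapsed} power-on hours since the error ({days} days + {hours} hours)")
```

Note that these are power-on hours, so the comparison says how long the drive has been running since the error, not how much calendar time has passed.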
Hello,
On Sat, 07 Mar 2009, Sascha 'saigkill' Manns wrote:
Hello Mates,
today I found the following in my logdigest:
Messages matching keywords in the "alarming" list:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- (1 lines)
Mar 7 12:01:52 linux-eh47 smartd[3228]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 10 Spin_Retry_Count changed from 231 to 232
What does this tell me?
==== http://en.wikipedia.org/wiki/S.M.A.R.T ====
10 0A Spin Retry Count
Count of retry of spin start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.
====
Watch closely, keep your backups up to date and prepare to get a replacement.
-dnh
--
"Have I mentioned yet that Carlton Gardens is amazingly *green* when it's raining? Wow. I don't think I've ever seen so much green, not even at the back of my fridge." -- Matt McLeod
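To "watch closely" without reading the logs by hand, the smartd message from the original post can be picked apart with a regular expression. The sketch below is an illustration only: `AttributeChange` and `parse_change` are hypothetical names, and the pattern is modelled solely on the single log line quoted in this thread, so other smartd versions or message types may need a different pattern.

```python
import re
from typing import NamedTuple, Optional


class AttributeChange(NamedTuple):
    device: str
    kind: str    # "Prefailure" or "Usage"
    ident: int
    name: str
    old: int
    new: int


# Assumption: messages look like the one quoted in this thread, with an
# optional bracketed device type such as "[SAT]" after the device name.
_CHANGE = re.compile(
    r"Device: (?P<device>\S+)(?: \[[^\]]*\])?, "
    r"SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<ident>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_change(line: str) -> Optional[AttributeChange]:
    """Return the attribute change described by a smartd log line, or None."""
    match = _CHANGE.search(line)
    if match is None:
        return None
    return AttributeChange(
        match["device"], match["kind"], int(match["ident"]),
        match["name"], int(match["old"]), int(match["new"]),
    )


line = ("Mar 7 12:01:52 linux-eh47 smartd[3228]: Device: /dev/sdb [SAT], "
        "SMART Prefailure Attribute: 10 Spin_Retry_Count changed from 231 to 232")
change = parse_change(line)
print(change)
```

Feeding every smartd line from the system log through `parse_change` and keeping the results per attribute would give a simple history of how often, and in which direction, a value such as Spin_Retry_Count moves.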
participants (7)
- Carlos E. R.
- David C. Rankin
- David Haller
- Joe Morris
- Khawar Nehal
- Myrosia Dzikovska
- Sascha 'saigkill' Manns