Khawar Nehal wrote:
Your hard disk is about to fail. Make a backup and replace it ASAP.
Carlos E. R. wrote:
On Saturday, 2009-03-07 at 20:35 +0100, Sascha 'saigkill' Manns wrote:
Hello Mates, today I found the following in my logdigest:

    Messages matching keywords in the "alarming" list:
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    (1 lines)
    Mar 7 12:01:52 linux-eh47 smartd[3228]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 10 Spin_Retry_Count changed from 231 to 232

What does that tell me?

A priori, only that a value has changed.
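To see exactly which value moved, the smartd message can be pulled apart mechanically. This is a minimal Python sketch, assuming the log-line format quoted above; other smartd versions or options may word the message differently, and the name `parse_attribute_change` is hypothetical, not part of smartmontools. It only extracts the numbers; it says nothing about whether the change matters.

```python
import re

# Matches smartd "attribute changed" messages in the format quoted above.
# Assumption: other smartd versions/options may print a different format,
# which this pattern will simply not match.
ATTR_CHANGE = re.compile(
    r"Device: (?P<device>\S+) \[(?P<type>[^\]]+)\], "
    r"SMART (?P<kind>\w+) Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_attribute_change(line):
    """Return a dict describing the change, or None if the line doesn't match."""
    m = ATTR_CHANGE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("id", "old", "new"):
        info[key] = int(info[key])
    info["delta"] = info["new"] - info["old"]
    return info


line = ("Mar 7 12:01:52 linux-eh47 smartd[3228]: Device: /dev/sdb [SAT], "
        "SMART Prefailure Attribute: 10 Spin_Retry_Count changed from 231 to 232")
print(parse_attribute_change(line))
```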
Make sure that smartd is configured to send you email on alarms. The configuration file is "/etc/smartd.conf"; I'm not sure it is set up to do so by default. Then you can heed those warnings and ignore logdigest.
Now, to see the full SMART report for the disk, run as root:
smartctl -a /dev/sdb
I think you have to examine the "WHEN_FAILED" column. You will see that one of the lines is similar to the one you saw in the logdigest. Some of the attributes are "Pre-fail", which means they warn about impending failure, and others are "Old_age". There was a recent thread about this with links to more info.
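Checking the WHEN_FAILED column by eye works, but it can also be scripted. This is a minimal Python sketch, assuming the classic ATA attribute table layout printed by `smartctl -a` (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the class and function names are hypothetical, and two of the three sample rows are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    id: int
    name: str
    value: int        # current normalized value
    worst: int        # lowest normalized value recorded
    thresh: int       # vendor failure threshold
    type: str         # "Pre-fail" or "Old_age"
    when_failed: str  # "-" if never failed, else e.g. FAILING_NOW / In_the_past
    raw: str          # raw value, kept as text because its format varies


def parse_attributes(report):
    """Extract rows of the vendor attribute table from `smartctl -a` text."""
    attrs = []
    for line in report.splitlines():
        # At most 10 fields: RAW_VALUE may itself contain spaces.
        fields = line.split(None, 9)
        # Keep only rows shaped like: ID NAME 0xFLAG VALUE WORST THRESH ...
        if (len(fields) < 10 or not fields[0].isdigit()
                or not fields[2].startswith("0x")
                or not all(f.isdigit() for f in fields[3:6])):
            continue
        attrs.append(Attribute(
            id=int(fields[0]), name=fields[1],
            value=int(fields[3]), worst=int(fields[4]), thresh=int(fields[5]),
            type=fields[6], when_failed=fields[8], raw=fields[9].strip(),
        ))
    return attrs


def failed_attributes(attrs):
    """Attributes whose WHEN_FAILED column is not "-", Pre-fail ones first."""
    hits = [a for a in attrs if a.when_failed != "-"]
    return sorted(hits, key=lambda a: a.type != "Pre-fail")


# Illustrative input: the Power_On_Hours row is quoted from this thread,
# the other two rows are made up for the example.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   005   005   036    Pre-fail  Always   FAILING_NOW 1234
  9 Power_On_Hours          0x0032   200   200   000    Old_age   Always       -       58133
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
"""

for a in failed_attributes(parse_attributes(sample)):
    print(a.name, a.type, a.when_failed)
```

Rows from other sections of the report (error log registers, self-test log) are skipped because they lack the `0x` flag column and numeric VALUE/WORST/THRESH fields.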
The command "smartctl -H /dev/sdb" will quickly tell you how things stand. You could also run the disk self-tests, first the short one:
smartctl --test=short /dev/sdb
and when it finishes, the long one:
smartctl --test=long /dev/sdb
You can view the test results with "smartctl -a /dev/sdb" (and other option combinations; see the manual, which is good, for a change :-) )
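smartctl also summarizes what it found in its exit status, a bitmask described in the RETURN VALUES section of the smartctl(8) man page. This is a minimal Python sketch that decodes it; the bit meanings below are paraphrased from that man page, so check them against your installed version. The `check` helper is hypothetical, needs root and smartmontools, and uses `smartctl -a` so that the health status and the logs are both read.

```python
import subprocess

# smartctl exit-status bits, paraphrased from RETURN VALUES in smartctl(8).
SMARTCTL_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "some SMART or ATA command failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes <= threshold",
    5: "some attributes were <= threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_status(code):
    """List the meanings of the bits set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_BITS.items() if code & (1 << bit)]


def check(device="/dev/sdb"):
    """Run `smartctl -a` on a device and decode its exit status."""
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True)
    return decode_status(result.returncode)
```

For instance, a status of 64 decodes to just the error-log bit, which by itself says nothing about the current health check (bit 3).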
-- Cheers, Carlos E. R.
Well, you can take some of what smartd says with a grain of salt, because many of the errors are not signs of impending doom. I have an old Dell box with a 100G drive that SMART believes *has been* dying for the past 6 years. For example, part of the output looks bad:

    SMART Error Log Version: 1
    ATA Error Count: 79 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Looking at the 79th error, I see:

    Error 79 occurred at disk power-on lifetime: 14727 hours (613 days + 15 hours)
      When the command that caused the error occurred, the device was in an unknown state.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      57 4a 00 00 00 00 a0

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      a0 00 00 00 00 00 a0 00      05:00:29.671  PACKET
      a0 00 04 42 ed e0 ea 00      05:00:24.914  PACKET
      a0 00 00 00 00 00 e0 00      05:00:24.912  PACKET
      a0 00 00 00 00 00 a0 00      05:00:24.904  PACKET
      a0 03 08 00 00 00 a0 00      05:00:24.898  PACKET

The next logical question is: "Is this a failure of recent onset that I should be concerned about?" We know it happened at 14727 hours (613 days + 15 hours) into the drive's lifetime, so how old is the drive now? The current output gives us the answer:

      9 Power_On_Hours          0x0032   200   200   000    Old_age   Always       -       58133

We are now at 58133 hours.
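The time correlation David works out next (when did the logged error happen, compared with the drive's current age?) can be scripted. This is a minimal Python sketch, assuming the `smartctl -a` layout quoted in this message and a Power_On_Hours raw value that is a plain hour count (some drives print it in another format); the function names are hypothetical, and years are counted as 365 days, ignoring leap days.

```python
import re

# "Error N occurred at disk power-on lifetime: H hours" lines of the ATA error log.
ERROR_LINE = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")


def power_on_hours(report):
    """Raw value of the Power_On_Hours attribute, assumed to be a plain hour count."""
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            return int(fields[9])
    raise ValueError("no Power_On_Hours attribute in report")


def hours_since_last_error(report):
    """Hours between the most recent logged error and now, or None if no errors."""
    errors = [(int(num), int(hours)) for num, hours in ERROR_LINE.findall(report)]
    if not errors:
        return None
    # Errors are numbered in order, so the highest number is the most recent one.
    _, last_error_hours = max(errors)
    return power_on_hours(report) - last_error_hours


def describe(hours):
    """Express an hour count as days and as years + days (365-day years)."""
    days = hours / 24
    years, rest = divmod(days, 365)
    return f"{hours} hours = {days:.0f} days = {int(years)} years {rest:.1f} days"


# Both lines are quoted from the message above.
report = """\
Error 79 occurred at disk power-on lifetime: 14727 hours (613 days + 15 hours)
  9 Power_On_Hours          0x0032   200   200   000    Old_age   Always       -       58133
"""
print(describe(hours_since_last_error(report)))
# → 43406 hours = 1809 days = 4 years 348.6 days
```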
So the last SMART error occurred 43406 hours ago (roughly 1809 days, or 4 years and 348.6 days). Regardless, the smartd error keeps appearing in the logs every day. I think I'm safe with this one. What does the time correlation show on your drive?

--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com