[opensuse] smartctl - Help with smartctl output - should I be concerned?
Listmates, I have a power-on error on one drive in an array. The overall health of the drive "PASSED". The output below is from smartctl -a after a short self-test. The power on error was there BEFORE the self-test. Anybody know if this error warrants more concern, or is it a just "watch it" error? Drive firmware issue? [16:40 archangel:/var/log] # smartctl -t short /dev/sdb smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === Sending command: "Execute SMART Short self-test routine immediately in off-line mode". Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful. Testing has begun. Please wait 1 minutes for test to complete. Test will complete after Tue Jan 19 16:42:26 2010 Use smartctl -X to abort test. [16:41 archangel:/var/log] # date Tue Jan 19 16:43:24 CST 2010 [16:43 archangel:/var/log] # smartctl -a /dev/sdb smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.11 Device Model: ST3750330AS Serial Number: 5QK0Q09G Firmware Version: SD1A User Capacity: 750,156,374,016 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Tue Jan 19 16:44:44 2010 CST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. 
Total time to complete Offline
data collection:                 ( 642) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 178) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   097   006    Pre-fail  Always       -       96607574
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       72
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       10
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       16317701
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6619
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       4
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       72
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   087   087   000    Old_age   Always       -       13
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   067   061   045    Old_age   Always       -       33 (Lifetime Min/Max 30/33)
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 22 0 0)
195 Hardware_ECC_Recovered  0x001a   025   015   000    Old_age   Always       -       96607574
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       207
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       207
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 13 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 13 occurred at disk power-on lifetime: 6619 hours (275 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 9e cf 45 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 06 cf 45 40 00  11d+13:54:11.876  READ FPDMA QUEUED
  60 00 08 fe c3 45 40 00  11d+13:54:11.874  READ FPDMA QUEUED
  60 00 08 f6 c3 45 40 00  11d+13:54:11.839  READ FPDMA QUEUED
  60 00 20 d6 62 45 40 00  11d+13:54:11.839  READ FPDMA QUEUED
  60 00 98 36 62 45 40 00  11d+13:54:11.838  READ FPDMA QUEUED

Error 12 occurred at disk power-on lifetime: 6324 hours (263 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 8f a9 e4 01

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 10 8e a9 e4 41 00   8d+23:03:19.755  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   8d+23:03:19.755  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   8d+23:03:19.746  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   8d+23:03:19.746  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00   8d+23:03:19.745  READ NATIVE MAX ADDRESS EXT

Error 11 occurred at disk power-on lifetime: 6324 hours (263 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 8f a9 e4 01

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 10 8e a9 e4 41 00   8d+23:03:16.765  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   8d+23:03:16.765  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   8d+23:03:16.756  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   8d+23:03:16.755  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00   8d+23:03:16.755  READ NATIVE MAX ADDRESS EXT

Error 10 occurred at disk power-on lifetime: 6324 hours (263 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 8f a9 e4 01

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 10 8e a9 e4 41 00   8d+23:03:13.767  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   8d+23:03:13.766  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   8d+23:03:13.757  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   8d+23:03:13.757  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00   8d+23:03:13.756  READ NATIVE MAX ADDRESS EXT

Error 9 occurred at disk power-on lifetime: 6324 hours (263 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 8f a9 e4 01

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 10 8e a9 e4 41 00   8d+23:03:10.793  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   8d+23:03:10.793  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   8d+23:03:10.784  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   8d+23:03:10.783  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00   8d+23:03:10.783  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             6619  -
# 2  Short offline       Completed without error       00%             1479  -
# 3  Short offline       Completed without error       00%                7  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
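[Editorial sketch, not from the original post: instead of re-running the full `smartctl -a`, each section of the output above can be fetched on its own. The device name /dev/sdb is the one from the post; the commands are only echoed here, since they need root and a real drive.]

```sh
#!/bin/sh
# Collect the SMART sections shown above one at a time (sketch).
# -A prints the attribute table, -l selftest and -l error the two logs.
DEV=/dev/sdb
CMDS=""
for args in "-A" "-l selftest" "-l error"; do
    CMDS="$CMDS
smartctl $args $DEV"
done
echo "$CMDS"
```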
On Tuesday, 2010-01-19 at 16:55 -0600, David C. Rankin wrote:
Model Family: Seagate Barracuda 7200.11
Some Seagate models had important firmware errors; I think the link for checking is this one:

http://seagate.custkb.com/seagate/crm/selfservice/news.jsp?DocId=207931

...
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   097   006    Pre-fail  Always       -       96607574
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       72
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       10
There are 10 reallocated sectors. This is significant; watch it.
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       207
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       207
This might be important.
Error 13 occurred at disk power-on lifetime: 6619 hours (275 days + 19 hours)
This might be what triggered the remapping.
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             6619  -
# 2  Short offline       Completed without error       00%             1479  -
# 3  Short offline       Completed without error       00%                7  -
You _must_ run the long test as soon as possible. Do not rely only on the short test. The long test includes a surface scan.

That you have remapped sectors is important, but not definitive. You have to watch it and see if the count increases. When the count increases so much that the disk runs out of spare sectors, then you have to replace the disk fast.

--
Cheers,
Carlos E. R.
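[Editorial sketch, not part of Carlos's mail: the "watch it" advice amounts to tracking three raw counts over time. The heredoc below holds sample rows copied from the output earlier in this thread; on a live system you would pipe `smartctl -A /dev/sdb` through the same awk instead, e.g. from a daily cron job, and compare against yesterday's numbers. The long test itself is started with `smartctl -t long /dev/sdb`.]

```sh
#!/bin/sh
# Extract the attribute counts worth tracking (IDs 5, 197, 198) from
# smartctl -A style output. Sample rows from this thread stand in for
# a live `smartctl -A /dev/sdb`.
watch_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $NF }'
}
COUNTS=$(watch_counts <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       10
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       207
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       207
EOF
)
echo "$COUNTS"
# prints:
# Reallocated_Sector_Ct=10
# Current_Pending_Sector=207
# Offline_Uncorrectable=207
```

Saving that one-line-per-attribute output with a date stamp makes "is the count increasing?" a simple diff.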
Hello, On Wed, 20 Jan 2010, Carlos E. R. wrote:
On Tuesday, 2010-01-19 at 16:55 -0600, David C. Rankin wrote:
Model Family: Seagate Barracuda 7200.11
Some Seagate models had important firmware errors; I think the link for checking is this one:
http://seagate.custkb.com/seagate/crm/selfservice/news.jsp?DocId=207931
The ST3750330AS is one of the affected models, but he already has the "updated" firmware SD1A (cf.:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951&Hilite=
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957&Hilite=
http://www.seagate.com/staticfiles/support/downloads/firmware/MooseDT-SD1A-2...
)
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       72
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       10
There are 10 reallocated sectors. This is significative, watch it.
After 72 Starts/Stops???
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       207
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       207
This might be important.
That _is_ important. "Pending Sectors" are defective, but just not yet "Reallocated Sectors", ditto "Offline Uncorrectable". 217 defects after 72 starts/stops is "DOA" ("Dead On Arrival").
You _must_ run the long test as soon as possible. Do not rely only on the short test. The long test includes a surface scan.
NO!!! (not yet)

Replace the disc. NOW! (and get an image with GNU ddrescue[1] or dd_rescue if possible, or at least get your data off that disc right effing now). That disc is as good as dead!

After the data is saved, you can run the long test and "wait and see" (and use the disc for temp stuff, such as rpm-caches, ccache, squid-cache and the like; readily expendable stuff anyway).

But better wipe the disc with GNU ddrescue or dd_rescue and input /dev/zero, and then get it replaced by Seagate (I'm rather sure it'll qualify for replacement, unless it's an OEM disc, on which there's no guarantee).

Oh, yes: http://en.wikipedia.org/wiki/S.M.A.R.T.#ATA_S.M.A.R.T._attributes

HTH,
-dnh

[1] GNU ddrescue's not "better" than dd_rescue, just different, and I happen to prefer it. See
http://www.gnu.org/software/ddrescue/ddrescue.html

My openSUSE packages are here (adjust distribution as applicable):
http://download.opensuse.org/repositories/home:/dnh/openSUSE_11.2/

There's a snag though (see after the description). From my .spec:

====
%description
GNU ddrescue is a data recovery tool. It copies data from one file or
block device (hard disc, cdrom, etc) to another, trying hard to rescue
data in case of read errors.

Ddrescue does not truncate the output file if not asked to. So, every
time you run it on the same output file, it tries to fill in the gaps.

The basic operation of ddrescue is fully automatic.
====

Darn. Still haven't asked Kurt Garloff about the naming issue of the package. So, best download the rpm and install with 'rpm -ivh' (and _not_ -Uvh, or yast or zypper, as that'd remove Kurt's dd_rescue). There's no conflict besides the package name itself; you'll just have two ddrescue rpms with different versions and vendors installed (I too have both installed on my 11.2 box).

--
Well I wish you'd just tell me rather than try to engage my enthusiasm,
because I haven't got one.
   -- Marvin
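[Editorial sketch of the rescue sequence David describes, using GNU ddrescue. The device and file names are placeholders, and the plan is only echoed, not executed, since the real commands are destructive and need a real failing disk; the options shown are from ddrescue's documentation of that era.]

```sh
#!/bin/sh
# Image a dying disk with GNU ddrescue, then wipe it (sketch only;
# the plan is echoed, not run; device and paths are placeholders).
DISK=/dev/sdb            # the failing disk
IMG=/mnt/space/sdb.img   # image file on a healthy disk with enough space
MAP=/mnt/space/sdb.log   # ddrescue logfile: reruns resume and fill in gaps
PLAN="ddrescue --no-split $DISK $IMG $MAP
ddrescue --direct --max-retries=3 $DISK $IMG $MAP
ddrescue --force /dev/zero $DISK"
echo "$PLAN"
```

The first pass grabs everything that reads cleanly and skips bad areas fast; the second goes back and retries them; the last line is the /dev/zero wipe before sending the disc in. The logfile is what makes repeated runs "fill in the gaps", as the %description above says.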
On 01/19/2010 09:25 PM, David Haller wrote:
After 72 Starts/Stops???
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 207 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 207
This might be important.

That _is_ important. "Pending Sectors" are defective, but just not yet "Reallocated Sectors"; ditto "Offline Uncorrectable".
217 defects after 72 starts/stops is "DOA" ("Dead On Arrival").
You _must_ run the long test as soon as possible. Do not rely only on the short test. The long test includes a surface scan.

NO!!! (not yet)
Replace the disc. NOW! (and get an image with GNU ddrescue[1] or dd_rescue if possible or at least get your data off that disc right effing now).
That disc is as good as dead!
After the data is saved, you can run the long test, and "wait and see" (and use the disc for temp stuff, such as rpm-caches, ccache, squid-cache and the like, readily expendable stuff anyway).
But better wipe the disc with GNU ddrescue or dd_rescue and input /dev/zero, and then get it replaced by Seagate (I'm rather sure it'll qualify for replacement, unless it's an OEM disc on which there's no guarantee).
Oh, yes: http://en.wikipedia.org/wiki/S.M.A.R.T.#ATA_S.M.A.R.T._attributes
HTH, -dnh
Dave,

Thank you a lot. The disk is part of a mirrored set, so basically I just need to split the array and run both standalone until I get the new drive back from Seagate. I have another new 750G drive that I could use in the interim, or just leave them running standalone until the replacement gets here.

I bet the bad sectors haven't been remapped due to the dmraid setup. I don't know if the disk can remap the sectors when the drive is part of an array; I don't know what that would do to the sync among the disks??

Off to the Seagate self-service site once again...
On Wednesday, 2010-01-20 at 01:22 -0600, David C. Rankin wrote:
Thank you a lot. The disk is part of a mirrored set so basically I just need to split the array and run both standalone until I get the new drive back from seagate.
No need. Just keep the disk in place, and replace it when the new one arrives, if you want to replace it.

But please notice that having some bad sectors is not a disaster: it is a normal working occurrence. It is so normal that the manufacturer has set aside some spare sectors. As I said, it is only important if the number increases, not if it remains stable. That value is a warning, not an alarm. For the moment.

Also notice that Seagate will replace the disk free of charge when that spare space is spent, during the warranty period, but might refuse if only 10 sectors are used. Anyway, you will have to run the full external test and obtain the failure code in order to get an RMA.

And yes, as the disk is in a raid, there is no danger in running the full test. You have another copy of the data, and you will learn if the damage is bigger.
I have another new 750G drive that I could use in the interim or just leave them running standalone until the replacement gets here. I bet the bad sectors haven't been remapped due to the dmraid setup.
No, not related.
I don't know if the disk can remap the sectors when the drive is part of an array
Of course it can.
I don't know what that would do to the sync among the disks??
Nothing. Except... the HD tries to move the data in a bad sector to a spare sector, but the data itself might be bad (read failure). Notice that this is an operation that happens totally inside the HD, without any consideration for the mirrored disk. Nobody outside of the HD knows about the problem. The result could be that both sides had different data after the operation.

However, the remapping occurs only during a write operation, so the preceding scenario would not happen.

--
Cheers,
Carlos E. R.
Carlos E. R. wrote:
Except... The HD tries to move the data in a bad sector to a spare sector; but the data itself might be bad (read failure). Notice that this is an operation that happens totally inside the HD, without any consideration for the mirrored disk. Nobody outside of the HD knows about the problem.
Surely the user (or operating system at least) learns of a read failure? Wouldn't that lead to system/user-level retries, and if they failed, to rewriting of the sector from the mirror, which in turn would cause the remap?

Just trying to understand how all the cogs interlock :)

Cheers, Dave
The result could be that both sides had different data after the operation. However, the remapping occurs only during a write operation, so the preceding scenario would not happen.
On Thursday, 2010-01-21 at 12:31 -0000, Dave Howorth wrote:
Carlos E. R. wrote:
Except... The HD tries to move the data in a bad sector to a spare sector; but the data itself might be bad (read failure). Notice that this is an operation that happens totally inside the HD, without any consideration for the mirrored disk. Nobody outside of the HD knows about the problem.
Surely the user (or operating system at least) learns of a read failure? Wouldn't that lead to system/user-level retries and if they failed to rewriting of the sector from the mirror, which in turn would cause the remap?
Just trying to understand how all the cogs interlock :)
No, the operating system doesn't know a thing, because this is completely internal to the HD firmware. I don't know the details; that is, I haven't seen a paper from a manufacturer explaining how exactly they do it.

From what I gathered, when the HD attempts to write to a sector and it fails, and determines (somehow?) that the sector is bad and not recoverable, it decides to write the data to another sector: a spare sector defined as such during design by the manufacturer. Somehow, somewhere, external to the filesystem data, it records that any read/write operation destined for the "bad" sector will happen instead on the remapped sector, meaning that the head has to move there, and the operation is a tad slower.

All the system notices is that the original write operation went slower. The HD reports success... nothing happened. Afterward, if you run smartctl, you see the remap counter has gone up by one, that's all.

It is possible that there is a protocol for the operating system to learn of this in real time, and perhaps do something. I'm not aware of that, but then, I'm not that expert :-)

It is different, though, if the problem occurs during a read. The system will probably get a read failure code, but the HD will do no remapping; I guess because it doesn't know what the correct data to write should be.

Again, it is possible that there is a protocol defined (perhaps manufacturer dependent) for the operating system to intervene and trigger a remap. I haven't heard of such, but certainly, in the case of a raid, it would be very interesting to have.

--
Cheers,
Carlos E. R.
On Thu, Jan 21, 2010 at 3:50 PM, Carlos E. R. wrote:
On Thursday, 2010-01-21 at 12:31 -0000, Dave Howorth wrote:
Carlos E. R. wrote:
Except... The HD tries to move the data in a bad sector to a spare sector; but the data itself might be bad (read failure). Notice that this is an operation that happens totally inside the HD, without any consideration for the mirrored disk. Nobody outside of the HD knows about the problem.
Surely the user (or operating system at least) learns of a read failure? Wouldn't that lead to system/user-level retries and if they failed to rewriting of the sector from the mirror, which in turn would cause the remap?
Just trying to understand how all the cogs interlock :)
No, the operating system doesn't know a thing, because this is completely internal to the HD firmware. I don't know the details, that is, I haven't seen a paper from a manufacturer explaining how exactly they do it. From what I gathered, when the HD attempts to write to a sector and it fails, and determines (somehow?) that that sector is bad and not recoverable, it decides to write the data to another sector, a spare sector defined as such during design by the manufacturer. Somehow, somewhere, external to the filesystem data, it stores that any read/write operation destined to the "bad" sector will happen instead on the remapped sector: meaning that the head has to move there, and operation is a tad slower.
All the system notices is that the original write operation went slower. The HD disk reports success... nothing happened. Afterward, if you run smartctl, you see the remap counter has gone one up, that's all.
It is possible that there is a protocol for the operating system to learn of this in real time, and perhaps do something. I'm not aware of that, but then, I'm not that expert :-)
It is different, though, if the problem occurs during a read. The system will probably get a read failure code, but the HD will do no remapping; I guess because it doesn't know what the correct data to write should be.
Again, it is possible that there is a protocol defined (perhaps it is manufacturer dependent) for the operating system to intervene and trigger a remap. I haven't heard of such, but certainly, in case of a raid, it would be very interesting to have.
--
Cheers,
Carlos E. R.
Carlos,

I think you have part of this wrong. I've never heard of a slow disk write. Disk drives do NOT verify what they write; they just write.

The way it works is that the drive maintains CRC info for every physical sector. On read it verifies the CRC is correct. If not, then:

Consumer drives assume that the only copy of the data is on the drive, so they have built-in retry logic that can slow down the read by pretty extended times (seconds).

Enterprise / raid drives assume you can get the data from other portions of the raid, so they have fast-fail logic and don't bother to retry. They just immediately return a CRC/media error to the kernel.

Either way, the drive tags that sector as bad. It does not relocate at this point, so you can keep trying data recovery techniques to your heart's content, i.e. the infamous "put your drive in the freezer overnight before trying to copy the data off".

Eventually, when new data is written to the bad sector, the drive internally remaps it to one of the spares, but this is fast. Thus writes are always fast, as I understand it.

===

I have not been following this thread, but I'll assume mdraid is involved.

mdraid monitors all sector reads for media errors. When detected, it obviously rebuilds that sector's data from the other drives. But it also writes the recreated data back to the bad sector, thus triggering the drive to do a remap.

So if you have a large raid 1/5/etc. you should run a scan routinely. The scan will find the bad sectors and trigger a remap as described above. You can set up your scan to run routinely via cron.

Greg
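[Editorial sketch of the routine scan Greg suggests, assuming a Linux md software-raid array; "md0" is a placeholder name. Writing "check" to the array's sync_action file starts a full read-scan; the privileged write is only echoed below, since it needs root and a real array.]

```sh
#!/bin/sh
# Trigger (and schedule) an md read-scan that forces remaps of bad
# sectors, per Greg's description. md0 is a placeholder array name.
MD=md0
SCRUB="echo check > /sys/block/$MD/md/sync_action"
echo "run now:   $SCRUB"
echo "via cron:  0 3 1 * * root $SCRUB"
echo "progress:  cat /proc/mdstat"
```

The cron line is a sketch of a monthly scan at 03:00 on the 1st; progress of a running check shows up in /proc/mdstat like a resync.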
On Thursday, 2010-01-21 at 16:12 -0500, Greg Freemyer wrote:
On Thu, Jan 21, 2010 at 3:50 PM, Carlos E. R. <> wrote:
I think you have part of this wrong. I've never heard of a slow disk write.
Slow as in ms, because the head has to move to a different track, then back for the next sector. A hiccup. Not slow as in seconds, as you describe, when a read fails and retries a dozen times.
Disk drives do NOT verify what they write, they just write.
Yes, I think that is correct. Actually, MS-DOS had a setting that forced a verify on each write. I wonder if Linux has it :-?

And your explanation below is also correct, I understand.

--
Cheers,
Carlos E. R.
Carlos E. R. said the following on 01/21/2010 03:50 PM:
On Thursday, 2010-01-21 at 12:31 -0000, Dave Howorth wrote:
No, the operating system doesn't know a thing, because this is completely internal to the HD firmware.
It is now. It didn't use to be, and I suspect it isn't always.
I don't know the details, that is, I haven't seen a paper from a manufacturer explaining how exactly they do it.
You go on to make a remarkably good guess.
From what I gathered, when the HD attempts to write to a sector and it fails, and determines (somehow?) that that sector is bad and not recoverable,
And that's the important point - is it recoverable? See later.
it decides to write the data to another sector, a spare sector defined as such during design by the manufacturer. Somehow, somewhere, external to the filesystem data, it stores that any read/write operation destined to the "bad" sector will happen instead on the remapped sector: meaning that the head has to move there, and operation is a tad slower.
All the system notices is that the original write operation went slower. The HD disk reports success... nothing happened. Afterward, if you run smartctl, you see the remap counter has gone one up, that's all.
It is different, though, if the problem occurs during a read. The system will probably get a read failure code, but the HD will do no remapping; I guess because it doesn't know what the correct data to write should be.
I don't know if that's the case now, but see below.
Again, it is possible that there is a protocol defined (perhaps it is manufacturer dependent) for the operating system to intervene and trigger a remap. I haven't heard of such, but certainly, in case of a raid, it would be very interesting to have.
That would be interesting...

Anyway: back at the beginning of the 1980s I was working for a UNIX "OEM" shop. Mostly we were porting UNIX to the new microprocessors. If you recall, that was when there were lots of 16-bit micros coming onto the market, most of which aren't around today. There were also a lot of manufacturers trying to do a computer-in-a-box, and they wanted an OS, a *real* OS, not a single-user thing like MS-DOS. Just as DEC had found a niche below IBM, they were finding niches below DEC/DG.

I took an idea that a colleague had sketched out and wrote a disk driver for the PDP-11 under UNIX Version 7, you know, the tapes with "Love, Dennis". I also back-ported it to Version 6 for a Northern Telecom site. I still have the backup tapes of that project, but they are probably unreadable now.

The big difference between what you described and what I built was that those old huge drives had a CRC checksum at the end of each sector, and it contained enough information to perform at least one bit of correction. If you trusted the CRC absolutely you could perform a few more bits and do some run-length correction. However, all this was carried out during an interrupt, so you didn't want to do too much computation.

Once a bad read was detected and corrected, the corrected sector was written to one of the spare sectors and the mapping table updated. That early correction and early re-mapping is the basic difference between what you described and what I built. I strongly suspect that modern on-board drive controllers do much the same.

Well, OK, it was a bit more than that. If every slight read error resulted in a remap, those old drives would be remapped to hell and back! No, a first error caused a re-read. On some controllers you could pre-program that, so by the time the s/w driver saw the error the hardware & microprogramming had given up. Of course an on-board drive controller has better low-level access to the drive and the raw signals than my host-based driver.
However, I recall when using AIX with a large IBM RAID array which was supporting an extensive DB2 database that I once had to update the microcode on each and every disk drive in the array -- while it was running. (Yes, there were 'pauses' in performance.)

This makes me think that at least some modern machines are well integrated with the internals of the disk drives, which addresses your final point.

--
The scientific name for an animal that doesn't either run from or
fight its enemies is lunch.
   - Michael Friedman
On Thu, Jan 21, 2010 at 4:31 PM, Anton Aylward
The big difference between what you described and what I built was that those old huge drives had a CRC checksum at the end of each sector, and it contained enough information to perform at least one bit of correction. If you trusted the CRC absolutely you could perform a few more bits and do some run-length correction.
They still do, but the CRC is typically only read/written by the onboard electronics. hdparm supports long reads and long writes; these are diagnostic operations that allow you to read the full physical sector, including the CRC.

One big issue with today's 1 TB drives is that the error rate per bit is actually increasing, so there are more and more situations where data correction has to be used. And if there is a small media defect, it is more likely to be several bits in size and thus unrecoverable with current CRC designs. The end result is that the current 512-byte sector is no longer an efficient unit to maintain a header/CRC for. Next-generation drives will have 4K physical sectors. That way they can handle small media glitches more effectively without having to remap.

FYI: drives with 4K physical sectors started shipping a couple of months ago. The one I've heard of (WD?) still presents a 512-byte logical sector interface to the controller.

Greg
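The capacity-versus-error-rate point can be made concrete with a little arithmetic. The 1-in-10^14 unrecoverable-read-error (URE) rate below is a typical datasheet figure for consumer drives of that era, used here purely as an assumption:

```python
# Illustrative arithmetic: why larger drives make unrecoverable read
# errors more of a practical concern. URE_RATE is an assumed datasheet
# figure (one unrecoverable error per 10^14 bits read).

URE_RATE = 1e-14      # unrecoverable read errors per bit read
DRIVE_BYTES = 1e12    # a 1 TB drive

bits_per_full_read = DRIVE_BYTES * 8
expected_errors = bits_per_full_read * URE_RATE
print(f"Expected UREs per full-drive read: {expected_errors:.2f}")
# With the per-bit rate fixed, doubling capacity doubles the expected
# error count per full read -- which is why bigger sectors with
# stronger per-sector ECC become attractive.
```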
Greg Freemyer said the following on 01/21/2010 05:09 PM:
Next generation drives will be 4K physical sectors. That way they can more effectively handle small media glitches without having to remap.
And about time too! I recall when Berkeley moved to 4K logical sectors for improved performance with their Fast File System back in the 1980s. And MS-DOS/Windows has been able to use 4K logical sectors since... when?
FYI: drives with 4K physical sectors started shipping a couple of months ago. The one I've heard of (WD?) still presents a 512-byte logical sector interface to the controller.
Funny: my car has a steering wheel and pedals, not reins and stirrups. -- Sacred cows make the best hamburgers. --Mark Twain
On Wednesday, 2010-01-20 at 04:25 +0100, David Haller wrote:
After 72 Starts/Stops???
And 6619 hours of use. Almost a year if it was continuous use. -- Cheers, Carlos E. R.
Hello, On Thu, 21 Jan 2010, Carlos E. R. wrote:
On Wednesday, 2010-01-20 at 04:25 +0100, David Haller wrote:
After 72 Starts/Stops???
And 6619 hours of use. Almost a year if it was continuous use.
More like three quarters. It's still a lot of defects, and I guess they've all appeared rather lately (or dcr should have noticed earlier). So it actually depends on when the defects appeared. 6k hours is not "infancy"/DOA, but already in the "stablest" phase of the "bathtub curve" of disc deaths. And in that case, e.g. Google's data suggests imminent and permanent death, and/or at least more defects and corrupted data "soonish".

I currently have a disc (also a Seagate, but 1.5 TB) that "suddenly" had 5 defective sectors, though rather early in its use (I'd have to boot the server to look it up; a month or three ago, IIRC). I currently still use it for replaceable data; the replacement disc already lies beside me, I just haven't got round to shuffling the data so that I can "migrate" to the new disc. Problem is: in that box I've got 10 discs and only 10 SATA ports, and the drive bays are also quite full. Reminds me of (time-consuming) juggling ;) At least I already ran an rsync of the /-partition (and /home) to the temp partition (>15G of stuff).

BTW: of the (IIRC) 4 1.5TB Seagates, that's the only one with defective sectors (so far). But last year I had 2(!) 500G Samsungs (same model) die on me just a few days apart (one died just like that, completely inaccessible). On the other hand, I've got discs with >20k hrs (and start_stop_counts >4k) that have no defects detected by SMART (I recently switched the older 160G IDE, which had IIRC ~23k hours, for a new 500G in the old box). I think I've used >24 discs in the last 5 years (got 4 IDE discs in the old box: 2x500G, 300G, 160G), so I've got a currently running sample of 14 discs of varying ages and sizes, IDE and SATA, with IIRC at least 5 discs replaced (size/death reasons) in the last 2 years.

So, I retract my "DOA" (I didn't think of the Power On Hours then, and 72 Start/Stops would be <= 36 days here ;), but 270 defects after a mere ~6.6k hours is way too much for my liking.
Less than 10 (appearing soonish), and then stable for at least a month: I'd still not trust the disc, but might use it for a while longer (as I do with the above-mentioned 1.5TB) for expendable data (caches, tempfiles, whatever one can lose or have corrupted at any time without worries).

-dnh

PS: total is currently 1.3 TB (IDE, old box) + IIRC ~12 TB (both formatted and in 2^40-byte sizes!). And, as usual (see non-random sig), it's >95% full, which quite throws YaST (and anything else) off whack, as the "not enough", say, 5% of free space amounts to a whopping ~500G or so, with say 10G free on /, which is plenty for updates, or another install alongside ;) I guess I should file a bug that it should look at the space, not the percentage. Though YaST hasn't complained about disc space running low lately; maybe it's already fixed :)

Oh, and: I've had "full" discs since my first HDD, a then-huge 850MB Maxtor in my very first PC. I _so_ agree with Ken ... *sigh*

-- The steady state of disks is full. -- Ken Thompson
On Thursday, 2010-01-21 at 04:34 +0100, David Haller wrote:
Hello,
On Thu, 21 Jan 2010, Carlos E. R. wrote:
On Wednesday, 2010-01-20 at 04:25 +0100, David Haller wrote:
After 72 Starts/Stops???
And 6619 hours of use. Almost a year if it was continuous use.
More like three quarters. It's still a lot of defects, and I guess they've all appeared rather lately (or dcr should have noticed earlier). So it actually depends on when the defects appeared. 6k hours is not "infancy"/DOA, but already in the "stablest" phase of the "bathtub curve" of disc deaths. And in that case, e.g. Google's data suggests imminent and permanent death, and/or at least more defects and corrupted data "soonish".
...
But I've got discs with >20k hrs (and start_stop_counts >4k) that have no defects (detected by SMART) (I recently switched the older 160G IDE, that had IIRC ~23k hours, for a new 500G in the old box).
Me too.
I think I've used >24 discs in the last 5 years (got 4 IDE discs in the old box: 2x500G, 300G, 160G), so I've got a currently running sample of 14 discs of varying ages and sizes, IDE and SATA, with IIRC at least 5 discs replaced (size/death reasons) in the last 2 years.
I had one die with about... no, look, I have the data:

ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032 100   100   020    Old_age  Always  -           699
  5 Reallocated_Sector_Ct   0x0033 001   001   036    Pre-fail Always  FAILING_NOW 30748
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           618
187 Reported_Uncorrect      0x0032 001   001   000    Old_age  Always  -           1277
195 Hardware_ECC_Recovered  0x001a 061   049   000    Old_age  Always  -           171943716
197 Current_Pending_Sector  0x0012 012   011   000    Old_age  Always  -           1806
198 Offline_Uncorrectable   0x0010 012   011   000    Old_age  Offline -           1806

Very young, you see. Actually, one year old, 320 GB, used for backup data; that's why the hour count was so low. I used it via USB, so it wasn't till I got a new computer with eSATA ports that I could run smartctl on it... and got a nasty surprise. I didn't lose anything, as far as I could see. And I got a replacement from Seagate, free of charge (except for the packaging cost from me to them), refurbished, which I don't know whether I can trust or not.
So, I retract my "DOA" (didn't think of the Power On Hours then, and 72 Start/Stop would be <= 36 days here ;), but 270 defects after a mere ~6.6k hours is way too much for my liking. Less than 10 (appearing soonish), and then stable for at least a month, I'd still not trust the disc, but might use it for a while longer (as I do with above mentioned 1.5TB) for expendable data (caches, tempfiles, whatever one can lose or get corrupted at any time and have no worries).
I think the pending count simply means that there hasn't been a write to those bad sectors, so there wasn't a chance to remap them -- or, in my case, because the spares were spent.

-- Cheers, Carlos E. R.
On 01/19/2010 05:44 PM, Carlos E. R. wrote:
That you have remapped sectors is important, but not definitive. You have to watch it and see if the count increases. When the count increases so much that the disk runs out of spare sectors, then you have to switch the disk fast.
Hmm, this one is still under warranty. At least Seagate will 'pre-ship' a replacement for $20 that comes with the pre-paid return envelope (packing too :-). What that doesn't explain is why in the heck I have had to send back 3 out of 4 of my 750G drives in the past 12 months. Shame on you, Seagate. -- David C. Rankin, J.D., P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
"On 02:14:46 am "David C. Rankin"
Hmm, this one is still under warranty. At least seagate will 'pre-ship' a replacement for $20 that comes with the pre-paid return envelope (packing too :-).
What that doesn't explain is why in the heck have I had to send back 3 out of 4 of my 750G drives in the past 12 months?? Shame on you Seagate.
I had a 500G go bad in less than a year; also did the pre-ship, worth the $20. Mike -- 2.6.27.39-0.2-default GNU/Linux
ka1ifq said the following on 01/20/2010 10:23 AM:
"On 02:14:46 am "David C. Rankin"
said" Hmm, this one is still under warranty. At least seagate will 'pre-ship' a replacement for $20 that comes with the pre-paid return envelope (packing too :-).
What that doesn't explain is why in the heck have I had to send back 3 out of 4 of my 750G drives in the past 12 months?? Shame on you Seagate.
I had a 500g go bad in less than a year, also did the pre-ship, worth the $20.
It's all about statistics. Somewhere out there is a 500G drive that will work perfectly for another 50 years. But it's the ones that fail we hear complaints about. Of course, no one will be using those 500G drives 50 years from now :-) -- There is no use whatever trying to help people who do not help themselves. You cannot push anyone up a ladder unless he be willing to climb himself. - Andrew Carnegie
On Wednesday 20 January 2010 05:33:09 am Anton Aylward wrote:
It's all about statistics. Somewhere out there is a 500G drive that will work perfectly for another 50 years. But it's the ones that fail we hear complaints about.
That would be so *if* they sold enough of them. But it is also all about statistics when 3 out of 4 drives have to be returned. That almost by itself leads to the conclusion that the good old six-sigma production ethic (it *is* an ethic more than a process) is out the window...
kanenas@hawaii.rr.com said the following on 01/20/2010 04:26 PM:
On Wednesday 20 January 2010 05:33:09 am Anton Aylward wrote:
It's all about statistics. Somewhere out there is a 500G drive that will work perfectly for another 50 years. But it's the ones that fail we hear complaints about.
That would be so *if* they sold enough of them.
But it is also all about statistics when 3 out of 4 drives have to be returned. That almost by itself leads to the conclusion that the good old six-sigma production ethic (it *is* an ethic more than a process) is out the window...
No, it's about economics. Manufacturing gets to a point where increasing the reliability by another 1% adds another 10%, 15%, 30%, 100%, 600% ... exponentially to the production cost. At some point it's cheaper to live with the reliability figures you've got and unconditionally replace any failed units than it is to try and improve the reliability. After all, it's the bean counters that run the company, the investors wanting an ROI, and the executives who have to face up to Wall Street analysts' scathing comments about the performance of the stock -- not the engineers, and certainly not the customers. -- If you obey all the rules, you miss all the fun. - Katharine Hepburn
Quoting ka1ifq
"On 02:14:46 am "David C. Rankin"
said" Hmm, this one is still under warranty. At least seagate will 'pre-ship' a replacement for $20 that comes with the pre-paid return envelope (packing too :-).
What that doesn't explain is why in the heck have I had to send back 3 out of 4 of my 750G drives in the past 12 months?? Shame on you Seagate.
Were the 4 from the same batch (i.e., ordered together)? There are occasional bad batches.

Also, statistics: assuming a Gaussian distribution, 10% will fail in less than 1/10 of the MTBF (Mean Time Between Failures). Plus infant mortality: many things have high failure rates during the first part of their expected lifetime, then the failure rate is low and flat for many months or years, and then it starts rising again. A bunch of disk failures in the first months is unfortunate, but not terribly surprising. A bunch of 2-year-old drives failing would be surprising and probably points to environmental causes, typically heat. Have you checked ambient and drive temperatures?

Jeffrey
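As a rough check on that figure: under the memoryless (exponential) failure model commonly used to interpret MTBF numbers (a simplification, not the Gaussian assumption above), the fraction failing before one tenth of the MTBF does come out close to 10%. The MTBF value below is an assumed datasheet figure:

```python
import math

# Fraction of drives failing before time t under an exponential
# failure model with mean time between failures `mtbf`. This is a
# common simplification for interpreting MTBF specs; it ignores the
# infant-mortality and wear-out ends of the bathtub curve.
def fraction_failed(t, mtbf):
    return 1 - math.exp(-t / mtbf)

mtbf = 500_000            # hours; an assumed datasheet figure
t = mtbf / 10
print(f"{fraction_failed(t, mtbf):.1%} fail before MTBF/10")
# 1 - e^(-0.1) ≈ 9.5%, close to the 10% quoted above
```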
On 01/20/2010 10:03 AM, Jeffrey L. Taylor wrote:
Quoting ka1ifq
: "On 02:14:46 am "David C. Rankin"
said" Hmm, this one is still under warranty. At least seagate will 'pre-ship' a replacement for $20 that comes with the pre-paid return envelope (packing too :-).
What that doesn't explain is why in the heck have I had to send back 3 out of 4 of my 750G drives in the past 12 months?? Shame on you Seagate.
Were the 4 from the same batch (i.e., ordered together)? There are occasional bad batches.
Also, statistics: assuming a Gaussian distribution, 10% will fail in less than 1/10 of the MTBF (Mean Time Between Failures). Plus infant mortality, many things will have high rates of failure for the first part of their expected lifetime, then the failure rate will be low and flat for many months/years and then the failure rate will start rising. A bunch of disk failures in the first months is unfortunate, but not terribly surprising. A bunch of 2 year drives failing would be surprising and probably points to environmental concerns, typically heat.
Have you checked ambient and drive temperatures?
Jeffrey
That's the strange part: the drives were two 500G and two 750G drives, ordered at different times. Both 500s were bad (one replaced twice), and this is the second 750 that has gone bad. The 500s are now perfect. In the 17 years before that (probably 20 hard drives), the ONLY time a hard drive ever failed on me was after the computer had been stored in the attic for a while. Good luck, maybe :p

As far as temps go, they are always perfect (mid-30s). The case is an Antec 180ss with 4 120mm fans. Example:

10:39 archangel:~> hdtemp
/dev/sdb: ST3750330AS: 34°C
/dev/sdc: ST3500630AS: 38°C
/dev/sdd: ST3750330AS: 35°C
/dev/sda: ST3500630AS: 35°C

(sdb is the current failing drive.) Note: hdtemp is just a simple script that calls hddtemp for all 4 drives. I don't know what the issue is here, but I do know it's not heat....

-- David C. Rankin, J.D., P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
participants (9)
- Anton Aylward
- Carlos E. R.
- Dave Howorth
- David C. Rankin
- David Haller
- Greg Freemyer
- Jeffrey L. Taylor
- ka1ifq
- kanenas@hawaii.rr.com