[opensuse] kernel messages - disk errors?
We had a generator test at work over the weekend that exceeded the UPS time limits so machines shut down. They've all come back but I'm seeing a lot of messages in /var/log/messages of one of them (every few minutes). I strongly suspect they're telling me to buy a new disk but I need to find and read the friendly manual to be sure. Can anybody confirm my suspicion and/or point me at the correct manual?

Thanks, Dave

Here's a sample of one burst of messages:

Sep 28 13:50:07 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:07 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:07 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:07 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:07 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:07 suse1 kernel: ata3: EH complete
Sep 28 13:50:11 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:11 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:11 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:11 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:11 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:11 suse1 kernel: ata3: EH complete
Sep 28 13:50:15 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:15 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:15 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:15 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:20 suse1 avahi-daemon[3595]: Invalid query packet.
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Sep 28 13:50:28 suse1 kernel: Descriptor sense data with sense descriptors (in hex):
Sep 28 13:50:28 suse1 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Sep 28 13:50:28 suse1 kernel: 03 73 c0 a3
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Sep 28 13:50:28 suse1 kernel: end_request: I/O error, dev sda, sector 57917603
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 13:50:28 suse1 avahi-daemon[3595]: Invalid query packet.
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
On Mon, Sep 28, 2009 at 8:56 AM, Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk> wrote:
We had a generator test at work over the weekend that exceeded the UPS time limits so machines shut down. They've all come back but I'm seeing a lot of messages in /var/log/messages of one of them (every few minutes). I strongly suspect they're telling me to buy a new disk but I need to find and read the friendly manual to be sure. Can anybody confirm my suspicion and/or point me at the correct manual?
You may need a new disk. It's not conclusive in my opinion. If it was my drive, I'd likely replace it just to be sure. More comments are interspersed below.
Thanks, Dave
Here's a sample of one burst of messages:
Sep 28 13:50:07 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:07 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:07 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:07 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Media error is a real error. Bad cables, controller, or PSU won't cause a false media error report. You have at least one sector of the actual platter whose checksum does not match when it is read. Writing to that sector _may_ trigger a relocate and all is well. Or if it is a "soft" error, writing to the sector may just update the data and the checksum/CRC, thus making all well without a relocate.

If the drive was in the middle of writing that specific sector when power failed, it may have only been partially written and thus the checksum / CRC or whatever it is fails. That would be an example of a soft error.

So a single bad sector may not be a big deal at all in that case.

Keep reading...
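As a cross-check on which sector the drive is complaining about, the `cmd` and `res` lines quoted above can be decoded by hand. The sketch below is not from the thread. It assumes the usual libata layout of those dumps (command or status, then feature or error, sector count and three LBA bytes, then five "HOB" bytes, then the device byte) and that 0xc8 is a 28-bit READ DMA, so the low nibble of the device byte carries LBA bits 24 to 27. The name `decode_taskfile` is made up for illustration.

```python
# Decode a libata taskfile dump such as "c8/00:08:a3:c0:73/00:00:00:00:00/e3".
# Assumed layout: CMD_OR_STATUS / FEAT_OR_ERR:NSECT:LBAL:LBAM:LBAH / HOB bytes / DEVICE


def decode_taskfile(dump):
    head, low, _hob, dev = dump.split("/")
    feat_or_err, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    device = int(dev, 16)
    # 28-bit LBA: low nibble of the device byte supplies bits 24-27.
    lba28 = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command_or_status": int(head, 16),
        "feature_or_error": feat_or_err,
        "nsect": nsect,
        "lba28": lba28,
    }


cmd = decode_taskfile("c8/00:08:a3:c0:73/00:00:00:00:00/e3")
res = decode_taskfile("51/40:08:a3:c0:73/00:00:00:00:00/e3")

err_bit_set = bool(res["command_or_status"] & 0x01)   # ERR bit in the status byte
unc_bit_set = bool(res["feature_or_error"] & 0x40)    # UNC (uncorrectable data) bit

print("requested LBA:", cmd["lba28"])
print("failing LBA:  ", res["lba28"])
print("ERR set:", err_bit_set, "UNC set:", unc_bit_set)
```

Under those assumptions the decoded LBA is the same number the kernel prints later in its "I/O error, dev sda, sector ..." line, and it also matches the last four bytes of the sense data (03 73 c0 a3).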
Sep 28 13:50:07 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:07 suse1 kernel: ata3: EH complete
Sep 28 13:50:11 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:11 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:11 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:11 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:11 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:11 suse1 kernel: ata3: EH complete
Sep 28 13:50:15 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:15 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:15 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:15 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:20 suse1 avahi-daemon[3595]: Invalid query packet.
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Sep 28 13:50:28 suse1 kernel: Descriptor sense data with sense descriptors (in hex):
Sep 28 13:50:28 suse1 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Sep 28 13:50:28 suse1 kernel: 03 73 c0 a3
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
I'm not familiar with that error. I did not think reallocate was even tried on a read, so I have no idea what the drive is trying to tell you with that.
Sep 28 13:50:28 suse1 kernel: end_request: I/O error, dev sda, sector 57917603
This is the sector you could try to force a relocate on:

dd if=/dev/zero of=/dev/sda seek=57917603 bs=512 count=1

I'd grep your logs for "sector" and see if this is the only bad sector. If so, I'd try to fix it with the dd above. Note that the dd will definitely destroy the data in that sector, so you will want to fsck your drive afterwards. Also, if that sector is a data block, you will have a corrupted file. I don't know how to determine which file.

If you see lots of bad sectors, it is time to trash the drive.
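The "grep your logs for sector" step can be made a little more systematic by counting how many distinct sectors the kernel has reported. This is only a sketch, not something from the thread: the regular expression assumes the "I/O error, dev sda, sector N" wording shown in the log excerpts here, and the helper name `bad_sectors` is made up.

```python
import re
from collections import Counter

# Matches the kernel's "I/O error, dev sda, sector 57917603" wording
# seen in this thread; other kernel versions may phrase it differently.
SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def bad_sectors(lines):
    """Count I/O error reports per (device, sector) pair."""
    hits = Counter()
    for line in lines:
        match = SECTOR_RE.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits


# Two of the lines are taken from the log excerpts in this thread.
sample = [
    "Sep 28 13:50:28 suse1 kernel: end_request: I/O error, dev sda, sector 57917603",
    "Sep 28 13:50:28 suse1 kernel: ata3: EH complete",
    "Sep 28 15:34:35 suse1 kernel: end_request: I/O error, dev sda, sector 57917603",
]

counts = bad_sectors(sample)
for (device, sector), count in sorted(counts.items()):
    print(device, sector, count)
```

To run it against the real log, pass an open file instead of the sample list, for example `bad_sectors(open("/var/log/messages"))`. One distinct sector fits the "single soft error" case; many distinct sectors point toward replacing the drive.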
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 13:50:28 suse1 avahi-daemon[3595]: Invalid query packet.
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Greg
Greg Freemyer wrote:
You may need a new disk. It's not conclusive in my opinion. If it was my drive, I'd likely replace it just to be sure.
Thanks for your help, Greg
Media error is a real error. Bad cables, controller, or PSU won't cause a false media error report. You have at least one sector of the actual platter whose checksum does not match when it is read. Writing to that sector _may_ trigger a relocate and all is well. Or if it is a "soft" error, writing to the sector may just update the data and the checksum/CRC, thus making all well without a relocate.
If the drive was in the middle of writing that specific sector when power failed, it may have only been partially written and thus the checksum / CRC or whatever it is fails. That would be an example of a soft error.
So a single bad sector may not be a big deal at all in that case.
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
I'm not familiar with that error. I did not think reallocate was even tried on a read, so I have no idea what the drive is trying to tell you with that.
Sep 28 13:50:28 suse1 kernel: end_request: I/O error, dev sda, sector 57917603
This is the sector you could try to force a relocate on:
dd if=/dev/zero of=/dev/sda seek=57917603 bs=512 count=1
I tried that but it appears to have failed. It took a long time and top showed a high percentage io wait time while it was active:

# dd if=/dev/zero of=/dev/sda seek=57917603 bs=512 count=1
dd: writing `/dev/sda': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 25.471 s, 0.0 kB/s

and the log is still filling with the same error messages
I'd grep your logs for "sector" and see if this is the only bad sector. If so, I'd try to fix it with the dd above. Note that the dd will definitely destroy the data in that sector, so you will want to fsck your drive afterwards. Also, if that sector is a data block, you will have a corrupted file. I don't know how to determine which file.
If you see lots of bad sectors, it is time to trash the drive.
grep showed me just that one sector number. So I guess I still don't know whether the drive is bad. Cheers, Dave
On Mon, Sep 28, 2009 at 9:49 AM, Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk> wrote:
I tried that but it appears to have failed. It took a long time and top showed a high percentage io wait time while it was active:
# dd if=/dev/zero of=/dev/sda seek=57917603 bs=512 count=1
dd: writing `/dev/sda': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 25.471 s, 0.0 kB/s
and the log is still filling with the same error messages
What do the new logs say about reallocate? If reallocate on write failed, toss the drive.

Did you try the dd command twice? Maybe it took 25 seconds to do the first realloc, and now all is well.

Greg
Greg Freemyer wrote:
What do the new logs say about reallocate? If reallocate on write failed, toss the drive.
Did you try the dd command twice. Maybe it took 25 seconds to do the first realloc, and now all is well.
Greg
Well, I just tried it again with very similar results, although this time it took 27 seconds to do nothing ;)

# dd if=/dev/zero of=/dev/sda seek=57917603 bs=512 count=1
dd: writing `/dev/sda': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 27.48 s, 0.0 kB/s

The log snippet below was produced as that command terminated (there was nothing produced while it ran). I don't see anything about a write error, just a read error?

Dave

Sep 28 15:34:12 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 15:34:12 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 15:34:12 suse1 kernel: ata3.00: cmd c8/00:08:a0:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 15:34:12 suse1 kernel: res 51/40:05:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 15:34:12 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 15:34:12 suse1 kernel: ata3: EH complete
Sep 28 15:34:16 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 15:34:16 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 15:34:16 suse1 kernel: ata3.00: cmd c8/00:08:a0:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 15:34:16 suse1 kernel: res 51/40:05:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 15:34:16 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 15:34:16 suse1 kernel: ata3: EH complete
Sep 28 15:34:21 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 15:34:21 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 15:34:21 suse1 kernel: ata3.00: cmd c8/00:08:a0:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 15:34:21 suse1 kernel: res 51/40:05:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 15:34:35 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 15:34:35 suse1 kernel: ata3: EH complete
Sep 28 15:34:35 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 15:34:35 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 15:34:35 suse1 kernel: ata3.00: cmd c8/00:08:a0:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 15:34:35 suse1 kernel: res 51/40:05:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 15:34:35 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 15:34:35 suse1 kernel: ata3: EH complete
Sep 28 15:34:35 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 15:34:35 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 15:34:35 suse1 kernel: ata3.00: cmd c8/00:08:a0:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 15:34:35 suse1 kernel: res 51/40:05:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 15:34:35 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 15:34:35 suse1 kernel: ata3: EH complete
Sep 28 15:34:35 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 15:34:35 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 15:34:35 suse1 kernel: ata3.00: cmd c8/00:08:a0:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 15:34:35 suse1 kernel: res 51/40:05:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 15:34:35 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Sep 28 15:34:35 suse1 kernel: Descriptor sense data with sense descriptors (in hex):
Sep 28 15:34:35 suse1 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Sep 28 15:34:35 suse1 kernel: 03 73 c0 a3
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Sep 28 15:34:35 suse1 kernel: end_request: I/O error, dev sda, sector 57917603
Sep 28 15:34:35 suse1 kernel: ata3: EH complete
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 15:34:35 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
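One possible reading of the "just a read error?" observation, offered as a guess rather than a diagnosis anyone in the thread made: the failing command during the dd run is still c8 (a read) of 4096 bytes, now starting three sectors before the bad one. That would fit the kernel reading a whole 4 KiB block through its cache before it can modify 512 bytes of it, so the write never reaches the drive. The sketch below only does the arithmetic under that assumption. The 4096-byte block size is taken from the "data 4096 in" field, and `containing_block` is a made-up name.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 4096  # assumption, taken from "data 4096 in" in the log


def containing_block(sector, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """Return (block index, first sector of that block, offset of the sector within it)."""
    sectors_per_block = block_size // sector_size
    index = sector // sectors_per_block
    first_sector = index * sectors_per_block
    return index, first_sector, sector - first_sector


index, first_sector, offset = containing_block(57917603)
print("4 KiB block index:", index)
print("first sector of block:", first_sector, hex(first_sector))
print("bad sector is sector number", offset, "within the block")
```

The first sector of the block comes out as 0x373c0a0, which lines up with the a0:c0:73 bytes in the later `cmd` lines. If this reading is right, writing the whole aligned block (dd with bs=4096 and seek set to the block index) or bypassing the cache (GNU dd's oflag=direct) would avoid the read before the write. Neither was tried in the thread, and either would destroy all 4096 bytes of that block rather than 512.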
Dave,

I've run out of advice. Good luck.

Personally I would replace the drive. Taking 25 seconds to write one sector, repeatedly, is just plain broken.

Most filesystems no longer have a "badblock" capability because they depend on the drive to handle that automatically, and your drive, for whatever reason, is not doing so.

Greg
On Monday, 2009-09-28 at 15:41 +0100, Dave Howorth wrote:
Did you try the dd command twice. Maybe it took 25 seconds to do the first realloc, and now all is well.
Well, I just tried it again with very similar results, although this time it took 27 seconds to do nothing ;)
Run the smart tests (smartctl).
--
Cheers, Carlos E. R.
Carlos E. R. wrote:
On Monday, 2009-09-28 at 15:41 +0100, Dave Howorth wrote:
Did you try the dd command twice. Maybe it took 25 seconds to do the first realloc, and now all is well.
Well, I just tried it again with very similar results, although this time it took 27 seconds to do nothing ;)
Run the smart tests (smartctl).
This morning I got a friendly mail message from smart:

The following warning/error was logged by the smartd daemon:
Device: /dev/sda, FAILED SMART self-check. BACK UP DATA NOW!

I had tried to run smart yesterday but got no output. Today I tried again and left it running for some minutes with the following result:

# smartctl -H /dev/sda
smartctl version 5.37 [x86_64-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   058   058   062    Pre-fail  Always   FAILING_NOW 246553639

So I've got a new drive (bigger, faster :) and am now figuring out the best way to replace the failing one.

Many thanks to Greg and you for your help,
Dave
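For anyone reading that attribute table for the first time: as far as I know, smartctl treats an attribute as failed when its normalized VALUE is less than or equal to THRESH, which is what 058 against 062 shows here. The sketch below applies that rule to the single row quoted above. The whitespace-split column positions and the names `parse_attribute` and `is_failing` are assumptions made for this example, not part of smartmontools.

```python
def parse_attribute(row):
    """Split one row of the smartctl attribute table into named fields."""
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),    # normalized value, e.g. "058" -> 58
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "when_failed": fields[8],
        "raw": int(fields[9]),
    }


def is_failing(attr):
    """An attribute is failing now when value <= thresh."""
    return attr["value"] <= attr["thresh"]


# Row copied from the smartctl output in this thread.
row = "1 Raw_Read_Error_Rate 0x000b 058 058 062 Pre-fail Always FAILING_NOW 246553639"
attr = parse_attribute(row)

print(attr["name"], "value", attr["value"], "threshold", attr["thresh"])
print("failing now:", is_failing(attr))
print("pre-fail attribute:", attr["type"] == "Pre-fail")
```

A failing "Pre-fail" attribute is the case smartctl describes as predicting imminent failure, which matches the "SAVE ALL DATA" banner in the output.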
Hello!

I stumbled across this older email today while Googling for the same information.

A few days ago I updated a small server to the latest kernel (the previous one dated from about summer). Before the update I had not seen any errors like this; afterwards I did. SUSE doesn't have any way of rolling back the update, so I can't check anything other than /var/log/messages.

Could it be that the kernel or drivers changed and started to produce these errors? I mean that they have only now started to report these errors, rather than that they are the cause of them. I just don't understand why they started just now.

I'm a bit worried, as I made the classic mistake of using the same kind of disks for live data and for backup, and yes, they are even connected to the same motherboard.

Thanks!
-- HG