On Mon, Sep 28, 2009 at 8:56 AM, Dave Howorth
We had a generator test at work over the weekend that exceeded the UPS time limits so machines shut down. They've all come back but I'm seeing a lot of messages in /var/log/messages of one of them (every few minutes). I strongly suspect they're telling me to buy a new disk but I need to find and read the friendly manual to be sure. Can anybody confirm my suspicion and/or point me at the correct manual?
You may need a new disk, but it's not conclusive in my opinion. If it were my drive, I'd likely replace it just to be sure. More comments are interspersed below.
Thanks, Dave
Here's a sample of one burst of messages:
Sep 28 13:50:07 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:07 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:07 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:07 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
A media error is a real error. Bad cables, a bad controller, or a bad PSU won't cause a false media error report. You have at least one sector on the actual platter whose data does not match its checksum when read. Writing to that sector _may_ trigger a relocation, and all is well. Or, if it is a "soft" error, writing to the sector may simply update the data and its checksum/CRC, making all well without a relocation. If the drive was in the middle of writing that specific sector when power failed, the sector may have been only partially written, so its checksum/CRC (or whatever the drive uses) fails. That would be an example of a soft error, and in that case a single bad sector may not be a big deal at all. Keep reading...
Sep 28 13:50:07 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:07 suse1 kernel: ata3: EH complete
Sep 28 13:50:11 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:11 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:11 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:11 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:11 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:11 suse1 kernel: ata3: EH complete
Sep 28 13:50:15 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:15 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:15 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:15 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:20 suse1 avahi-daemon[3595]: Invalid query packet.
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 28 13:50:28 suse1 kernel: ata3.00: BMDMA stat 0x25
Sep 28 13:50:28 suse1 kernel: ata3.00: cmd c8/00:08:a3:c0:73/00:00:00:00:00/e3 tag 0 cdb 0x0 data 4096 in
Sep 28 13:50:28 suse1 kernel: res 51/40:08:a3:c0:73/00:00:00:00:00/e3 Emask 0x9 (media error)
Sep 28 13:50:28 suse1 kernel: ata3.00: configured for UDMA/100
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Sep 28 13:50:28 suse1 kernel: Descriptor sense data with sense descriptors (in hex):
Sep 28 13:50:28 suse1 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Sep 28 13:50:28 suse1 kernel: 03 73 c0 a3
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
I'm not familiar with that error. I did not think reallocate was even tried on a read, so I have no idea what the drive is trying to tell you with that.
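For what it's worth, the failing address can be cross-checked from the log itself. The last four bytes of the descriptor sense data (03 73 c0 a3) read as one big-endian hex number, and the same bytes show up in the ATA result line as a3:c0:73 plus the low nibble of the device byte e3. Here is a minimal sketch of that arithmetic, assuming a 28-bit LBA (which is what the c8 READ DMA command uses); the function name is made up for illustration.

```shell
# Hypothetical helper: join hex bytes (most significant first) into a decimal LBA.
lba_from_bytes() {
    hex=""
    for b in "$@"; do
        hex="$hex$b"
    done
    # printf accepts a 0x-prefixed constant for %d and prints it in decimal.
    printf '%d\n' "0x$hex"
}

# Bytes taken from the sense data line in the log excerpt.
lba_from_bytes 03 73 c0 a3
```

This should print 57917603, the same sector number that the end_request line in this burst reports, so every retry in the excerpt appears to be hitting one and the same sector.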
Sep 28 13:50:28 suse1 kernel: end_request: I/O error, dev sda, sector 57917603
This is the sector you could try to force a relocate on:

    dd if=/dev/zero of=/dev/sda seek=57917603 bs=512 count=1

I'd grep your logs for "sector" and see if this is the only bad sector. If so, I'd try to fix it with the dd above. Note that the dd will definitely destroy whatever data is in that sector, so you will want to fsck the drive afterwards. Also, if that sector is a data block, you will have a corrupted file; I don't know how to determine which file. If you see lots of bad sectors, it is time to trash the drive.
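The grep step might look something like the sketch below. It assumes the kernel reports each failure with an "end_request: I/O error, dev sda, sector N" line, as in the excerpt above, and that the log lives in /var/log/messages. The function name is only for illustration.

```shell
# Hypothetical helper: list each distinct failing sector and how often it was reported.
bad_sectors() {
    # $1 = log file, $2 = device name (e.g. sda)
    grep "end_request: I/O error, dev $2, sector" "$1" |
        sed 's/.*sector \([0-9][0-9]*\).*/\1/' |
        sort -n | uniq -c
}

# Usage on the real log (read-only, so safe to run):
# bad_sectors /var/log/messages sda
```

Each output line is a count followed by a sector number. A single line suggests one bad sector that the dd may be able to fix; many different sector numbers point toward replacing the drive.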
Sep 28 13:50:28 suse1 kernel: ata3: EH complete
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 13:50:28 suse1 avahi-daemon[3595]: Invalid query packet.
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 13:50:28 suse1 kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
Greg