Mailinglist Archive: opensuse-bugs (4689 mails)

[Bug 570607] I keep getting SATA errors which lead to the RAID array getting degraded or XFS dying on top.
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Tue, 16 Feb 2010 08:22:25 +0000
  • Message-id: <20100216082225.2318ACC7CD@xxxxxxxxxxxxxxxxxxxxxx>
http://bugzilla.novell.com/show_bug.cgi?id=570607

http://bugzilla.novell.com/show_bug.cgi?id=570607#c10


--- Comment #10 from Tejun Heo <teheo@xxxxxxxxxx> 2010-02-16 08:22:20 UTC ---
Aaron, thanks for testing. Can you please attach the full log? Also, in
general, please attach the full log file (as plain text) after a failure. The
large number of SCSI and block error messages alone doesn't really carry much
information about what went wrong.

I went through the log again and it's a bit strange. In log3, the disk aborted
read requests at four different sectors and the SMART counter
current_pending_sectors was reported at 7, which means that the disk detected
7 unreliable sectors and scheduled them for reallocation at the next overwrite
(without an overwrite, they can't be remapped as the original data can't be
read). So far, that seems correct.

After that, the disk was kicked out of the array, and later, when the disk was
re-added to the array, the whole disk was overwritten during resync. This
should have bumped up the reallocation counter. But according to the smartctl
output from comment#4, it seems that the disk didn't actually do the
reallocation during the overwrite. It simply cleared the current_pending
counter. This could mean that after the overwrite, the disk firmware decided
that the sectors seemed reliable enough and that a simple overwrite of the
failed region was sufficient to correct the problem. If so, it's possible that
the firmware is misjudging the nature of those failures and, by not remapping
the sectors, causing the same problem to happen repeatedly.
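
The reasoning above can be sketched as a comparison of two counter snapshots.
This is only an illustration: the function name, the dictionary keys and the
wording of the verdicts are hypothetical, and the only value taken from the
logs is the pending count of 7.

```python
def interpret_counters(before, after):
    """Compare pending/reallocated counts taken before and after an overwrite.

    `before` and `after` are dicts with the hypothetical keys "pending" and
    "reallocated", holding the raw SMART counter values.
    """
    pending_drop = before["pending"] - after["pending"]
    realloc_gain = after["reallocated"] - before["reallocated"]

    if pending_drop <= 0:
        return "pending sectors were not cleared"
    if realloc_gain >= pending_drop:
        return "pending sectors were remapped"
    if realloc_gain == 0:
        # The pattern suspected here: the firmware reused the sectors in place.
        return "pending sectors were cleared without remapping"
    return "some pending sectors were remapped, the rest reused in place"


# Pending count of 7 is from log3; the reallocated counts are placeholders.
print(interpret_counters({"pending": 7, "reallocated": 0},
                         {"pending": 0, "reallocated": 0}))
```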

Can you please record the output of "smartctl -a" right after the failure and
then again after resync is complete? Let's see whether the behavior is
consistent.
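
If it helps, the two saved outputs can be reduced to the relevant counters
with something like the sketch below. It assumes the usual ten-column
attribute table printed by "smartctl -a" for ATA disks, with the raw value in
the last column; the sample rows are made up for illustration (only the
pending count of 7 matches log3), and the function name is hypothetical.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from saved `smartctl -a` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer; skip the row
    return attrs


# Illustrative rows only, not taken from the actual disk.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       7
"""

attrs = parse_smart_attributes(SAMPLE)
for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector"):
    print(name, attrs.get(name))
```

Running it on the output saved right after the failure and again on the output
saved after resync gives the two pairs of values to compare.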

Thanks.
