Comment # 1 on bug 1177595 from
(In reply to Peter van Hoof from comment #0)
> We have a disk server with a Supermicro S3008 L8e SAS controller and 6 SAS
> drives of 12 TB each in an mdadm RAID5 software raid configuration. When
> starting a scrub of the RAID array with
> 
> echo check > /sys/block/md0/md/sync_action
> 
> after about 0.5 - 1.5 hours of running the scrub, a lot of error messages
> start appearing in the syslog. Mostly there are lots of cryptic messages
> like this:
> 
> kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12),
> sub_code(0x011a)
> 
> these are interspersed with other error messages about device resets and I/O
> errors:
> 
> kernel: sd 6:0:2:0: Power-on or device reset occurred
> 
> kernel: blk_update_request: I/O error, dev sdc, sector 5160938280 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 
> kernel: sd 6:0:2:0: [sdc] tag#1073 CDB: Read(10) 28 00 26 73 b5 65 00 00 01
> 00 
> kernel: sd 6:0:2:0: [sdc] tag#1073 FAILED Result: hostbyte=DID_SOFT_ERROR
> driverbyte=DRIVER_OK 
> 
> These errors happen on all 6 disks in the RAID array (only sdc is shown
> here, but the problems on the other disks are essentially identical).
> 
> I have also seen I/O errors in the output of smartctl -a (while the scrub
> was ongoing), but that may simply be due to the device being reset during
> the call...
> 
> Initially we thought these were hardware problems and we had the server
> thoroughly checked by the manufacturer. They swapped out all the hardware,
> but the problems would not go away. They concluded that it must be a
> software (i.e., driver) issue. I cannot be completely certain, but it looks
> like the problems started after upgrading openSUSE 15.1 -> 15.2. The kernel
> was fully patched at the time we detected the problems on 29 September. Test
> showed that the previous installed kernel version also showed the same
> problem. It is likely that all kernel versions shipped with openSUSE 15.2
> show this problem.
> 
> We currently mount the RAID5 array in read-only mode to prevent the I/O
> errors from corrupting the file system. This severely limits the
> functionality of the server.

I have heard of similar problems in cases where the hard drives were
device-managed SMR.

What are the exact models of these hard drives?
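
In case it helps, here is a minimal sketch of one way to collect the model
strings, assuming the drives show up as sd* devices and that sysfs exposes
device/model for them. The function name list_drive_models and its optional
root argument are made up for this example; running smartctl -i on each
device should report the same information in more detail.

```shell
# Print "<device>: <model>" for every sd* block device found under a sysfs root.
# list_drive_models and its optional root argument are hypothetical helpers
# for this sketch; the default root is the real /sys/block.
list_drive_models() {
    root="${1:-/sys/block}"
    for dev in "$root"/sd*; do
        # Skip entries without a readable model file (also covers an unmatched glob).
        [ -r "$dev/device/model" ] || continue
        # sysfs pads the model string with trailing spaces; strip them.
        model=$(sed 's/[[:space:]]*$//' "$dev/device/model")
        printf '%s: %s\n' "$(basename "$dev")" "$model"
    done
}
```

Calling list_drive_models with no argument reads /sys/block and prints one
line per drive.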

Thanks.

Coly Li

