https://bugzilla.suse.com/show_bug.cgi?id=1177595
https://bugzilla.suse.com/show_bug.cgi?id=1177595#c1

--- Comment #1 from Coly Li <colyli@suse.com> ---
(In reply to Peter van Hoof from comment #0)
We have a disk server with a Supermicro S3008 L8e SAS controller and 6 SAS drives of 12 TB each in an mdadm software RAID5 configuration. When starting a scrub of the RAID array with
echo check > /sys/block/md0/md/sync_action
after about 0.5 to 1.5 hours of scrubbing, a lot of error messages start appearing in the syslog. Most of them are cryptic messages like this:
kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
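For reference, the log_info word in a message like the one above can be unpacked by hand. The following is a minimal sketch, assuming the usual layout of the value (16-bit sub_code, 8-bit code, 4-bit originator, 4-bit bus type, from low to high bits) and the originator names IOP, PL and IR. The function name is made up for illustration. It only splits the number into the fields the driver already prints; it does not explain what code 0x12 means.

```python
# Sketch: split an mpt3sas log_info word into its bit fields.
# Assumed layout, low to high bits: sub_code:16, code:8, originator:4, bus_type:4.

ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}  # assumed originator names


def decode_log_info(value):
    """Return the bit fields of a 32-bit log_info value as a dict."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


fields = decode_log_info(0x3112011A)
print(fields["originator"], hex(fields["code"]), hex(fields["sub_code"]))
# prints: PL 0x12 0x11a
```

The decoded fields agree with the originator(PL), code(0x12), sub_code(0x011a) part of the kernel message, which is a reasonable check that the assumed layout fits.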
These are interspersed with other error messages about device resets and I/O errors:
kernel: sd 6:0:2:0: Power-on or device reset occurred
kernel: blk_update_request: I/O error, dev sdc, sector 5160938280 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
kernel: sd 6:0:2:0: [sdc] tag#1073 CDB: Read(10) 28 00 26 73 b5 65 00 00 01 00
kernel: sd 6:0:2:0: [sdc] tag#1073 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
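The failed command and the block-layer error above can be cross-checked against each other. The sketch below decodes the READ(10) CDB (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8) and compares the LBA with the sector number in the blk_update_request line, which the kernel reports in 512-byte units. The function name is hypothetical.

```python
# Sketch: decode a READ(10) CDB and relate its LBA to the kernel sector number.

def decode_read10(cdb_hex):
    """Return (lba, block_count) for a READ(10) CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


lba, blocks = decode_read10("28 00 26 73 b5 65 00 00 01 00")
kernel_sector = 5160938280  # from the blk_update_request line, 512-byte units

print(lba, blocks, kernel_sector // lba, kernel_sector % lba)
# prints: 645117285 1 8 0
```

The kernel sector is exactly 8 times the LBA, which would be consistent with drives that use 4096-byte logical blocks; that is an inference from these two log lines, not something the report states. Under that reading, the failed command was a read of a single block.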
These errors happen on all 6 disks in the RAID array (only sdc is shown here, but the problems on the other disks are essentially identical).
I have also seen I/O errors in the output of smartctl -a (while the scrub was ongoing), but that may simply be due to the device being reset during the call...
Initially we thought these were hardware problems and we had the server thoroughly checked by the manufacturer. They swapped out all the hardware, but the problems would not go away. They concluded that it must be a software (i.e., driver) issue. I cannot be completely certain, but it looks like the problems started after upgrading from openSUSE 15.1 to 15.2. The kernel was fully patched at the time we detected the problems on 29 September. Tests showed that the previously installed kernel version has the same problem. It is likely that all kernel versions shipped with openSUSE 15.2 show this problem.
We currently mount the RAID5 array in read-only mode to prevent the I/O errors from corrupting the file system. This severely limits the functionality of the server.
I have heard of similar issues when the hard drives were device-managed SMR. What are the exact models of these hard drives?

Thanks.

Coly Li
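One possible way to collect the drive model strings asked for here is to read them from sysfs. This is a minimal sketch, assuming the disks appear as sd* devices with a device/model attribute under /sys/block; the function name and its parameter are made up for illustration. The sysfs string comes from the SCSI inquiry data and may be shortened, so `smartctl -i` on each disk is probably the better source for the full model number.

```python
# Sketch: list the model string the kernel reports for each sd* disk.
from pathlib import Path


def disk_models(sys_block="/sys/block"):
    """Map each sd* device name to its model string, where one is exposed."""
    models = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        model_file = dev / "device" / "model"
        if model_file.is_file():
            models[dev.name] = model_file.read_text().strip()
    return models


for name, model in disk_models().items():
    print(f"{name}: {model}")
```

The model number still has to be looked up in the vendor's data sheet to find the recording technology, since device-managed SMR drives present themselves to the kernel as ordinary disks.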