(In reply to Peter van Hoof from comment #0)
> We have a disk server with a Supermicro S3008 L8e SAS controller and 6 SAS
> drives of 12 TB each in an mdadm RAID5 software raid configuration. When
> starting a scrub of the RAID array with
>
> echo check > /sys/block/md0/md/sync_action
>
> after about 0.5 - 1.5 hours of running the scrub, a lot of error messages
> start appearing in the syslog. Mostly there are lots of cryptic messages
> like this:
>
> kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12),
> sub_code(0x011a)
>
> these are interspersed with other error messages about device resets and I/O
> errors:
>
> kernel: sd 6:0:2:0: Power-on or device reset occurred
>
> kernel: blk_update_request: I/O error, dev sdc, sector 5160938280 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> kernel: sd 6:0:2:0: [sdc] tag#1073 CDB: Read(10) 28 00 26 73 b5 65 00 00 01
> 00
> kernel: sd 6:0:2:0: [sdc] tag#1073 FAILED Result: hostbyte=DID_SOFT_ERROR
> driverbyte=DRIVER_OK
>
> These errors happen on all 6 disks in the RAID array (only sdc is shown
> here, but the problems on the other disks are essentially identical).
>
> I have also seen I/O errors in the output of smartctl -a (while the scrub
> was ongoing), but that may simply be due to the device being reset during
> the call...
>
> Initially we thought these were hardware problems and we had the server
> thoroughly checked by the manufacturer. They swapped out all the hardware,
> but the problems would not go away. They concluded that it must be a
> software (i.e., driver) issue. I cannot be completely certain, but it looks
> like the problems started after upgrading openSUSE 15.1 -> 15.2. The kernel
> was fully patched at the time we detected the problems on 29 September. Test
> showed that the previous installed kernel version also showed the same
> problem. It is likely that all kernel versions shipped with openSUSE 15.2
> show this problem.
>
> We currently mount the RAID5 array in read-only mode to prevent the I/O
> errors from corrupting the file system. This severely limits the
> functionality of the server.

I have heard of similar situations when the hard drives were device-managed SMR. What are the exact models of these hard drives?

Thanks.

Coly Li
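
In case it helps with collecting the model strings for all six drives at once, here is a minimal sketch that reads them from sysfs. It assumes the disks show up as sd* devices and expose the usual `device/vendor` and `device/model` attributes; the function name `list_disk_models` is only illustrative. Device-managed SMR drives normally present themselves as ordinary block devices, so the model number still has to be checked against the vendor's data sheet.

```python
from pathlib import Path


def _read_attr(dev, name):
    """Read one sysfs attribute of a disk, or 'unknown' if it is unavailable."""
    try:
        return (dev / "device" / name).read_text().strip()
    except OSError:
        return "unknown"


def list_disk_models(sys_block="/sys/block"):
    """Return a list of (name, vendor, model) for every sd* disk under sys_block."""
    disks = []
    for dev in sorted(Path(sys_block).glob("sd*")):
        disks.append((dev.name, _read_attr(dev, "vendor"), _read_attr(dev, "model")))
    return disks


if __name__ == "__main__":
    for name, vendor, model in list_disk_models():
        print(f"{name}: {vendor} {model}")
```

The same information should also appear in the identity section of the smartctl output for each drive.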