mdraid scrubbing collides with iSCSI
I have a new storage server running LIO as an iSCSI target. Disk space is a RAID6 array, 12 x 2 TB drives under mdraid, managed under LVM.

Since we moved to this config, a couple of months back, I have had issues roughly every Monday, continuing on into the middle of the week. iSCSI volumes are unmounted, or go read-only.

It turns out to be the Sunday run of mdcheck, "MD array scrubbing". It is supposedly limited to 6 hours, which is fine, but my client systems have died long before that. Every day an mdcheck_continue runs at 01:05 to resume the scrubbing.

Has anyone else experienced this and if so, what did you do? Is there maybe a way of running the scrubbing "in the background", letting iSCSI services take priority? As it is, it simply doesn't work.

-- Per Jessen, Zürich (0.9°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
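For anyone trying to confirm the same schedule: the Sunday run and the 01:05 resume come from mdadm's packaged systemd timers. A sketch of how to inspect them, assuming the stock upstream unit names (they may vary by distribution):

    # when the mdcheck timers last fired and will next fire
    systemctl list-timers 'mdcheck*'

    # show the unit definitions, including how the 6-hour limit is passed in
    # (on openSUSE this is believed to come from MDADM_CHECK_DURATION in
    # /etc/sysconfig/mdadm -- an assumption, check your distribution)
    systemctl cat mdcheck_start.timer mdcheck_continue.service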
On 11/30/23 02:37, Per Jessen wrote:
> I have a new storage server running LIO as an iSCSI target. Disk space is a RAID6 array, 12 x 2 TB drives under mdraid, managed under LVM.
> Since we moved to this config, a couple of months back, I have had issues roughly every Monday, continuing on into the middle of the week. iSCSI volumes are unmounted, or go read-only.
> It turns out to be the Sunday run of mdcheck, "MD array scrubbing". It is supposedly limited to 6 hours, which is fine, but my client systems have died long before that. Every day an mdcheck_continue runs at 01:05 to resume the scrubbing.
> Has anyone else experienced this and if so, what did you do? Is there maybe a way of running the scrubbing "in the background", letting iSCSI services take priority? As it is, it simply doesn't work.
Not with iSCSI directly. I have 4 TB of SATA/SAS RAID1 arrays and scrubbing takes very close to 2 hours per TB. So if you have 12 x 2 TB drives (not sure how many arrays you have), presuming you are using all 2 TB per array, that's going to take 4 hours per array. Like you I start at 1:00 am, and it takes until 9:00 am to run straight through. I just try to keep demand light until it completes.

There is a good article on the Arch wiki on it: https://wiki.archlinux.org/title/RAID#RAID_Maintenance

-- David C. Rankin, J.D., P.E.
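As a side note, /proc/mdstat shows the progress, speed and estimated finish of a running check, and a check can be driven by hand per array; md0 below is a placeholder for the actual device:

    # current check/resync progress, speed and estimated finish
    cat /proc/mdstat

    # start a scrub manually
    echo check > /sys/block/md0/md/sync_action

    # abort a running scrub
    echo idle > /sys/block/md0/md/sync_action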
On Thu, 30 Nov 2023 22:50:09 -0600, David C. Rankin wrote:
> Not with iSCSI directly. I have 4 TB of SATA/SAS RAID1 arrays and scrubbing takes very close to 2 hours per TB. So if you have 12 x 2 TB drives (not sure how many arrays you have), presuming you are using all 2 TB per array, that's going to take 4 hours per array.
Hi David, it is just one array, RAID6, about 20 TB. It takes forever, maybe also because the system is under load. So the scrubbing is paused, then resumed the next night at 01:05. It took us quite some time to figure out what was happening - our iSCSI clients were timing out all over the place.
> There is a good article on the Arch wiki on it: https://wiki.archlinux.org/title/RAID#RAID_Maintenance
I think I have seen that, but what I am missing is a way to de-prioritize the scrubbing. -- Per Jessen, Zürich (-0.2°C)
On 12/4/23 08:31, Per Jessen wrote:
> I think I have seen that, but what I am missing is a way to de-prioritize the scrubbing.
Good question. Since it isn't a userland process, I don't think renice would do any good. But I do recall a specific way to specify the scrub speed (or whatever the technical term is) that you could lower, which would in effect de-prioritize the scrubbing (I always tried to go the other way). What I can't recall is which /proc/xxxx or the like you use to interact with the setting. ... Ahhh, it is /proc/sys/dev/raid/speed_limit_max

For a quick ref, see: https://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html

-- David C. Rankin, J.D., P.E.
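Worth noting: md has two knobs here, and speed_limit_min is arguably the more relevant one for de-prioritizing. The max value is a hard cap, while the min value is the floor md throttles down to when it sees competing I/O, so lowering the minimum is what lets regular traffic win. A sketch, with illustrative values in KiB/s:

    # global limits (the same settings as /proc/sys/dev/raid/*)
    sysctl -w dev.raid.speed_limit_min=1000    # floor when other I/O is active
    sysctl -w dev.raid.speed_limit_max=50000   # absolute cap

    # or per array; md0 is a placeholder for the actual device
    echo 1000  > /sys/block/md0/md/sync_speed_min
    echo 50000 > /sys/block/md0/md/sync_speed_max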
On 11/30/23 00:37, Per Jessen wrote:
> I have a new storage server running LIO as an iSCSI target. Disk space is a RAID6 array, 12 x 2 TB drives under mdraid, managed under LVM. Since we moved to this config, a couple of months back, I have had issues roughly every Monday, continuing on into the middle of the week. iSCSI volumes are unmounted, or go read-only. It turns out to be the Sunday run of mdcheck, "MD array scrubbing". It is supposedly limited to 6 hours, which is fine, but my client systems have died long before that. Every day an mdcheck_continue runs at 01:05 to resume the scrubbing. Has anyone else experienced this and if so, what did you do? Is there maybe a way of running the scrubbing "in the background", letting iSCSI services take priority? As it is, it simply doesn't work.
You probably have iSCSI NOPs enabled. They are generally "bad". They are bad because they get lumped in with regular I/O, so a "ping" might get bogged down when the I/O load is high. The ping might even be queued up and not sent for a period of time. So disable NOPs-In and NOPs-Out (i.e. in the target and the initiator).

Also, check out the iscsid.conf file for tuning you can do. You can increase some of the timeouts, which may help.

-- Lee Duncan
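Concretely, the initiator-side settings live in /etc/iscsi/iscsid.conf; setting the NOP interval and timeout to 0 disables NOP-Out pings. On the LIO side, NOP-In behaviour is a per-TPG attribute in targetcli (attribute names as documented for LIO; whether 0 fully disables them is worth verifying on your kernel version). Values below are illustrative, not recommendations:

    # /etc/iscsi/iscsid.conf (initiator); 0 for both disables NOP-Out pings
    node.conn[0].timeo.noop_out_interval = 0
    node.conn[0].timeo.noop_out_timeout = 0
    # allow more time to ride out a slow target before failing the session
    node.session.timeo.replacement_timeout = 300

    # LIO target, inside targetcli, per TPG (the IQN is a placeholder):
    # cd /iscsi/iqn.2023-01.example:target/tpg1
    # set attribute nopin_timeout=0 nopin_response_timeout=0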