RAID Resync on every reboot
One of my servers has been starting a RAID resync on every reboot for about two weeks. The server showed this behavior occasionally in the past, but now it happens on every reboot. I suspect the RAID is not stopped cleanly on shutdown, but I can't find any matching messages in the log. Configuration issues or hardware failures (timeouts etc., but no I/O errors or S.M.A.R.T. errors) are also possible. The server has been upgraded through all openSUSE Leap versions since 2016.

Maybe I should re-create /etc/mdadm.conf with "mdadm --detail --scan > /etc/mdadm.conf"?

Any ideas?

Some information about the machine:

* openSUSE Leap 15.4
* all updates installed
* RAID 5 (fake RAID) configured in the BIOS of a Gigabyte H97-HD3 mainboard

The shutdown messages contain one message about the mdmon@md127.service service. There is no message about md126.

Jun 11 19:38:58 lisa systemd[1]: wickedd.service: Failed with result 'signal'.
Jun 11 19:38:58 mybox systemd[1]: Stopped wicked network management service daemon.
Jun 11 19:38:58 mybox systemd[1]: mdmon@md127.service: Main process exited, code=killed, status=9/KILL
Jun 11 19:38:58 mybox systemd[1]: mdmon@md127.service: Failed with result 'signal'.
Jun 11 19:38:58 mybox systemd[1]: dbus.service: Main process exited, code=killed, status=9/KILL
Jun 11 19:38:58 mybox systemd[1]: dbus.service: Failed with result 'signal'.
Jun 11 19:38:58 mybox systemd[1]: Stopped D-Bus System Message Bus.
Jun 11 19:38:58 mybox systemd[1]: auditd.service: Main process exited, code=killed, status=9/KILL
Jun 11 19:38:58 mybox systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL
Jun 11 19:38:58 mybox systemd[1]: systemd-journald.service: Failed with result 'signal'.
Jun 11 19:38:59 mybox systemd-cryptsetup[27512]: Device cr_home is not active.
Jun 11 19:38:59 mybox systemd-cryptsetup[27512]: Volume cr_home already inactive.
Jun 11 19:38:59 mybox systemd-udevd[27517]: Network interface NamePolicy= disabled by default.
Jun 11 19:39:01 mybox systemd-journald[27516]: Journal stopped

On the next boot (reboot) the kernel starts RAID reconstruction for md126:

Jun 11 19:41:13 mybox kernel: raid6: avx2x4 gen() 34033 MB/s
Jun 11 19:41:13 mybox kernel: raid6: avx2x2 gen() 33665 MB/s
Jun 11 19:41:13 mybox kernel: usb 4-1: New USB device found, idVendor=8087, idProduct=8001, bcdDevice= 0.>
Jun 11 19:41:13 mybox kernel: usb 4-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Jun 11 19:41:13 mybox kernel: hub 4-1:1.0: USB hub found
Jun 11 19:41:13 mybox kernel: hub 4-1:1.0: 8 ports detected
Jun 11 19:41:13 mybox kernel: usb 2-8: new high-speed USB device number 3 using xhci_hcd
Jun 11 19:41:13 mybox kernel: raid6: avx2x1 gen() 30982 MB/s
Jun 11 19:41:13 mybox kernel: raid6: using algorithm avx2x4 gen() 34033 MB/s
Jun 11 19:41:13 mybox kernel: raid6: .... xor() 13168 MB/s, rmw enabled
Jun 11 19:41:13 mybox kernel: raid6: using avx2x2 recovery algorithm
Jun 11 19:41:13 mybox kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci
Jun 11 19:41:13 mybox kernel: async_tx: api initialized (async)
Jun 11 19:41:13 mybox kernel: xor: automatically using best checksumming function avx
Jun 11 19:41:13 mybox kernel: md/raid:md126: not clean -- starting background reconstruction
Jun 11 19:41:13 mybox kernel: md/raid:md126: device sda operational as raid disk 0
Jun 11 19:41:13 mybox kernel: md/raid:md126: device sdb operational as raid disk 1
Jun 11 19:41:13 mybox kernel: md/raid:md126: device sdc operational as raid disk 2
Jun 11 19:41:13 mybox kernel: i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
Jun 11 19:41:13 mybox kernel: md/raid:md126: raid level 5 active with 3 out of 3 devices, algorithm 0
Jun 11 19:41:13 mybox kernel: md126: detected capacity change from 0 to 11721056256
Jun 11 19:41:13 mybox kernel: i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
Jun 11 19:41:13 mybox
systemd[1]: Started MD Metadata Monitor on initrd/md127.
Jun 11 19:41:13 mybox kernel: md: resync of RAID array md126
Jun 11 19:41:13 mybox kernel: md126: p1 p2 p3 p4 p5 p6

mybox:~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md126 : active raid5 sda[2] sdb[1] sdc[0]
      5860528128 blocks super external:/md127/0 level 5, 128k chunk, algorithm 0 [3/3] [UUU]
      [================>....]  resync = 84.6% (2480653388/2930264064) finish=63.6min speed=117661K/sec

md127 : inactive sdc[2](S) sdb[1](S) sda[0](S)
      7560 blocks super external:imsm

unused devices: <none>

mybox:~ # cat /etc/mdadm.conf
DEVICE containers partitions
ARRAY metadata=imsm UUID=aa8b8c23:5079309c:9d892159:6fc14e0b
ARRAY /dev/md/Volume1_0 container=aa8b8c23:5079309c:9d892159:6fc14e0b member=0 UUID=2ad07e84:caabad99:89364b2b:c078ad58

mybox:~ # mdadm --detail --scan
ARRAY /dev/md/imsm0 metadata=imsm UUID=aa8b8c23:5079309c:9d892159:6fc14e0b
ARRAY /dev/md/Volume1 container=/dev/md/imsm0 member=0 UUID=07e8d0c0:507479ce:4328be27:9d7f16a7

Björn
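The mdmon@md127 SIGKILL in the shutdown log above is the suspicious part: with IMSM (external) metadata, mdmon is the process that marks the array clean, so killing it with SIGKILL during shutdown would leave the metadata dirty and trigger a resync on the next boot. A minimal sketch that scans a journal excerpt for that pattern; the log text is embedded verbatim from the lines above rather than read from the live journal, and the journalctl command in the comment is only a suggestion:

```shell
#!/bin/sh
# Scan a journal excerpt for mdmon being killed with SIGKILL at shutdown.
# Sample embedded from the shutdown log above; on a live system you might
# inspect the previous boot instead:
#   journalctl -b -1 -u 'mdmon@*.service'
log="Jun 11 19:38:58 mybox systemd[1]: mdmon@md127.service: Main process exited, code=killed, status=9/KILL
Jun 11 19:38:58 mybox systemd[1]: mdmon@md127.service: Failed with result 'signal'."

# Count lines where an mdmon unit died from SIGKILL (signal 9).
killed=$(printf '%s\n' "$log" | grep -c 'mdmon@.*status=9/KILL')
if [ "$killed" -gt 0 ]; then
    echo "mdmon was SIGKILLed at shutdown ($killed matching line(s))"
fi
```

This matches the changelog entries quoted later in the thread ("mdmon: Remove need for KillMode=none", "mdmon: Improve switchroot interactions"), which are exactly about keeping mdmon alive until the array is marked clean.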
On 12.06.23 09:59, Bjoern Voigt wrote:
One of my servers has been starting a RAID resync on every reboot for about two weeks. The server showed this behavior occasionally in the past, but now it happens on every reboot.

I suspect the RAID is not stopped cleanly on shutdown, but I can't find any matching messages in the log. Configuration issues or hardware failures (timeouts etc., but no I/O errors or S.M.A.R.T. errors) are also possible. The server has been upgraded through all openSUSE Leap versions since 2016.
Maybe I should re-create /etc/mdadm.conf with "mdadm --detail --scan > /etc/mdadm.conf"?
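Before re-creating the file, it is worth noting that the outputs quoted earlier already disagree: the member-array UUID recorded in /etc/mdadm.conf (2ad07e84:…) no longer matches what `mdadm --detail --scan` reports (07e8d0c0:…). A small sketch that extracts and compares the two UUIDs; it operates on the pasted output embedded as strings, not on the live system:

```shell
#!/bin/sh
# Compare the member-array UUID recorded in /etc/mdadm.conf with the one
# mdadm --detail --scan reports. Both lines are embedded verbatim from the
# outputs quoted above; on a live system you would read the real file and
# run the real scan.
conf='ARRAY /dev/md/Volume1_0 container=aa8b8c23:5079309c:9d892159:6fc14e0b member=0 UUID=2ad07e84:caabad99:89364b2b:c078ad58'
scan='ARRAY /dev/md/Volume1 container=/dev/md/imsm0 member=0 UUID=07e8d0c0:507479ce:4328be27:9d7f16a7'

uuid_conf=$(printf '%s\n' "$conf" | sed -n 's/.*UUID=\([0-9a-f:]*\).*/\1/p')
uuid_scan=$(printf '%s\n' "$scan" | sed -n 's/.*UUID=\([0-9a-f:]*\).*/\1/p')

if [ "$uuid_conf" = "$uuid_scan" ]; then
    echo "mdadm.conf matches the scan output"
else
    echo "UUID mismatch: conf=$uuid_conf scan=$uuid_scan"
fi
```

A mismatch here would not by itself prove the config is the culprit (the resync turned out to have a different cause later in the thread), but it is a cheap check before overwriting the file.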
The changelog of mdadm contains these latest changes. The bug reports bsc#1205493 and bsc#1205830 are unfortunately not accessible to everyone. Does anyone know which "various problems with IMSM raid arrays" are fixed? Probably these fixes fail on my server.

* Mo Apr 24 2023 Coly Li <colyli@suse.de>
- Fixes for mdmon to ensure it runs at the right time in the right mount
  namespace. This fixes various problems with IMSM raid arrays in 15-SP4
  (bsc#1205493, bsc#1205830)
  - mdmon: fix segfault
    0052-mdmon-fix-segfault.patch
  - util: remove obsolete code from get_md_name
    0053-util-remove-obsolete-code-from-get_md_name.patch
  - mdmon: don't test both 'all' and 'container_name'.
    0054-mdmon-don-t-test-both-all-and-container_name.patch
  - mdmon: change systemd unit file to use --foreground
    0055-mdmon-change-systemd-unit-file-to-use-foreground.patch
  - mdmon: Remove need for KillMode=none
    0056-mdmon-Remove-need-for-KillMode-none.patch
  - mdmon: Improve switchroot interactions.
    0057-mdmon-Improve-switchroot-interactions.patch
  - mdopen: always try create_named_array()
    0058-mdopen-always-try-create_named_array.patch
  - Improvements for IMSM_NO_PLATFORM testing
    0059-Improvements-for-IMSM_NO_PLATFORM-testing.patch

I will try to reboot with an older version of mdadm soon.

Björn
On 13.06.23 16:19, Bjoern Voigt wrote:
The changelog of mdadm contains these latest changes.
The bug reports bsc#1205493 and bsc#1205830 are unfortunately not accessible to everyone. Does anyone know which "various problems with IMSM raid arrays" are fixed? Probably these fixes fail on my server.
* Mo Apr 24 2023 Coly Li <colyli@suse.de>
- Fixes for mdmon to ensure it runs at the right time in the right mount
  namespace. This fixes various problems with IMSM raid arrays in 15-SP4
  (bsc#1205493, bsc#1205830)
  - mdmon: fix segfault
    0052-mdmon-fix-segfault.patch
  - util: remove obsolete code from get_md_name
    0053-util-remove-obsolete-code-from-get_md_name.patch
  - mdmon: don't test both 'all' and 'container_name'.
    0054-mdmon-don-t-test-both-all-and-container_name.patch
  - mdmon: change systemd unit file to use --foreground
    0055-mdmon-change-systemd-unit-file-to-use-foreground.patch
  - mdmon: Remove need for KillMode=none
    0056-mdmon-Remove-need-for-KillMode-none.patch
  - mdmon: Improve switchroot interactions.
    0057-mdmon-Improve-switchroot-interactions.patch
  - mdopen: always try create_named_array()
    0058-mdopen-always-try-create_named_array.patch
  - Improvements for IMSM_NO_PLATFORM testing
    0059-Improvements-for-IMSM_NO_PLATFORM-testing.patch
I will try to reboot with an older version of mdadm soon.
I downgraded the mdadm package to the previous version mdadm-4.1-150300.24.24.2.x86_64. After running mkinitrd and rebooting twice, the issue is solved.

Maybe I will write a bug report.

Best regards,
Björn
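For anyone wanting to reproduce the workaround, the downgrade can be sketched as below. The zypper and mkinitrd commands are shown only as comments (they need the real system and repositories); the runnable part is just a sanity check with `sort -V` that the downgrade target really is the older of two rpm version strings. Note that the version of the updated (problematic) package is not named in the thread; "4.2-150400.1.1" below is a made-up placeholder:

```shell
#!/bin/sh
# Sketch of the downgrade, assuming the old package is still available
# in the repositories. The actual steps would be roughly:
#   zypper install --oldpackage mdadm-4.1-150300.24.24.2
#   zypper addlock mdadm     # keep the broken update from coming back
#   mkinitrd                 # rebuild the initrd so it contains the old mdmon
# followed by a reboot (the thread reports two reboots were needed).

old="4.1-150300.24.24.2"
new="4.2-150400.1.1"   # placeholder: the updated version is not named in the thread

# sort -V orders rpm-style version strings; the first line is the older one.
first=$(printf '%s\n%s\n' "$old" "$new" | sort -V | head -n1)
echo "older version: $first"
```

Locking the package matters here: without the lock, the next `zypper up` would reinstall the problematic version and the resync-on-reboot behavior would return.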
On 14.06.23 23:46, Bjoern Voigt wrote:
I downgraded the mdadm package to the previous version mdadm-4.1-150300.24.24.2.x86_64. After running mkinitrd and rebooting twice, the issue is solved.
Maybe I will write a bug report.
See https://bugzilla.opensuse.org/show_bug.cgi?id=1212462

Björn
On Mon, 12 Jun 2023 09:59:08 +0200, Bjoern Voigt <bjoernv@arcor.de> wrote:
One of my servers has been starting a RAID resync on every reboot for about two weeks. The server showed this behavior occasionally in the past, but now it happens on every reboot.
[...]
Maybe I should re-create /etc/mdadm.conf with "mdadm --detail --scan > /etc/mdadm.conf"?
Some information about the machine
* openSUSE Leap 15.4
* all updates installed
* RAID 5 (fake RAID) configured in the BIOS of a Gigabyte H97-HD3 mainboard
Does your BIOS RAID use mdadm? How is that possible? Doesn't a BIOS RAID operate no matter what operating system is booted, including possibly MS Windows? It seems that it must be independent of mdadm. If true, then maybe you have a conflict between your BIOS RAID and mdadm. But I don't know how BIOS RAIDs operate, so whatever.

--
Robert Webb
On 14.06.23 00:54, Robert Webb wrote:
Does your BIOS RAID use mdadm? How is that possible? Doesn't a BIOS RAID operate no matter what operating system is booted, including possibly MS Windows? It seems that it must be independent of mdadm. If true, then maybe you have a conflict between your BIOS RAID and mdadm. But I don't know how BIOS RAIDs operate, so whatever.
There are different types of RAID in Linux: basically software RAID, BIOS RAID (alias fake RAID or host RAID), and hardware RAID. This Q/A page explains the details:

https://superuser.com/questions/461506/intel-matrix-storage-manager-vs-linux...

Basically, fake RAIDs are set up in the BIOS. The BIOS only supports some RAID modes (RAID 5 in my case) and is capable of booting directly from the RAID, but the Linux kernel does the actual RAID calculations; mdadm is used e.g. to monitor the BIOS RAID.

The mdadm changelog shows patches related to IMSM raid arrays. I think "IMSM raid array" is another alias for BIOS RAID. One of the patches causes problems on my server. See my other post.

* Mo Apr 24 2023 Coly Li <colyli@suse.de>
- Fixes for mdmon to ensure it runs at the right time in the right mount
  namespace. This fixes various problems with IMSM raid arrays in 15-SP4
  (bsc#1205493, bsc#1205830)
  - mdmon: fix segfault
    0052-mdmon-fix-segfault.patch
  - util: remove obsolete code from get_md_name
    0053-util-remove-obsolete-code-from-get_md_name.patch
  - mdmon: don't test both 'all' and 'container_name'.
    0054-mdmon-don-t-test-both-all-and-container_name.patch
  - mdmon: change systemd unit file to use --foreground
    0055-mdmon-change-systemd-unit-file-to-use-foreground.patch
  - mdmon: Remove need for KillMode=none
    0056-mdmon-Remove-need-for-KillMode-none.patch
  - mdmon: Improve switchroot interactions.
    0057-mdmon-Improve-switchroot-interactions.patch
  - mdopen: always try create_named_array()
    0058-mdopen-always-try-create_named_array.patch
  - Improvements for IMSM_NO_PLATFORM testing
    0059-Improvements-for-IMSM_NO_PLATFORM-testing.patch

Björn
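An IMSM fake-RAID setup is easy to recognize in /proc/mdstat: the data array carries "super external:/mdXXX/N" metadata, and a separate inactive container array carries "super external:imsm". A small sketch that finds the IMSM container in such output; the /proc/mdstat text is embedded as a sample from earlier in the thread rather than read from a live system:

```shell
#!/bin/sh
# Detect an IMSM (BIOS / fake RAID) container in /proc/mdstat output.
# The sample below is the output quoted earlier in the thread; on a live
# system you would read /proc/mdstat itself.
mdstat='Personalities : [raid6] [raid5] [raid4]
md126 : active raid5 sda[2] sdb[1] sdc[0]
      5860528128 blocks super external:/md127/0 level 5, 128k chunk, algorithm 0 [3/3] [UUU]
md127 : inactive sdc[2](S) sdb[1](S) sda[0](S)
      7560 blocks super external:imsm
unused devices: <none>'

# Remember the last "mdXXX :" device line seen; when a metadata line says
# "super external:imsm", that device is the IMSM container.
container=$(printf '%s\n' "$mdstat" | awk '$1 ~ /^md/ {dev=$1} /super external:imsm/ {print dev}')
if [ -n "$container" ]; then
    echo "IMSM container: $container"
else
    echo "no IMSM container found"
fi
```

This is why two md devices show up for one BIOS RAID volume: md126 is the usable array, while md127 only holds the vendor (IMSM) metadata that mdmon maintains.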
participants (2)
- Bjoern Voigt
- Robert Webb