Hi.
I use a retired office PC with a Core i5 as a NAS. I have dropped in 2 disks (3.5T SATA), configured as md RAID1 with LVM on top. It has worked flawlessly for a year or so, but now, after the latest 2 reboots, it has triggered a RAID rebuild.
The only thing I find in dmesg (but I am not sure what to search for):
klaus@raagi:~> sudo dmesg | grep " md"
[    4.131071] md/raid1:md127: active with 2 out of 2 mirrors
[    4.282183] md127: detected capacity change from 0 to 4000261472256
[   64.407382] md: data-check of RAID array md127
[21681.409218] md: md127: data-check interrupted.
[25477.760407] md: data-check of RAID array md127
[47096.271827] md: md127: data-check interrupted.
I left the PC running after reboot, so I am not sure why the data-check was interrupted.
Any ideas on why it has started to rebuild after every reboot?
--
Cheers
Klaus
On Sun, 14 Nov 2021 10:56:09 +0100 Klaus Vink Slott <list-s@vink-slott.dk> wrote:
Hi.
I use a retired office PC with a Core i5 as a NAS. I have dropped in 2 disks (3.5T SATA), configured as md RAID1 with LVM on top. It has worked flawlessly for a year or so, but now, after the latest 2 reboots, it has triggered a RAID rebuild.
The only thing I find in dmesg (but I am not sure what to search for):
klaus@raagi:~> sudo dmesg | grep " md"
[    4.131071] md/raid1:md127: active with 2 out of 2 mirrors
[    4.282183] md127: detected capacity change from 0 to 4000261472256
[   64.407382] md: data-check of RAID array md127
[21681.409218] md: md127: data-check interrupted.
[25477.760407] md: data-check of RAID array md127
[47096.271827] md: md127: data-check interrupted.
I left the PC running after reboot, so I am not sure why the data-check was interrupted.
Any ideas on why it has started to rebuild after every reboot?
One obvious question. Have you checked that the disks are showing good status with no errors? Yes, I'd expect to get some warnings anyway, but a specific test might be wise.
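For reference, a quick way to get that status per drive; a minimal sketch, assuming the mirror members are /dev/sda and /dev/sdb (adjust the device names to your system):

sudo smartctl -H /dev/sda   # overall health verdict only
sudo smartctl -H /dev/sdb
sudo smartctl -a /dev/sda   # full attribute dump; reallocated/pending sector counts are the interesting part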
On Sunday, 14 November 2021 13:01:03 CET, Dave Howorth wrote:
On Sun, 14 Nov 2021 10:56:09 +0100
Klaus Vink Slott <list-s@vink-slott.dk> wrote:
Hi.
I use a retired office PC with a Core i5 as a NAS. I have dropped in 2 disks (3.5T SATA), configured as md RAID1 with LVM on top. It has worked flawlessly for a year or so, but now, after the latest 2 reboots, it has triggered a RAID rebuild.

Re-reading my own mail I noticed that md says "data-check" and not "rebuilding". I was not aware there was such a thing. Searching the interwebs I found that there is a difference: it is not rebuilding, although /proc/mdstat looks more or less the same as during a rebuild.
klaus@raagi:~> sudo dmesg | grep " md"
...
[   64.407382] md: data-check of RAID array md127
[21681.409218] md: md127: data-check interrupted.
...
Any ideas on why it has started to rebuild after every reboot?
One obvious question. Have you checked that the disks are showing good status with no errors? Yes, I'd expect to get some warnings anyway, but a specific test might be wise.

Yes, SMART declares both drives healthy:
SMART overall-health self-assessment test result: PASSED
So now I just need to find out why the integrity check is interrupted.
--
Thanks
Klaus
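For readers following along, the md sysfs interface distinguishes a data-check from a rebuild even though /proc/mdstat looks much the same for both; a minimal sketch, assuming the array is md127 as in this thread:

cat /proc/mdstat
# "check" = periodic data-check, "resync"/"recover" = an actual rebuild, "idle" = nothing running
cat /sys/block/md127/md/sync_action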
On 15/11/2021 08.12, Klaus Vink Slott wrote:
On Sunday, 14 November 2021 13:01:03 CET, Dave Howorth wrote:
On Sun, 14 Nov 2021 10:56:09 +0100 Klaus Vink Slott <list-s@vink-slott.dk> wrote:
One obvious question. Have you checked that the disks are showing good status with no errors? Yes, I'd expect to get some warnings anyway, but a specific test might be wise.

Yes, SMART declares both drives healthy:
SMART overall-health self-assessment test result: PASSED
So now I just need to find out why the integrity check is interrupted.
You should tell smartctl to do a long test on the hard disks. This includes a surface test. A disk can be reported as PASSED even if it has media defects, but such defects would affect the mirror.
--
Cheers / Saludos,
Carlos E. R.
(from oS Leap 15.2 x86_64 (Minas Tirith))
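A minimal sketch of what that looks like, assuming /dev/sda and /dev/sdb as the member disks (the test runs in the background on the drive itself and can take several hours on disks this size):

sudo smartctl -t long /dev/sda
sudo smartctl -t long /dev/sdb
# later, read the self-test log to see whether the surface scan found anything
sudo smartctl -l selftest /dev/sda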
On 11/15/21 8:12 AM, Klaus Vink Slott wrote:

Re-reading my own mail I noticed that md says "data-check" and not "rebuilding". I was not aware there was such a thing. Searching the interwebs I found that there is a difference: it is not rebuilding, although /proc/mdstat looks more or less the same as during a rebuild.

Exactly. This is normal behaviour by the RAID system, for eons now. The drives are checked for consistency once a month. If there is an error, a drive is marked faulty and is bumped out of the active set. It would remain faulty without manual intervention by the sysop. So you cannot see the array "rebuilding" unless you (or someone with access) triggers such an action -- or a drive is bumped and a hot spare is part of the process. And most home users don't have hot spares.

Cheers,
- Adam
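If a drive ever does get bumped out as described, it shows up clearly; a minimal sketch, again assuming md127:

cat /proc/mdstat
# a healthy two-disk mirror ends its device line with "[UU]";
# "[U_]" would mean one member has been dropped
sudo mdadm -D /dev/md127
# the "State :" line would then read something like "clean, degraded"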
On 11/14/21 3:56 AM, Klaus Vink Slott wrote:
Hi.
I use a retired office PC with a Core i5 as a NAS. I have dropped in 2 disks (3.5T SATA), configured as md RAID1 with LVM on top. It has worked flawlessly for a year or so, but now, after the latest 2 reboots, it has triggered a RAID rebuild.
The only thing I find in dmesg (but I am not sure what to search for):
klaus@raagi:~> sudo dmesg | grep " md"
[    4.131071] md/raid1:md127: active with 2 out of 2 mirrors
[    4.282183] md127: detected capacity change from 0 to 4000261472256
[   64.407382] md: data-check of RAID array md127
[21681.409218] md: md127: data-check interrupted.
[25477.760407] md: data-check of RAID array md127
[47096.271827] md: md127: data-check interrupted.
I left the PC running after reboot, so I am not sure why the data-check was interrupted.
Any ideas on why it has started to rebuild after every reboot?
Also, add details about what OS, release, or at least the kernel, mdadm and LVM versions. (LVM shouldn't matter.) You can figure right at about 2 hours per TB to rebuild (or sync) a RAID1 array.

First thing I would check is 'smartctl -a /dev/drive' to get the health of each individual disk. You can do that from a boot disk, or when the system is running. Where drive would be, e.g. sda or sdc, etc. (whatever the disk is).

You can find out which physical disks it thinks are part of the array with 'mdadm -D /dev/md127' (as root). It will show, e.g.

# mdadm -D /dev/md4
/dev/md4:
           Version : 1.2
     Creation Time : Mon Mar 21 02:27:21 2016
        Raid Level : raid1
        Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
     Used Dev Size : 2930135488 (2794.39 GiB 3000.46 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent
     Intent Bitmap : Internal
       Update Time : Sat Nov 13 19:53:10 2021
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0
Consistency Policy : bitmap
              Name : valkyrie:4 (local to host valkyrie)
              UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
            Events : 12697

    Number   Major   Minor   RaidDevice State
       3       8       32        0      active sync   /dev/sdc
       2       8       48        1      active sync   /dev/sdd

(look at the last two lines here ------------------------ ^^^^^^^^)

I hate to venture a guess, but it looks like one of your disks either isn't initialized when the array is started and then appears later, prompting an add of that disk to the array and a rebuild (check dmesg for each drive's initialization), or the disk is just losing its mind each time it is shut down.

This may need to be asked on the mdadm list, e.g. linux-raid@vger.kernel.org

BACKUP your DATA from the GOOD disk NOW! (fix the problem after that)
--
David C. Rankin, J.D., P.E.
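For the dmesg check suggested above, something along these lines shows when each member disk and the array came up at boot; a rough sketch with example device names:

sudo dmesg | grep -E 'SATA link|\[sd[a-z]\]|md127'
# compare the timestamps: a member that only appears well after md127 is
# assembled would explain it being re-added and resynced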
Hi David, and thanks for your detailed guidance. As I just wrote in my previous letter to the list, I think I misinterpreted what was happening. The RAID is not rebuilding but trying to run an integrity check. I have never noticed this before, but on the other hand I have never had disks this size in an old, low-performance PC.

On 15 November 2021 05:04:40 CET, David C. Rankin wrote:
> On 11/14/21 3:56 AM, Klaus Vink Slott wrote:
> > Hi.
> >
> > I use a retired office PC with a Core i5 as a NAS. I have dropped in 2
> > disks (3.5T SATA), configured as md RAID1 with LVM on top. It has worked
> > flawlessly for a year or so, but now, after the latest 2 reboots, it has
> > triggered a RAID rebuild.
> >
> > The only thing I find in dmesg (but I am not sure what to search for):
> >
> > klaus@raagi:~> sudo dmesg | grep " md"
> > [    4.131071] md/raid1:md127: active with 2 out of 2 mirrors
> > [    4.282183] md127: detected capacity change from 0 to 4000261472256
> > [   64.407382] md: data-check of RAID array md127
                            ^^^^^^^^^^ data-check, not rebuilding
> > [21681.409218] md: md127: data-check interrupted.
> > [25477.760407] md: data-check of RAID array md127
> > [47096.271827] md: md127: data-check interrupted.
> >
> > I left the PC running after reboot, so I am not sure why the data-check
> > was interrupted.
> >
> > Any ideas on why it has started to rebuild after every reboot?
>
> Also, add details about what OS, Release or at least the kernel, mdadm and
> LVM versions. (LVM shouldn't matter)

OpenSUSE 15.3 LEAP
Linux raagi 5.3.18-59.27-default #1 SMP Tue Oct 5 10:00:40 UTC 2021 (7df2404) x86_64 x86_64 x86_64 GNU/Linux

> First thing I would check is 'smartctl -a /dev/drive'

SMART overall-health self-assessment test result: PASSED - on both drives

> 'mdadm -D /dev/md127' (as root).
...
    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8        2        1      active sync   /dev/sda2

> (look at last two lines here

Seems OK.

> BACKUP your DATA from the GOOD disk NOW! (fix the problem after that)

No worries. Data is mirrored to another location, and borgbackup keeps data safe on a third location.

So I ventured down into /etc/sysconfig/mdadm and found MDADM_CHECK_DURATION:

# Amount of time to spend checking md arrays each morning.
# A check will start on the first Sunday of the month and run
# for this long. If it does not complete, then it will be
# continued each subsequent morning until all arrays have
# been checked. Any string understood by "date --date=" can
# be used. An empty string disables automatic checks.

So taking into account that I have quite big disks on a relatively low-performance system, it is more than likely that this is what I saw, and also the reason why the check gets interrupted.
--
Regards
Klaus
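For completeness, a minimal sketch of how one might tame that monthly check on a slow box; the duration value is purely illustrative (not the distribution default), and md127 is the array name from this thread:

# /etc/sysconfig/mdadm -- limit how long each morning's check slice runs
MDADM_CHECK_DURATION="3 hours"

# or stop a check that is currently running; the scheduled job resumes it
# on a later morning
echo idle | sudo tee /sys/block/md127/md/sync_action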
participants (5):
- Adam Majer
- Carlos E. R.
- Dave Howorth
- David C. Rankin
- Klaus Vink Slott