[opensuse] Problem with a soft RAID on a 42.3 system - conflicting information in YaST and cat /proc/mdstat
Some time ago I created a RAID from two discs. It is a Linux software RAID1. For quite some time it worked well, but now something is really wrong. The user is sometimes presented a "wrong" /home with an older version of the data. This comes and goes without any understandable rule, and the information on the RAID does not appear to be consistent.

YaST reports (labels translated from Italian):

Device:
• Device: /dev/md/homeraid
• Size: 931.51 GiB
• Encrypted: No
• Device ID 1: md-uuid-d9640ee4:3a9d7b72:68fa6b80:1b61dc7d
• Device ID 2: md-name-any:homeraid
• Used by 1:

RAID:
• RAID type: RAID1
• Chunk size:
• Parity algorithms:

File system:
• File system: Ext4
• Mount point: /home
• Label:

But it reports the second disc as unused (it shouldn't be reported like this; it should be active in the RAID1, homeraid):

Unused: /dev/sdd1 Linux RAID

To make this more puzzling for me, I have:

# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sdc1[0]
      976760640 blocks super 1.0 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>

The whole problem arose when the PC had a faulty PSU that, as it seems, did not start all devices simultaneously, or at all. So suddenly the user found her/himself with two versions of /home for the same user: one current, and one old and somewhat odd-looking (no background picture). This has been repaired with a new PSU in the meantime. No SMART errors; the hardware seems all good.

Now when restarting the machine, in 90% of the cases the right disc is used. When checking in the YaST partition manager, you find that when all is OK, /dev/sdc1 is used as active. When things go south on you, for unknown reasons, the active disc is sdd1 instead. Both should be the same... but aren't. sdd1 should not be listed as unused in YaST when sdc1 is used, and sdc1 and sdd1 should not flip within the RAID; they should contain the very same information. What is wrong here? I do not understand.

PS: in the BIOS both discs are AHCI on the same controller. The RAID function of the BIOS is obviously not active, to allow the OS to do softraid.
stakanov wrote:
# cat /proc/mdstat Personalities : [raid1] md127 : active raid1 sdc1[0] 976760640 blocks super 1.0 [2/1] [U_] bitmap: 8/8 pages [32KB], 65536KB chunk
unused devices: <none>
This is clearly a degraded raid1 array. Given the number 127, it was auto-detected based on the superblock.
The whole problem arose when the PC had a faulty PSU that, as it seems, did not start all devices simultaneously, or at all. So suddenly the user found her/himself with two versions of /home for the same user: one current, and one old and somewhat odd-looking (no background picture). This has been repaired with a new PSU in the meantime. No SMART errors; the hardware seems all good. Now when restarting the machine, in 90% of the cases the right disc is used. When checking in the YaST partition manager, you find that when all is OK, /dev/sdc1 is used as active.
Uh, why is sdc1 being used - shouldn't it be /dev/md127 ?
When things go south on you, for unknown reasons, the active disc is sdd1 instead.
So the disc enumeration changes?
Both should be the same... but aren't. sdd1 should not be listed as unused in YaST when sdc1 is used, and sdc1 and sdd1 should not flip within the RAID; they should contain the very same information. What is wrong here? I do not understand.
Could you tell us which exact configuration you expect to see?

--
Per Jessen, Zürich (2.2°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On Wed, Jan 9, 2019 at 11:33 AM stakanov wrote:
# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sdc1[0]
      976760640 blocks super 1.0 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk
unused devices: <none>
...
When things go south on you, for unknown reasons, the active disc is sdd1 instead.
Both should be the same... but aren't. sdd1 should not be listed as unused in YaST when sdc1 is used, and sdc1 and sdd1 should not flip within the RAID; they should contain the very same information. What is wrong here? I do not understand.
It is obvious that your RAID1 now consists of a single disk. Even if you repeat "should be the same" a million times, it is not going to change anything. You need to add the second disk to the RAID to fix it:

mdadm --manage /dev/md127 --add /dev/sdd1

or whatever your second partition is. To be on the safe side (because it is not clear in which state the second partition is), I'd probably perform wipefs on this partition, so it is added as new and gets a full copy.

And no, I do not know how it happened. Apparently at some point you booted with only one disk present. You did it two times, with two different disks. That removed the second array member from each disk. At that point you had two disks, each believing it was the only valid member of the same array. From then on it was a matter of which disk was detected first on boot.
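A minimal sketch of the sequence Andrei describes, assuming /dev/sdd1 really is the stale member and /dev/md127 the running array (verify with mdadm --examine first):

# wipefs -a /dev/sdd1
# mdadm --manage /dev/md127 --add /dev/sdd1
# cat /proc/mdstat

The wipefs pass discards the old RAID superblock and filesystem signatures, so the partition is added as a blank member and receives a full resync; /proc/mdstat then shows the rebuild progress.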
On Wednesday, 9 January 2019 12:14:43 CET, you wrote:

On Wed, Jan 9, 2019 at 11:33 AM stakanov wrote:

# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sdc1[0]
      976760640 blocks super 1.0 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk
unused devices: <none>
...
When things go south on you, for unknown reasons, the active disc is sdd1 instead.
Both should be the same... but aren't. sdd1 should not be listed as unused in YaST when sdc1 is used, and sdc1 and sdd1 should not flip within the RAID; they should contain the very same information. What is wrong here? I do not understand.
It is obvious that your RAID1 now consists of a single disk. Even if you repeat "should be the same" a million times, it is not going to change anything. You need to add the second disk to the RAID to fix it:
mdadm --manage /dev/md127 --add /dev/sdd1
or whatever your second partition is. To be on the safe side (because it is not clear in which state the second partition is), I'd probably perform wipefs on this partition, so it is added as new and gets a full copy.
And no, I do not know how it happened. Apparently at some point you booted with only one disk present. You did it two times, with two different disks. That removed the second array member from each disk. At that point you had two disks, each believing it was the only valid member of the same array. From then on it was a matter of which disk was detected first on boot.

This is very sensible; the faulty PSU may have decided (so to say) based on the available electrical power. When it was insufficient to keep both discs alive, it randomly switched one of them off, until it went down for good recently. I see this machine only every 6 months, and the user is not able to perform major actions on it. So I had to repair the PSU first, and then stumbled on this while checking the disks.

How do I perform wipefs on that disc (that is, sdd1)?

su -
wipefs -af /dev/sdd1
mdadm --manage /dev/md127 --add /dev/sdd1

Is this correct? Do I have to umount sdd1? It is "unused", so probably not mounted. If so, how do I have to remount it in order to add it to the RAID? Or would the mdadm command alone suffice? Do I have to tell the RAID to mirror the data, or does it do so automatically? How can I see when it has finished mirroring the data?

Thank you.
On 01/09/2019 05:37 AM, stakanov wrote:
su -
wipefs -af /dev/sdd1
You should not need to wipe the disk. mdadm is smart enough to use Update Time and Event count to determine that the disk is not the current disk and needs rebuilding.
mdadm --manage /dev/md127 --add /dev/sdd1
Fine, but make sure /dev/sdd1 is marked as failed. It may be worth running the following to fail and remove /dev/sdd1, to force a resync when you add it. Likely output:

# mdadm /dev/md127 --fail /dev/sdd1 --remove /dev/sdd1
mdadm: set device faulty failed for /dev/sdd1: No such device

Then you can add it, forcing the resync:

# mdadm /dev/md127 --add /dev/sdd1
mdadm: re-added /dev/sdd1

No need to add --manage (it doesn't hurt, but it is presumed for --add). You can then check the resync status with:

# cat /proc/mdstat

It will show you the sync status.
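One convenient way to follow the resync as it runs, for example:

# watch -n 5 cat /proc/mdstat

This refreshes the status every five seconds until you interrupt it.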
Is this correct?
Yes.
Do I have to umount sdd1? It is "unused", so probably not mounted. If so, how do I have to remount it in order to add it to the RAID? Or would the mdadm command alone suffice?

The mdadm command is sufficient.
Do I have to tell the RAID to mirror the data, or does it do so automatically?

The resync will be forced automatically.
How can I see when it has finished mirroring the data?

# cat /proc/mdstat
(it will show you which array is syncing and when it is done)

See https://wiki.archlinux.org/index.php/RAID#Scrubbing

I scrub monthly. (I used to scrub weekly, but 3T arrays take about 5 hours.) I just use a script in a crontab and log the results to a log file, e.g.

$ bzcat /home/admin/log/mdadm_sync_valkyrie.bz2 | tail -n 12
Nov  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Nov  1 03:06:01 '/dev/md1' mismatch_cnt = 0
Nov  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Nov  1 10:11:02 '/dev/md4' mismatch_cnt = 0
Dec  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Dec  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Dec  1 05:07:02 '/dev/md2' mismatch_cnt = 0
Dec  1 10:17:03 '/dev/md4' mismatch_cnt = 0
Jan  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jan  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jan  1 05:03:02 '/dev/md2' mismatch_cnt = 0
Jan  1 10:13:02 '/dev/md4' mismatch_cnt = 0

Array sizes are:

$ df -h | grep md[0-9][0-9]*
/dev/md1         50G   21G   29G  43% /
/dev/md0        469M   97M  360M  22% /boot
/dev/md2        865G  487G  378G  57% /home
/dev/md4        2.7T  871G  1.9T  32% /home/data

(swap is on /dev/md3)

Array status will be similar to:

$ cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md3 : active raid1 sdb8[1] sda8[0]
      2115584 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb5[1] sda5[0]
      511680 blocks super 1.2 [2/2] [UU]

md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>

--
David C. Rankin, J.D.,P.E.
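A scrub script of the kind David mentions can be driven entirely through sysfs. This is a sketch of such a script, not his actual one (the log path and polling interval are illustrative):

#!/bin/bash
# Scrub every md array in turn and log the mismatch count.
LOG=/home/admin/log/mdadm_sync.log
for md in /sys/block/md*/md; do
    dev=/dev/$(basename "$(dirname "$md")")
    echo check > "$md/sync_action"              # start the scrub
    while [ "$(cat "$md/sync_action")" != "idle" ]; do
        sleep 60                                # wait until it finishes
    done
    echo "$(date '+%b %e %H:%M:%S') '$dev' mismatch_cnt = $(cat "$md/mismatch_cnt")" >> "$LOG"
done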
On 10.01.2019 02:46, David C. Rankin wrote:
On 01/09/2019 05:37 AM, stakanov wrote:
su -
wipefs -af /dev/sdd1
You should not need to wipe the disk. mdadm is smart enough to use Update Time and Event count to determine that the disk is not the current disk and needs rebuilding.
mdadm (actually the kernel driver) is not smart at all. If you boot with a single array member and modify the file system, then with the other single array member and modify the file system differently, and finally with both members together, it will happily assume both belong to a clean array and are synchronized, causing data corruption. I just got this trying to reproduce the issue.

@stakanov - could you please show

mdadm --examine /dev/sdc1
mdadm --examine /dev/sdd1

before doing any changes?
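Putting the two side by side makes any divergence easy to spot; for example, filtering for the fields that matter here:

# mdadm --examine /dev/sdc1 /dev/sdd1 | grep -E '/dev/|Events|Update Time|Array State'

The Events counter and Update Time show which copy is newer, and Array State shows what each member believes about the other.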
On 01/09/2019 10:55 PM, Andrei Borzenkov wrote:
mdadm (actually the kernel driver) is not smart at all. If you boot with a single array member and modify the file system, then with the other single array member and modify the file system differently, and finally with both members together, it will happily assume both belong to a clean array and are synchronized, causing data corruption. I just got this trying to reproduce the issue.
Yes, you are right; in that case there is no current active array (even an active degraded array) that holds a valid event count. What I was presuming was that the raid1 array itself had been maintained, even if degraded, through the loss of n-1 disks, so that the remaining drive in the array would hold a valid event count that would control when the other devices were added back into the array. This remains true even if you reboot any number of times on a degraded array, so long as you don't lose or alter the final device in the array that holds the information for the array.

If you have a 2-disk raid1 and effectively remove both from the array and modify the filesystems, then mdadm will not know which is the valid copy to sync from. In that case, the only thing you could do to recover is to know which is the good disk, add it to a single-disk array, and then add the remaining devices.

The following will show the state of the array:

# mdadm -D /dev/md127

and then

# mdadm -E /dev/sd[cd]1

will show the details. Based on the original:

# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sdc1[0]
      976760640 blocks super 1.0 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>

/dev/sdc1 was considered the active and working device in the array.

--
David C. Rankin, J.D.,P.E.
On Thursday, 10 January 2019 07:34:35 CET, David C. Rankin wrote:
On 01/09/2019 10:55 PM, Andrei Borzenkov wrote:
mdadm (actually the kernel driver) is not smart at all. If you boot with a single array member and modify the file system, then with the other single array member and modify the file system differently, and finally with both members together, it will happily assume both belong to a clean array and are synchronized, causing data corruption. I just got this trying to reproduce the issue.
Yes, you are right; in that case there is no current active array (even an active degraded array) that holds a valid event count. What I was presuming was that the raid1 array itself had been maintained, even if degraded, through the loss of n-1 disks, so that the remaining drive in the array would hold a valid event count that would control when the other devices were added back into the array. This remains true even if you reboot any number of times on a degraded array, so long as you don't lose or alter the final device in the array that holds the information for the array.
If you have a 2-disk raid1 and effectively remove both from the array and modify the filesystems, then mdadm will not know which is the valid copy to sync from. In that case, the only thing you could do to recover is to know which is the good disk, add it to a single-disk array, and then add the remaining devices.
The following will show the state of the array
# mdadm -D /dev/md127
and then
# mdadm -E /dev/sd[cd]1
will show the details.
Based on the original:
# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sdc1[0]
      976760640 blocks super 1.0 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk
unused devices: <none>
/dev/sdc1 was considered the active and working device in the array.

Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?
stakanov wrote:
Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?
No, that is not needed. When the disk is hot-added, the array will start repairing right away, and /home will remain useable.

--
Per Jessen, Zürich (-1.1°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On Thursday, 10 January 2019 08:04:40 CET, Per Jessen wrote:

stakanov wrote:
Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?
No, that is not needed. When the disk is hot-added, the array will start repairing right away, and /home will remain useable.

Before joining the disk, can I also check the existing disk /dev/sdc1 for correctness, or is this unnecessary? I assume fsck /dev/sdc1 as the command, but I guess I would have to indicate the file system as well (or would it be identified automatically?). I recall that /dev/sdd had a "dirty bit" when I checked it with fsck (but it was "unused"). As that one is wiped out, no problem; but the other one would be the template. So this is why I ask.
stakanov wrote:
On Thursday, 10 January 2019 08:04:40 CET, Per Jessen wrote:
stakanov wrote:
Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?
No, that is not needed. When the disk is hot-added, the array will start repairing right away, and /home will remain useable.
Before joining the disk, can I also check the existing disk /dev/sdc1 for correctness, or is this unnecessary? I assume fsck /dev/sdc1 as the command, but I guess I would have to indicate the file system as well (or would it be identified automatically?)
Yes, fsck will automatically identify the filesystem, but given that your RAID1 is able to mount, there is no need to fsck.

--
Per Jessen, Zürich (-0.7°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On 01/10/2019 01:34 AM, stakanov wrote:
Before joining the disk, can I also check the existing disk /dev/sdc1 for correctness, or is this unnecessary? I assume fsck /dev/sdc1 as the command, but I guess I would have to indicate the file system as well (or would it be identified automatically?). I recall that /dev/sdd had a "dirty bit" when I checked it with fsck (but it was "unused"). As that one is wiped out, no problem; but the other one would be the template. So this is why I ask.
I would not run anything against /dev/sdc1 individually until your array is rebuilt. /dev/md127 is the block device (the disk) for all practical purposes. You don't fsck individual disks in an array; you "scrub" the array for maintenance. See:

RAID Maintenance
https://wiki.archlinux.org/index.php/RAID#RAID_Maintenance

--
David C. Rankin, J.D.,P.E.
On Thursday, 2019-01-10 at 12:57 -0600, David C. Rankin wrote:
On 01/10/2019 01:34 AM, stakanov wrote:
Before joining the disk, can I also check the existing disk /dev/sdc1 for correctness, or is this unnecessary? I assume fsck /dev/sdc1 as the command, but I guess I would have to indicate the file system as well (or would it be identified automatically?). I recall that /dev/sdd had a "dirty bit" when I checked it with fsck (but it was "unused"). As that one is wiped out, no problem; but the other one would be the template. So this is why I ask.
I would not run anything against /dev/sdc1 individually until your array is rebuilt. /dev/md127 is the block device (the disk) for all practical purposes. You don't fsck individual disks in an array; you "scrub" the array for maintenance. See:
Once, I mounted a raid member directly, without the raid. In a case like this each member has different contents, so I would try to recover the files from the "faulty" member somehow.

--
Cheers,
Carlos E. R.
(from openSUSE 15.0 x86_64 at Telcontar)
On Thursday, 10 January 2019 08:04:40 CET, Per Jessen wrote:
stakanov wrote:
Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?
No, that is not needed. When the disk is hot-added, the array will start repairing right away, and /home will remain useable.

And I am getting this warning from the partition manager: "some subvolumes of the file systems are replicated by mount points of other file systems and that could cause problems: home".
So this should vanish after wiping /dev/sdd1. Or do I have to edit something else?
On 01/10/2019 03:16 AM, stakanov wrote:
And I am getting this warning from the partition manager: "some subvolumes of the file systems are replicated by mount points of other file systems and that could cause problems: home".
So this should vanish after wiping /dev/sdd1. Or do I have to edit something else?
Yes. After wiping /dev/sdd1 (if you wipe the /dev/sdd partition table as well), you will need to recreate the partition before you add it back to the array. I would generally use sfdisk -d to dump the partition information from /dev/sdc and then use that to repartition /dev/sdd for use in the array.

--
David C. Rankin, J.D.,P.E.
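A sketch of that round trip, assuming sdc is the intact disk and sdd the one being rebuilt (double-check the device names first; the second command overwrites sdd's partition table):

# sfdisk -d /dev/sdc > sdc.layout
# sfdisk /dev/sdd < sdc.layout

Afterwards /dev/sdd1 exists again with the same geometry as /dev/sdc1 and can be added back to the array.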
On 01/10/2019 12:50 AM, stakanov wrote:
Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?
No. The array is mounted as /home (i.e. /dev/md127). You can add/fail/remove disks from the array without worrying about how it is mounted.

--
David C. Rankin, J.D.,P.E.
On Thursday, 10 January 2019 19:53:59 CET, David C. Rankin wrote:

On 01/10/2019 12:50 AM, stakanov wrote:
Now a question, since we have all the bits and pieces, and assuming I wipe (that is what I asked before, to understand whether I needed something else): the RAID is currently mounted as /home. Do I have to umount /home before joining the disc to the array, or is it better to leave it mounted?

No.
The array is mounted as /home (i.e. /dev/md127). You can add/fail/remove disks from the array without worrying about how it is mounted.

So, thank you all for your advice. Yes, Istvan was also quite right about the eventuality that the user was writing here and there. In this case the important (and different) home content, that is documents etc., is on a dedicated "post user" account. Then there is the "surf and dawdle" one. That one was somewhat of a problem, but what was really important was that the post account had another background and was thus distinguishable. In 99% of all boots the user rebooted until that account was mounted. So I made a backup, for the sake of trying to minimize losses, and then followed the procedure, which worked well (wipe and then integrate the disk).

What I would like to ask as a last bit: is there an app in Plasma / KDE that allows monitoring the status of the RAID? If the user had had this, I would have had an easier life finding out what was happening well before. So if there is one, I would be happy to set it up to give the user visual feedback when such an event occurs.
On 01/11/2019 01:40 AM, stakanov wrote:
What I would like to ask as a last bit: is there an app in Plasma / KDE that allows monitoring the status of the RAID? If the user had had this, I would have had an easier life finding out what was happening well before. So if there is one, I would be happy to set it up to give the user visual feedback when such an event occurs.
Not that I know of. mdadm.conf requires an e-mail address associated with the MAILADDR setting to send an e-mail when a failure is detected (when mdadm.service is started). You also have the option to set a PROGRAM to launch in case of failure that you could use to display a dialog, etc.

You can also write your own script to check each array using 'mdadm -D' and check 'Active Devices', 'Working Devices' and 'Failed Devices'. When I loop over the arrays to scrub them, I use a grep of /proc/mounts to pick out each of the arrays, something similar to:

grep -o '^/dev/md[0-9][0-9]*' /proc/mounts

I just feed a while loop with a process substitution to loop over each of the devices. However, it may be worth asking on the linux-raid@vger.kernel.org mailing list whether they are aware of any such utility that is already written.

--
David C. Rankin, J.D.,P.E.
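A sketch of the kind of script described above, combining that grep with a check of the 'Failed Devices' count (the warning text and cadence are up to you):

#!/bin/bash
# Warn about any mounted md array that reports failed devices.
while read -r md; do
    failed=$(mdadm -D "$md" | awk '/Failed Devices/ {print $4}')
    [ "${failed:-0}" -gt 0 ] && echo "WARNING: $md has $failed failed device(s)"
done < <(grep -o '^/dev/md[0-9][0-9]*' /proc/mounts)

Run it from cron, or wrap the echo in notify-send for a desktop notification, which would have given the user here the early warning that was missing.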
On Wed, 09 Jan 2019 09:33:36 +0100, stakanov wrote:
Some time ago I created a RAID from two discs. It is a Linux software RAID1. For quite some time it worked well, but now something is really wrong. The user is sometimes presented a "wrong" /home with an older version of the data. This comes and goes without any understandable rule, and the information on the RAID does not appear to be consistent.
[long snip]

Hello:

Some suggested that you recover the degraded array using mdadm, and that mdadm will find out which disk of the array should be the source and which the target (based on time stamps). I would not do this unless you are sure that the degraded-array disk you used during the different boots was always the same one. Here is why.

Something similar happened to me a few years ago: a user discovered that stuff he had saved the previous time had disappeared, but after another boot it reappeared and other stuff disappeared. It turned out that during the raid assembly the system could not find one of the disks and started the array in degraded mode. The user did not notice this, because the directories looked normal. But at the next boot the system could not find the other disk and again started the array in degraded mode, but with the disk which wasn't used the last time. The user noticed that stuff disappeared, but kept working and saving. The result was that both disks of the array held important stuff that the other disk didn't have.

I solved it by:

- disassembling the array and removing one disk from it
- setting up a new array with the removed disk, in degraded mode
- starting both arrays (the original and the new one) in degraded mode, each with only one disk, in order to be able to mount them at different mount points
- mounting both arrays at different mount points
- comparing the contents and copying the differing stuff between the two degraded arrays, so that the original array had everything
- unmounting and disassembling the new array
- rebuilding the original array by adding back the other disk (which was removed at the start)

This finally resulted in a working array that had everything that had been saved while the array ran in degraded mode with either of the disks.

Istvan
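A command-level sketch of the procedure Istvan describes, with placeholder device names (sdX1 = the member removed from the original array, sdY1 = the member kept in it). The array in this thread uses super 1.0, and matching that metadata version when creating the second array keeps the data offset unchanged:

# mdadm --stop /dev/md127
# mdadm --create /dev/md100 --metadata=1.0 --level=1 --raid-devices=2 /dev/sdX1 missing
# mdadm --assemble --run /dev/md127 /dev/sdY1
# mount /dev/md127 /mnt/orig
# mount /dev/md100 /mnt/other
  ... compare the trees (e.g. rsync -aXn /mnt/other/ /mnt/orig/) and copy
  the missing pieces into /mnt/orig ...
# umount /mnt/other
# mdadm --stop /dev/md100
# wipefs -a /dev/sdX1
# mdadm /dev/md127 --add /dev/sdX1

This is a sketch, not a recipe: creating an array over a disk that still holds data is only safe when the metadata version and data offset really match, so verify with mdadm --examine before and after the --create step.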
participants (6):

- Andrei Borzenkov
- Carlos E. R.
- David C. Rankin
- Istvan Gabor
- Per Jessen
- stakanov