Mailinglist Archive: opensuse (911 mails)

Re: [opensuse] strange mdraid problem
Hi All,

Just a small contribution here:

Istvan, if your array has a Write-intent bitmap, as most arrays
created by YaST do, the solution David provided will not (I think)
rebuild your whole array on its own, but only the "missing" sectors
that were changed since the device was last removed. You can check
whether your array has a Write-intent bitmap by issuing:
~# cat /proc/mdstat

Then observe the output. Here is an example of two arrays, one WITHOUT
bitmap (/dev/md0), and another WITH bitmap (/dev/md2).
md0 : active raid1 sdb1[3] sda1[2]
      127988 blocks super 1.0 [2/2] [UU]

md2 : active raid1 sdb3[3] sda3[2]
      1462913296 blocks super 1.0 [2/2] [UU]
      bitmap: 11/11 pages [44KB], 65536KB chunk
See the bitmap line under md2? That means a Write-intent bitmap is
associated with that array.
If you do not have a Write-intent bitmap, David's solution will
rebuild your whole array.
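
With many arrays, the check above can be scripted. This is only a minimal sketch: the function name md_bitmap_report is made up for illustration, and it assumes the usual /proc/mdstat layout where each array stanza starts with "mdN :" and a bitmap, if any, appears on an indented "bitmap:" line.

```shell
#!/bin/sh
# Print each md array name followed by "bitmap" or "no-bitmap".
# Reads /proc/mdstat by default; pass a file name to check saved output.
# md_bitmap_report is a hypothetical helper, not part of mdadm.
md_bitmap_report() {
    awk '
        /^md[^ ]* :/ {                  # first line of an array stanza
            if (name != "") print name, (has ? "bitmap" : "no-bitmap")
            name = $1; has = 0
        }
        /^[ \t]*bitmap:/ { has = 1 }    # bitmap line of the current array
        END { if (name != "") print name, (has ? "bitmap" : "no-bitmap") }
    ' "${1:-/proc/mdstat}"
}
```

Calling md_bitmap_report with no argument reads the live /proc/mdstat; for the example output above it would list md0 as no-bitmap and md2 as bitmap.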

If you do have one, you have two solutions:
1) Remove the Write-intent bitmap from the array, by issuing:
~# mdadm --grow -b none </dev/md?> (replace question mark)
Then carry out David's instructions.
You may then re-add the bitmap with:
~# mdadm --grow -b internal </dev/md?> (replace question mark)
NOTE: This will create an internal Write-intent bitmap

2) Carry out David's instructions (if you think the array did not
start up in the order you wanted it to, according to David's
explanations).
Then trigger a full repair pass over the array by issuing:
~# echo repair > /sys/block/md?/md/sync_action (replace question mark)
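
To make the order of steps in solution 1 explicit, here is a small dry-run sketch. It only prints the commands and runs nothing. The function name plan_full_resync is made up, the device names are placeholders you must supply, and waiting for the resync before restoring the bitmap is my assumption about the safest order.

```shell
#!/bin/sh
# Print (do NOT execute) the command sequence for solution 1.
# plan_full_resync is a hypothetical helper; both arguments are placeholders.
plan_full_resync() {
    md=$1      # array device, e.g. /dev/md1
    part=$2    # member partition to re-add, e.g. /dev/sdb5
    if [ -z "$md" ] || [ -z "$part" ]; then
        echo "usage: plan_full_resync <md-device> <member-partition>" >&2
        return 1
    fi
    echo "mdadm --grow -b none $md"                  # drop the bitmap
    echo "mdadm $md --fail $part --remove $part"     # David's fail/remove
    echo "mdadm $md --add $part"                     # David's re-add
    echo "# wait for the resync to finish (assumption), then:"
    echo "mdadm --grow -b internal $md"              # restore the bitmap
}
```

For example, plan_full_resync /dev/md1 /dev/sdb5 prints five lines you can review before typing them in as root.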

Hope this helps,
Best regards,
Rui


On Tue, Dec 8, 2015 at 5:35 AM, David C. Rankin
<drankinatty@xxxxxxxxxxxxxxxxxx> wrote:
On 12/07/2015 11:52 AM, Istvan Gabor wrote:

OK. How do I know that only one device is visible? I have several arrays
and both devices are visible and assembled in the resynched group. What
you're saying is that in case of ~10 arrays (all have been resynched after
the failure) both devices are always visible (/dev/sda* and /dev/sdb*) but
in case of not synched (and only in not synched) either /dev/sda* or
/dev/sdb* is visible alternatively at different boots. How can I confirm
that this causes the problem?


Istvan,

I had a similar issue where I had a disk controller that was flaky. I
still do not know exactly how it happened, but apparently on one boot the
array came up in degraded mode and did not see the other disk. When that
occurred, it continued to write to the good disk as it normally would. On
the next boot, the other disk reappeared and mdraid was stuck. It saw
metadata on both disks saying they were fine, and if the event counts are
not that far apart, it doesn't know which is the good disk. It should kick
one out and continue on the one with the most recent event count.
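
One way to see which member is ahead is to compare the "Events" line that `mdadm --examine` prints for each partition. The sketch below is only an illustration: md_events and newer_member are made-up helper names, and it assumes the event counters are plain integers.

```shell
#!/bin/sh
# Read `mdadm --examine` output on stdin and print the Events counter.
# md_events and newer_member are hypothetical helpers, not mdadm features.
md_events() {
    awk -F' *: *' '/^[ \t]*Events[ \t]*:/ { print $2; exit }'
}

# Given "name count name count", print the member with the higher count,
# or "equal" when both counters match. Assumes integer counters.
newer_member() {
    a=$1; ea=$2; b=$3; eb=$4
    if [ "$ea" -gt "$eb" ]; then
        echo "$a"
    elif [ "$eb" -gt "$ea" ]; then
        echo "$b"
    else
        echo "equal"
    fi
}

# Usage (as root; the device names are examples only):
#   ea=$(mdadm --examine /dev/sda5 | md_events)
#   eb=$(mdadm --examine /dev/sdb5 | md_events)
#   newer_member /dev/sda5 "$ea" /dev/sdb5 "$eb"
```

The member with the lower count is the stale one, which is the one you would fail and re-add.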

To recover, you 'fail' and 'remove' the bad device (or the one mdraid
thinks is bad). Make sure you fail the *correct* partition, e.g.:

# mdadm /dev/md1 --fail /dev/sdb5 --remove /dev/sdb5
mdadm: set device faulty failed for /dev/sdb5: No such device

*note:* since mdadm has already kicked the drive, you will receive the 'No
such device' warning above (this is normal).

Then re-'add' the device:

# mdadm /dev/md1 --add /dev/sdb5
mdadm: re-added /dev/sdb5

That will start the resync. Good luck.
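
If you want to script the wait for the resync, the array's sync_action file in sysfs reads "idle" once nothing is running. A minimal sketch follows; the function name wait_for_md_idle and the MD_POLL_SECONDS variable are made up for illustration, and the second argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Poll an array's sync_action file until it reports "idle".
# $1: array name, e.g. md1
# $2: sysfs root (defaults to /sys/block; overridable for testing)
# wait_for_md_idle and MD_POLL_SECONDS are hypothetical names.
wait_for_md_idle() {
    name=$1
    root=${2:-/sys/block}
    file="$root/$name/md/sync_action"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Values such as resync, recover, check or repair mean work in progress.
    while [ "$(cat "$file")" != "idle" ]; do
        sleep "${MD_POLL_SECONDS:-10}"
    done
    echo "$name: idle"
}
```

For example, wait_for_md_idle md1 returns once the resync of /dev/md1 has finished; progress in the meantime is still visible in /proc/mdstat.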

--
David C. Rankin, J.D.,P.E.

--
To unsubscribe, e-mail: opensuse+unsubscribe@xxxxxxxxxxxx
To contact the owner, e-mail: opensuse+owner@xxxxxxxxxxxx




--
Rui Santos
Veni, Vidi, Linux