[opensuse] mdraid array assemble problem
Hello:

This occurs in openSUSE 12.2. I have several RAID 1 devices, most of them with 1.0 metadata. During boots the arrays are assembled arbitrarily. For example, after the first boot some of the arrays are assembled with only one drive. Even if I don't use or mount the devices and reboot the computer, they are assembled differently after the next boot. It seems arbitrary which arrays are assembled correctly and which are not, and it is also arbitrary which of the mirror devices (/dev/sdbX or /dev/sdcX) becomes part of the array. See the example:

First boot:

Personalities : [raid1]
md1 : active (auto-read-only) raid1 sdb1[2]
      20971520 blocks super 1.0 [2/1] [U_]
md14 : active raid1 sdc14[2] sdb14[3]
      62918468 blocks super 1.0 [2/2] [UU]
md16 : active raid1 sdc16[2] sdb16[3]
      62918468 blocks super 1.0 [2/2] [UU]
md18 : active (auto-read-only) raid1 sdb18[0]
      31455104 blocks super 1.0 [2/1] [U_]
md8 : active raid1 sdc8[1] sdb8[0]
      31438720 blocks super 1.2 [2/2] [UU]
md9 : active raid1 sdc9[2]
      62918468 blocks super 1.0 [2/1] [_U]
md15 : active raid1 sdc15[2] sdb15[0]
      62918468 blocks super 1.0 [2/2] [UU]
md12 : active (auto-read-only) raid1 sdb12[0]
      62918468 blocks super 1.0 [2/1] [U_]
md11 : active (auto-read-only) raid1 sdb11[3]
      62918468 blocks super 1.0 [2/1] [U_]
md13 : active (auto-read-only) raid1 sdb13[3]
      62918468 blocks super 1.0 [2/1] [U_]
md10 : active raid1 sdb10[3]
      62918468 blocks super 1.0 [2/1] [U_]
md20 : active (auto-read-only) raid1 sdb20[0]
      62918468 blocks super 1.0 [2/1] [U_]
md22 : active (auto-read-only) raid1 sdb22[0]
      55439552 blocks super 1.2 [2/1] [U_]
md7 : active raid1 sdc7[3]
      31455164 blocks super 1.0 [2/1] [_U]
md6 : active raid1 sdb6[2]
      20971520 blocks super 1.0 [2/1] [_U]
md19 : active (auto-read-only) raid1 sdb19[0]
      62918468 blocks super 1.0 [2/1] [U_]
md17 : active (auto-read-only) raid1 sdb17[0]
      62918468 blocks super 1.0 [2/1] [U_]
md21 : active raid1 sdc21[2] sdb21[3]
      48829464 blocks super 1.0 [2/2] [UU]

unused devices: <none>

You can see that in some /dev/mdX arrays only the /dev/sdbX device becomes part of the array, while in others only the /dev/sdcX device does.
If I restart the computer without touching any arrays, after the next boot I get:

Personalities : [raid1]
md1 : active raid1 sdb1[2] sdc1[1]
      20971520 blocks super 1.0 [2/2] [UU]
md18 : active raid1 sdb18[0] sdc18[2]
      31455104 blocks super 1.0 [2/2] [UU]
md11 : active raid1 sdc11[2] sdb11[3]
      62918468 blocks super 1.0 [2/2] [UU]
md20 : active raid1 sdc20[2] sdb20[0]
      62918468 blocks super 1.0 [2/2] [UU]
md9 : active raid1 sdc9[2]
      62918468 blocks super 1.0 [2/1] [_U]
md16 : active raid1 sdb16[3] sdc16[2]
      62918468 blocks super 1.0 [2/2] [UU]
md21 : active raid1 sdc21[2] sdb21[3]
      48829464 blocks super 1.0 [2/2] [UU]
md6 : active raid1 sdb6[2]
      20971520 blocks super 1.0 [2/1] [_U]
md19 : active raid1 sdc19[2] sdb19[0]
      62918468 blocks super 1.0 [2/2] [UU]
md8 : active raid1 sdb8[0] sdc8[1]
      31438720 blocks super 1.2 [2/2] [UU]
md22 : active raid1 sdb22[0] sdc22[1]
      55439552 blocks super 1.2 [2/2] [UU]
md15 : active raid1 sdb15[0] sdc15[2]
      62918468 blocks super 1.0 [2/2] [UU]
md10 : active raid1 sdb10[3]
      62918468 blocks super 1.0 [2/1] [U_]
md7 : active raid1 sdc7[3]
      31455164 blocks super 1.0 [2/1] [_U]
md12 : active raid1 sdc12[2] sdb12[0]
      62918468 blocks super 1.0 [2/2] [UU]
md14 : active raid1 sdc14[2] sdb14[3]
      62918468 blocks super 1.0 [2/2] [UU]
md13 : active raid1 sdb13[3] sdc13[2]
      62918468 blocks super 1.0 [2/2] [UU]
md17 : active raid1 sdc17[2] sdb17[0]
      62918468 blocks super 1.0 [2/2] [UU]

unused devices: <none>

A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?

Thanks,
Istvan
Istvan Gabor wrote:
Hello:
This occurs in openSUSE 12.2. I have several RAID 1 devices, most of them with 1.0 metadata. During boots the arrays are assembled arbitrarily.
You ought to have messages in dmesg that'll tell you what's going on. "dmesg | grep md".
A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?
I've never seen that before - I am assuming those arrays were all correctly assembled before you rebooted? It doesn't seem to be hardware related, both your sdb and sdc are available. What do you have in mdadm.conf?

--
Per Jessen, Zürich (3.8°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
Per Jessen wrote:
Istvan Gabor wrote:
Hello:
This occurs in openSUSE 12.2. I have several RAID 1 devices, most of them with 1.0 metadata. During boots the arrays are assembled arbitrarily.
Thanks for answering.
You ought to have messages in dmesg that'll tell you what's going on. "dmesg | grep md".
OK, what I see in dmesg: the "bind" lines are missing for the arrays that come up with only one device. See below.
A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?
I've never seen that before - I am assuming those arrays were all correctly assembled before you rebooted?
Not always assembled, but always in sync. For example, if all arrays are in sync and I reboot the computer, sometimes only one of the devices gets assembled into an array. If the array is not touched (mounted) and I reboot again, the same array sometimes gets assembled correctly.
It doesn't seem to be hardware related, both your sdb and sdc are available.
I think so too. I changed the drives' SATA ports, but it did not solve the problem.
What do you have in mdadm.conf?
cat mdadm.conf
DEVICE containers partitions
ARRAY /dev/md1 metadata=1.0 name=pc:1 UUID=7c74cc31:c207e0fb:dc271c10:27ae48ef
ARRAY /dev/md6 metadata=1.0 name=pc:6 UUID=21316afe:1a4dd0bf:50911056:88042a7c
ARRAY /dev/md7 metadata=1.0 name=pc:7 UUID=64e23ea9:7dcb9ee2:7bca71bd:248cc5cf
ARRAY /dev/md8 metadata=1.0 name=pc:8 UUID=440525f4:ac2dcae3:bedf85c8:7714b235
ARRAY /dev/md9 metadata=1.0 name=pc:9 UUID=43ccea22:8f57d370:414892d2:65509aff
ARRAY /dev/md10 metadata=1.0 name=pc:10 UUID=d42689c4:8a0ab156:28e74353:7f0c04a1
ARRAY /dev/md11 metadata=1.0 name=pc:11 UUID=4ff5944f:b14c863f:9fddd970:c1ea68c7
ARRAY /dev/md12 metadata=1.0 name=pc:12 UUID=e5bab09a:55e8b0fd:e904c0c4:30665799
ARRAY /dev/md13 metadata=1.0 name=pc:13 UUID=6ec1c813:dfe7b88d:29f9dba0:ac5c860e
ARRAY /dev/md14 metadata=1.0 name=pc:14 UUID=b441e33e:16c606a4:93c3cb81:19677ed7
ARRAY /dev/md15 metadata=1.0 name=pc:15 UUID=3fb56951:1ea9664b:b9e5dcc9:1285b262
ARRAY /dev/md16 metadata=1.0 name=pc:16 UUID=780efdb2:66692ec6:e7798d21:ecd9952a
ARRAY /dev/md17 metadata=1.0 name=pc:17 UUID=537acb12:d79d664d:3d31e011:58d1e976
ARRAY /dev/md18 metadata=1.0 name=pc:18 UUID=ec974a7c:61e555ad:045471cf:b9be4ee2
ARRAY /dev/md19 metadata=1.0 name=pc:19 UUID=7b2c691a:f46dc203:d764d95e:f04270e7
ARRAY /dev/md20 metadata=1.0 name=pc:20 UUID=4b15ef7f:3cc8c8fe:90e185bd:648eb55e
ARRAY /dev/md21 metadata=1.0 name=pc:21 UUID=4c137b77:2d67183c:c0716501:457c1e5d
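(Side note, not part of the original message: ARRAY lines like the above can usually be regenerated from the currently assembled arrays with mdadm's scan mode. A sketch only, assuming the arrays are running and /etc/mdadm.conf is the config file in use:)

# print ARRAY lines for the currently assembled arrays
mdadm --detail --scan
# back up the existing config before appending freshly scanned lines
cp /etc/mdadm.conf /etc/mdadm.conf.bak
mdadm --detail --scan >> /etc/mdadm.conf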
Here is one /proc/mdstat status and the corresponding messages log grepped for md. Before this boot (at the previous shutdown) all arrays were synchronized and worked normally.

Personalities : [raid1]
md20 : active raid1 sdb20[0] sdc20[2]
      62918468 blocks super 1.0 [2/2] [UU]
md17 : active raid1 sdb17[0] sdc17[2]
      62918468 blocks super 1.0 [2/2] [UU]
md14 : active raid1 sdb14[3] sdc14[2]
      62918468 blocks super 1.0 [2/2] [UU]
md11 : active raid1 sdb11[3] sdc11[2]
      62918468 blocks super 1.0 [2/2] [UU]
md6 : active raid1 sdc6[3] sdb6[2]
      20971520 blocks super 1.0 [2/2] [UU]
md10 : active raid1 sdc10[2] sdb10[3]
      62918468 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb1[2] sdc1[1]
      20971520 blocks super 1.0 [2/2] [UU]
md16 : active raid1 sdb16[3] sdc16[2]
      62918468 blocks super 1.0 [2/2] [UU]
md9 : active raid1 sdc9[2] sdb9[3]
      62918468 blocks super 1.0 [2/2] [UU]
md19 : active raid1 sdb19[0] sdc19[2]
      62918468 blocks super 1.0 [2/2] [UU]
md15 : active (auto-read-only) raid1 sdc15[2]
      62918468 blocks super 1.0 [2/1] [_U]
md13 : active raid1 sdb13[3] sdc13[2]
      62918468 blocks super 1.0 [2/2] [UU]
md21 : active (auto-read-only) raid1 sdb21[3]
      48829464 blocks super 1.0 [2/1] [U_]
md22 : active (auto-read-only) raid1 sdb22[0]
      55439552 blocks super 1.2 [2/1] [U_]
md18 : active (auto-read-only) raid1 sdb18[0]
      31455104 blocks super 1.0 [2/1] [U_]
md12 : active (auto-read-only) raid1 sdb12[0]
      62918468 blocks super 1.0 [2/1] [U_]
md7 : active raid1 sdc7[3]
      31455164 blocks super 1.0 [2/1] [_U]
md8 : active raid1 sdc8[1]
      31438720 blocks super 1.2 [2/1] [_U]

unused devices: <none>

/var/log/messages grepped for md:

Oct 21 09:39:42 linux systemd-readahe[286]: Bumped block_nr parameter of 8:0 to 16384. This is a temporary hack and should be removed one day.
Oct 21 09:39:42 linux systemd-modules-load[303]: libkmod: kmod_config_parse: /etc/modprobe.d/10-unsupported-modules.conf line 10: ignoring bad line starting with 'allow_unsupported_modules'
Oct 21 09:39:42 linux systemd-modules[303]: Inserted module 'microcode'
Oct 21 09:39:42 linux kernel: [ 12.671684] md: bind<sdc8>
Oct 21 09:39:42 linux kernel: [ 12.711804] md: bind<sdc7>
Oct 21 09:39:42 linux kernel: [ 12.714298] md: bind<sdb12>
Oct 21 09:39:42 linux kernel: [ 12.717932] md: bind<sdb18>
Oct 21 09:39:42 linux kernel: [ 12.734882] md: bind<sdb22>
Oct 21 09:39:42 linux kernel: [ 12.743972] md: bind<sdb21>
Oct 21 09:39:42 linux kernel: [ 12.775977] md: bind<sdc13>
Oct 21 09:39:42 linux kernel: [ 12.801950] md: bind<sdb13>
Oct 21 09:39:42 linux kernel: [ 12.809952] md: bind<sdc15>
Oct 21 09:39:42 linux kernel: [ 13.616693] md: raid1 personality registered for level 1
Oct 21 09:39:42 linux kernel: [ 13.616913] md/raid1:md13: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.616927] md13: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 13.618366] md13: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.619713] md/raid1:md22: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.619728] md22: detected capacity change from 0 to 56770101248
Oct 21 09:39:42 linux kernel: [ 13.621059] md/raid1:md12: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.621074] md12: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 13.623270] md/raid1:md8: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.623285] md8: detected capacity change from 0 to 32193249280
Oct 21 09:39:42 linux kernel: [ 13.625308] md: bind<sdc19>
Oct 21 09:39:42 linux kernel: [ 13.626962] md/raid1:md7: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.626981] md7: detected capacity change from 0 to 32210087936
Oct 21 09:39:42 linux kernel: [ 13.630297] md/raid1:md18: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.630312] md18: detected capacity change from 0 to 32210026496
Oct 21 09:39:42 linux kernel: [ 13.635021] md/raid1:md21: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.635037] md21: detected capacity change from 0 to 50001371136
Oct 21 09:39:42 linux kernel: [ 13.637146] md/raid1:md15: active with 1 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 13.637159] md15: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 13.639298] md8: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.642340] md22: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.648786] md7: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.650793] md12: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.657786] md15: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.723658] md: bind<sdb9>
Oct 21 09:39:42 linux kernel: [ 13.724677] md18: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.890603] md: bind<sdc16>
Oct 21 09:39:42 linux kernel: [ 13.892898] md: bind<sdc1>
Oct 21 09:39:42 linux kernel: [ 13.909723] md21: unknown partition table
Oct 21 09:39:42 linux kernel: [ 13.918602] md: bind<sdb10>
Oct 21 09:39:42 linux kernel: [ 15.310295] md: md6 stopped.
Oct 21 09:39:42 linux kernel: [ 15.310925] md: bind<sdb6>
Oct 21 09:39:42 linux kernel: [ 15.311052] md: bind<sdc6>
Oct 21 09:39:42 linux kernel: [ 15.329547] md/raid1:md6: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 15.329561] md6: detected capacity change from 0 to 21474836480
Oct 21 09:39:42 linux kernel: [ 15.331929] md6: unknown partition table
Oct 21 09:39:42 linux kernel: [ 15.735486] md: md11 stopped.
Oct 21 09:39:42 linux kernel: [ 15.736049] md: bind<sdc11>
Oct 21 09:39:42 linux kernel: [ 15.736178] md: bind<sdb11>
Oct 21 09:39:42 linux kernel: [ 15.794846] md/raid1:md11: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 15.794860] md11: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 15.795935] md11: unknown partition table
Oct 21 09:39:42 linux kernel: [ 16.459724] md: md14 stopped.
Oct 21 09:39:42 linux kernel: [ 16.460272] md: bind<sdc14>
Oct 21 09:39:42 linux kernel: [ 16.460380] md: bind<sdb14>
Oct 21 09:39:42 linux kernel: [ 16.484497] md/raid1:md14: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 16.484512] md14: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 16.486864] md14: unknown partition table
Oct 21 09:39:42 linux kernel: [ 16.686390] md: md17 stopped.
Oct 21 09:39:42 linux kernel: [ 16.687079] md: bind<sdc17>
Oct 21 09:39:42 linux kernel: [ 16.687215] md: bind<sdb17>
Oct 21 09:39:42 linux kernel: [ 16.733750] md/raid1:md17: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 16.733764] md17: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 16.735140] md17: unknown partition table
Oct 21 09:39:42 linux kernel: [ 16.904649] md: md20 stopped.
Oct 21 09:39:42 linux kernel: [ 16.905326] md: bind<sdc20>
Oct 21 09:39:42 linux kernel: [ 16.905452] md: bind<sdb20>
Oct 21 09:39:42 linux kernel: [ 16.912202] md/raid1:md20: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 16.912218] md20: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 16.914406] md20: unknown partition table
Oct 21 09:39:42 linux kernel: [ 17.380278] md: bind<sdb19>
Oct 21 09:39:42 linux kernel: [ 17.398572] md/raid1:md19: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 17.398591] md19: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 17.402213] md: bind<sdb16>
Oct 21 09:39:42 linux kernel: [ 17.404176] md/raid1:md16: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 17.404189] md16: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 17.405749] md19: unknown partition table
Oct 21 09:39:42 linux kernel: [ 17.447030] md16: unknown partition table
Oct 21 09:39:42 linux kernel: [ 17.448922] md: bind<sdc9>
Oct 21 09:39:42 linux kernel: [ 17.450649] md/raid1:md9: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 17.450665] md9: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 17.479782] md9: unknown partition table
Oct 21 09:39:42 linux kernel: [ 17.487536] md: bind<sdb1>
Oct 21 09:39:42 linux kernel: [ 17.590699] md/raid1:md1: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 17.590718] md1: detected capacity change from 0 to 21474836480
Oct 21 09:39:42 linux kernel: [ 17.677574] md: bind<sdc10>
Oct 21 09:39:42 linux kernel: [ 17.679371] md/raid1:md10: active with 2 out of 2 mirrors
Oct 21 09:39:42 linux kernel: [ 17.679386] md10: detected capacity change from 0 to 64428511232
Oct 21 09:39:42 linux kernel: [ 17.688739] md1:
Oct 21 09:39:42 linux kernel: [ 17.690936] md10: unknown partition table
Oct 21 09:39:42 linux kernel: [ 19.703418] EXT3-fs (md6): using internal journal
Oct 21 09:39:42 linux kernel: [ 19.703421] EXT3-fs (md6): mounted filesystem with ordered data mode
Oct 21 09:39:42 linux kernel: [ 19.875601] EXT3-fs (md7): using internal journal
Oct 21 09:39:42 linux kernel: [ 19.875606] EXT3-fs (md7): mounted filesystem with ordered data mode
Oct 21 09:39:42 linux kernel: [ 20.218686] EXT3-fs (md8): using internal journal
Oct 21 09:39:42 linux kernel: [ 20.218688] EXT3-fs (md8): mounted filesystem with ordered data mode
Oct 21 09:39:42 linux kernel: [ 20.425954] EXT3-fs (md9): using internal journal
Oct 21 09:39:42 linux kernel: [ 20.425957] EXT3-fs (md9): mounted filesystem with ordered data mode
Oct 21 09:39:42 linux kernel: [ 20.602757] EXT3-fs (md10): using internal journal
Oct 21 09:39:42 linux kernel: [ 20.602759] EXT3-fs (md10): mounted filesystem with ordered data mode

Thanks,
Istvan
Istvan Gabor wrote:
A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?
I've never seen that before - I am assuming those arrays were all correctly assembled before you rebooted?
Not always assembled, but always in sync. For example, if all arrays are in sync and I reboot the computer, sometimes only one of the devices gets assembled into an array. If the array is not touched (mounted) and I reboot again, the same array sometimes gets assembled correctly.
Very odd.
It doesn't seem to be hardware related, both your sdb and sdc are available.
I think so too. I changed the drives' SATA ports, but it did not solve the problem.
What do you have in mdadm.conf?
cat mdadm.conf
DEVICE containers partitions
Okay, looks good.
/var/log/messages grepped for md:
Do all your partitions have type FD (raid auto-detect)? I'm not sure if it matters, but I always use that.

--
Per Jessen, Zürich (10.4°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
22.10.2015 19:28, Per Jessen wrote:
Istvan Gabor wrote:
A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?
I've never seen that before - I am assuming those arrays were all correctly assembled before you rebooted?
Not always assembled, but always in sync. For example, if all arrays are in sync and I reboot the computer, sometimes only one of the devices gets assembled into an array. If the array is not touched (mounted) and I reboot again, the same array sometimes gets assembled correctly.
Very odd.
Not really. Device discovery is non-deterministic, both in timings and order. Current TW and 13.2 use assembly initiated from udev, plus timers to wait for the complete array. If enough array pieces are available but not all of them, the array is assembled in degraded mode after the timer expires. I do not know how 12.2 did it, but if it naively tried to activate each array as soon as it saw it, this is exactly what you would expect.
Do all your partitions have type FD (raid auto-detect)? I'm not sure if it matters, but I always use that.
It does not matter; autodetect kicks in only if md is built into the kernel, and in openSUSE it is a module.
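(For what it's worth, a quick way to confirm that on a given system; these are generic commands, nothing openSUSE-specific is assumed:)

# if raid1 shows up here, it is loaded as a module rather than built in
lsmod | grep raid1
# or check the kernel configuration directly
grep -E 'CONFIG_BLK_DEV_MD|CONFIG_MD_RAID1' /boot/config-$(uname -r)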
22.10.2015 19:56, Andrei Borzenkov wrote:
22.10.2015 19:28, Per Jessen wrote:
Istvan Gabor wrote:
A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?
I've never seen that before - I am assuming those arrays were all correctly assembled before you rebooted?
Not always assembled, but always in sync. For example, if all arrays are in sync and I reboot the computer, sometimes only one of the devices gets assembled into an array. If the array is not touched (mounted) and I reboot again, the same array sometimes gets assembled correctly.
Very odd.
Not really. Device discovery is non-deterministic, both in timings and order. Current TW and 13.2 use assembly initiated from udev, plus timers to wait for the complete array. If enough array pieces are available but not all of them, the array is assembled in degraded mode after the timer expires.
I do not know how 12.2 did it, but if it naively tried to activate each array as soon as it saw it, this is exactly what you would expect.
Hmm ... looking at 12.2 boot.md, this is exactly what it did:

# firstly finish any incremental assembly that has started.
$mdadm_BIN -IRs

Try adding some delay before this line, something like "sleep 60".
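(To illustrate the suggested change, a sketch of how that part of boot.md might look after the edit; the sleep line is the only addition, the other two lines are the existing script quoted above:)

# give udev time to discover both halves of each mirror before finishing assembly
sleep 60
# firstly finish any incremental assembly that has started.
$mdadm_BIN -IRs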
Andrei Borzenkov wrote:
22.10.2015 19:56, Andrei Borzenkov wrote:
22.10.2015 19:28, Per Jessen wrote:
Istvan Gabor wrote:
A different assembly "pattern", and it goes like this from boot to boot. Why is this? How can it be fixed? Is this a hardware or software issue?
I've never seen that before - I am assuming those arrays were all correctly assembled before you rebooted?
Not always assembled, but always in sync. For example, if all arrays are in sync and I reboot the computer, sometimes only one of the devices gets assembled into an array. If the array is not touched (mounted) and I reboot again, the same array sometimes gets assembled correctly.
Very odd.
Not really. Device discovery is non-deterministic, both in timings and order. Current TW and 13.2 use assembly initiated from udev, plus timers to wait for the complete array. If enough array pieces are available but not all of them, the array is assembled in degraded mode after the timer expires.
I do not know how 12.2 did it, but if it naively tried to activate each array as soon as it saw it, this is exactly what you would expect.
Hmm ... looking at 12.2 boot.md, this is exactly what it did:
# firstly finish any incremental assembly that has started.
$mdadm_BIN -IRs
Try adding some delay before this line, something like "sleep 60".
This fixed the problem. I experimented with lower sleep times too; it seems "sleep 10" is enough.

Thanks!
Istvan
On 10/21/2015 04:11 PM, Istvan Gabor wrote:
Hello:
This occurs in openSUSE 12.2. I have several RAID 1 devices, most of them with 1.0 metadata. During boots the arrays are assembled arbitrarily. For example, after the first boot some of the arrays are assembled with only one drive. Even if I don't use or mount the devices and reboot the computer, they are assembled differently after the next boot. It seems arbitrary which arrays are assembled correctly and which are not, and it is also arbitrary which of the mirror devices (/dev/sdbX or /dev/sdcX) becomes part of the array. See the example:
First boot:
Personalities : [raid1]
md1 : active (auto-read-only) raid1 sdb1[2]
      20971520 blocks super 1.0 [2/1] [U_]
md14 : active raid1 sdc14[2] sdb14[3]
      62918468 blocks super 1.0 [2/2] [UU]
md16 : active raid1 sdc16[2] sdb16[3]
      62918468 blocks super 1.0 [2/2] [UU]
md18 : active (auto-read-only) raid1 sdb18[0]
      31455104 blocks super 1.0 [2/1] [U_]
<snip>
unused devices: <none>
A different assembly "pattern", and it goes like this from boot to boot.
Why is this? How can it be fixed? Is this a hardware or software issue?
Thanks,
Istvan
What kernel are you running? There was a rash of problems in the early 4.0 kernels where several configurations of RAID1, RAID5, etc. would not bind on boot, leaving the array operating in degraded mode. The advice has generally been to re-add the degraded disk to the array as long as the transaction counts were reasonably close (on the mdraid list that is a somewhat fuzzy number, but anything less than 40K is worth a shot).

Careful, though. If this is random, you must check that the good half is the one that binds first, not the disk that has been running in degraded mode outside the array. In other words, make sure the disk you re-add is the one that has been degraded, and that it didn't, for some reason beyond modern comprehension, decide to attach first on your most recent boot -- that can result in the array being synced to the degraded copy, which would have no small number of 'interesting' results. (I've survived that one, though recovery wasn't too bad.)

--
David C. Rankin, J.D.,P.E.
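(A minimal sketch of the check-then-re-add sequence described above, using md9 with a missing sdb9 purely as an example taken from the earlier mdstat output; verify the actual device names against /proc/mdstat before running anything:)

# compare the event counters of both halves before re-adding
mdadm --examine /dev/sdb9 /dev/sdc9 | grep -i events
# confirm which half the running array is currently using
mdadm --detail /dev/md9
# if the counters are reasonably close, re-add the missing half and let md resync it
mdadm /dev/md9 --re-add /dev/sdb9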
participants (4):
- Andrei Borzenkov
- David C. Rankin
- Istvan Gabor
- Per Jessen