[Bug 793954] New: raid1 array sometimes started with one of two disks at boot (again)
https://bugzilla.novell.com/show_bug.cgi?id=793954
https://bugzilla.novell.com/show_bug.cgi?id=793954#c0
Summary: raid1 array sometimes started with one of two disks at boot (again)
Classification: openSUSE
Product: openSUSE Factory
Version: 12.3 Milestone 1
Platform: Other
OS/Version: openSUSE 12.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
AssignedTo: nfbrown@suse.com
ReportedBy: suse-beta@cboltz.de
QAContact: qa-bugs@suse.de
Found By: Beta-Customer
Blocker: ---
Created an attachment (id=516596) --> (http://bugzilla.novell.com/attachment.cgi?id=516596) boot.log (factory from 2012-12-03, mdadm-3.2.6-2.1)
It seems the random "start raid array with only one of two disks" problem is back in Factory :-(
Details: I have two harddisks in my laptop. Each of them has several partitions, which are paired as RAID 1 (mirroring) arrays. In the last days, my /boot partition (md0) was started with one of two disks (IIRC this happened twice). Today my /testroot partition (md3) started with only one active disk. dmesg does not show any sign of broken disks.
At least for /boot, re-adding the second disk with mdadm worked without any complaints (and in the next boots, it came up with both disks), and I expect the same for /testroot.
I'll attach boot.log and dmesg output with md3 starting with only one disk. (Sorry, no /v/l/messages because my syslog setup is broken :-( )
-- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
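For readers who want to check for the symptom described above (a two-disk RAID 1 array running on one member), the following is a minimal sketch that scans text in the usual /proc/mdstat layout for arrays whose active member count is below the expected count. The sample text, the function name `degraded_arrays`, and the device names and block counts in the sample are illustrative assumptions, not data from the reporter's machine.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout; NOT taken from the
# reporter's logs. On a real system, read the text from /proc/mdstat instead.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      204788 blocks super 1.0 [2/2] [UU]

md3 : active raid1 sda6[0]
      10485696 blocks super 1.0 [2/1] [U_]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : ")       # start of an array stanza
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")   # [expected/active] member counts


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        counts = COUNT_RE.search(line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None  # only the first count line belongs to this array
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

An array reported this way still has to be repaired by hand, as the reporter did by re-adding the missing partition with mdadm.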
https://bugzilla.novell.com/show_bug.cgi?id=793954#c
Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c1
--- Comment #1 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c2
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c3
Christian Boltz
that sdb6 had failed previously and was not included for that reason - perfectly correct.
Then it probably failed on the previous boot.
Why it failed I cannot tell without /var/log/messages.
And that's something I can't provide because syslog is broken on my system :-( Anyway - thanks for checking this. If this happens again and I have more useful logs, I'll report back.
https://bugzilla.novell.com/show_bug.cgi?id=793954#c4
Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c5
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c6
--- Comment #6 from Christian Boltz
Hopefully you didn't wait for me and have already re-integrated the missing disk into the array. If not, feel free to do so.
I'm just doing it _again_ :-( - this time sdb3 was missing from md1 (are you interested in fresh logs? - note to myself: 2013-02-09T11:39:33)
Besides your ideas about what could go wrong, would an unclean umount/shutdown be an explanation for what I'm seeing? md1 is my /home partition and also contains /var (/var is a symlink to /home/sys-var/). I'm quite sure I've seen umounting md1 fail at shutdown because it's still in use, but I don't know if this happens always or only sometimes (and it's probably too late to be logged).
BTW: The resync is usually extremely fast (120 G in 2 minutes) - is there a "list of changed sectors" that avoids a full resync?
https://bugzilla.novell.com/show_bug.cgi?id=793954#c7
--- Comment #7 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c8
--- Comment #8 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c9
--- Comment #9 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c10
--- Comment #10 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c11
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c12
--- Comment #12 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c13
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c14
Christian Boltz
Could you add "-vv" to the mdadm command in /etc/init.d/boot.md
Done.
Am I correct in thinking that this never happens to the "root" array, only to other arrays that would be assembled once the system has booted?
IIRC you are correct (but I'm not 100% sure). When first reporting the problem, it affected /boot and /testroot, but in the last months typically my encrypted /home (which also contains /var via symlinks) breaks.
https://bugzilla.novell.com/show_bug.cgi?id=793954#c15
--- Comment #15 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c16
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c17
--- Comment #17 from Christian Boltz
I missed a spot where it can fail with EBUSY.
No problem ;-) I just installed the updated package.
https://bugzilla.novell.com/show_bug.cgi?id=793954#c18
--- Comment #18 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c19
--- Comment #19 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c20
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c21
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c22
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c23
Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c24
--- Comment #24 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c25
--- Comment #25 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c26
--- Comment #26 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c27
--- Comment #27 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c28
--- Comment #28 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c29
--- Comment #29 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c30
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c31
Frederic Crozat
Ahh.. I might have it at last
This bug is very similar to bug 772286 which we fixed by adding # Should-Start: udev-trigger to /etc/init.d/boot.md.
However it seems that udev-trigger isn't a service any more. Rather there is a
systemd-udev-trigger.service
So maybe we just need to change "udev-trigger" in boot.md to "systemd-udev-trigger". A bit of experimentation suggests that changing
# Should-Start: boot.scsidev boot.multipath udev-trigger
in /etc/init.d/boot.md to
# Should-Start: boot.scsidev boot.multipath systemd-udev-trigger
does seem to change the behaviour as I would expect. I don't have any problems with assembling arrays (as the problem is very sensitive to particular timing of various events) so I cannot be certain that this fixes the problem. If someone who does experience the problem could make this change and see if it fixes the problem, that would be great.
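The header change proposed above can be sketched as a small text rewrite. This is only an illustration of the edit, assuming the init script header uses the `# Should-Start:` line quoted in this comment; the function name `retarget_should_start` is hypothetical and not part of any packaged tool.

```python
import re


def retarget_should_start(script_text, old="udev-trigger",
                          new="systemd-udev-trigger"):
    """Replace `old` with `new`, but only on '# Should-Start:' header lines."""
    # Match `old` as a whole name so that an existing 'systemd-udev-trigger'
    # entry is left alone and the rewrite can be applied repeatedly.
    token = re.compile(r"(?<![\w-])" + re.escape(old) + r"(?![\w-])")
    out = []
    for line in script_text.splitlines(keepends=True):
        if re.match(r"#\s*Should-Start:", line):
            line = token.sub(new, line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    header = "# Should-Start: boot.scsidev boot.multipath udev-trigger\n"
    print(retarget_should_start(header), end="")
```

Whether the renamed dependency actually cures the assembly race still has to be confirmed on an affected machine, as requested above.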
Frederic: apart from wanting to let you know that I'm a bit grumpy about this name change (not your fault exactly, but I wanted to complain to someone) I notice that the /usr/lib/systemd/system/systemd-udev-trigger.service file looks wrong based on the man page (and a quick look at the code). It contains:
ExecStart=/usr/bin/udevadm trigger --type=subsystems --action=add ; /usr/bin/udevadm trigger --type=devices --action=add
However the body of "ExecStart" is not a shell script, but a command and some args, so this passes ";" and "/usr/bin/udevadm" etc as extra args to /usr/bin/udevadm.
Indeed. Splitting with ; works fine as long as the first binary doesn't take an argument, otherwise, it is lost.
For "Type=oneshot" services you are allowed multiple ExecStart, so this should be
ExecStart=/usr/bin/udevadm trigger --type=subsystems --action=add
ExecStart=/usr/bin/udevadm trigger --type=devices --action=add
Looks better indeed.
This change might also be needed for md arrays to be started properly but I'm not 100% sure.
The upstream code has this bug. Would you follow it up with upstream, or would you rather that I did?
Feel free to send a patch upstream, I'll backport it once it is approved
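The unit-file fix discussed in this comment (one ExecStart= line per command instead of a single line joined with ";") can be sketched as follows. This is a naive illustration, not systemd's own parser: it assumes commands are separated by " ; " and does not handle quoted semicolons, and the function name `split_exec_start` is hypothetical.

```python
def split_exec_start(unit_text):
    """Turn 'ExecStart=a ; b' into separate 'ExecStart=a' and 'ExecStart=b' lines."""
    prefix = "ExecStart="
    out = []
    for line in unit_text.splitlines():
        if line.startswith(prefix) and " ; " in line:
            # Naive split on ' ; '; quoting is deliberately not handled here.
            commands = line[len(prefix):].split(" ; ")
            out.extend(prefix + cmd.strip() for cmd in commands if cmd.strip())
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    unit = (
        "[Service]\n"
        "Type=oneshot\n"
        "ExecStart=/usr/bin/udevadm trigger --type=subsystems --action=add ; "
        "/usr/bin/udevadm trigger --type=devices --action=add\n"
    )
    print(split_exec_start(unit), end="")
```

As noted in the thread, multiple ExecStart= lines are only allowed for Type=oneshot services, so the rewrite is meaningful only for such units.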
https://bugzilla.novell.com/show_bug.cgi?id=793954#c32
--- Comment #32 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c33
--- Comment #33 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c34
--- Comment #34 from Frederic Crozat
Btw is it possible to get the boot console(!) log into a file? Setting ENFORCE_BLOGD="yes" in sysconfig/boot does nothing. There is no boot.msg file.
Use journalctl -b to get all the logs from current boot.
So it looks a bit like you want to get that broken udev file (and its name in boot.md) fixed in 12.3 yesterday...
We'll ship a maintenance update on relevant distributions once we are sure all the fixes are tested, of course :)
https://bugzilla.novell.com/show_bug.cgi?id=793954#c35
--- Comment #35 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c36
--- Comment #36 from Frederic Crozat
journalctl: Command not found.
sorry, I forgot you are on 12.2, use systemd-journalctl instead.
Everyone's interested in the latest, but I have to test this on oS 12.1 for the time being. (Do I put this down to yet another case of half-baked banana software like systemd being let loose on the guinea pigs ooops opensuse users...??? :-)
These kinds of comments won't really help motivate people to help you.
https://bugzilla.novell.com/show_bug.cgi?id=793954#c37
--- Comment #37 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c38
--- Comment #38 from Frederic Crozat
https://bugzilla.novell.com/show_bug.cgi?id=793954#c39
--- Comment #39 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c40
--- Comment #40 from Frederic Crozat
https://bugzilla.novell.com/show_bug.cgi?id=793954#c41
--- Comment #41 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c42
--- Comment #42 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c43
--- Comment #43 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c44
--- Comment #44 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c45
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c46
--- Comment #46 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c47
--- Comment #47 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c48
--- Comment #48 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c49
Benjamin Brunner
https://bugzilla.novell.com/show_bug.cgi?id=793954#c50
--- Comment #50 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c51
--- Comment #51 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c52
--- Comment #52 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c53
Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c54
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c55
Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c56
--- Comment #56 from Volker Kuhlmann
https://bugzilla.novell.com/show_bug.cgi?id=793954#c57
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c58
Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=793954#c59
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c60
--- Comment #60 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=793954#c61
--- Comment #61 from Norbert Jurkeit
https://bugzilla.novell.com/show_bug.cgi?id=793954#c62
Benjamin Brunner
https://bugzilla.novell.com/show_bug.cgi?id=793954#c63
--- Comment #63 from Swamp Workflow Management