[Bug 805420] New: systemd doesn't order dependencies -- allowing localfs to exec before lvm finishes finding volumes
https://bugzilla.novell.com/show_bug.cgi?id=805420#c0

Summary: systemd doesn't order dependencies -- allowing localfs to exec before lvm finishes finding volumes
Classification: openSUSE
Product: openSUSE 12.3
Version: RC 1
Platform: x86-64
OS/Version: SUSE Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: suse@tlinx.org
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.28) Gecko/20120306 Firefox/3.6.28

On a recent boot, I noticed that lvm took a bit longer than normal and systemd didn't wait for it. It attempted to mount the local file systems, only succeeding on those that were regular hard-disk partitions.

...
net.core.somaxconn = 8192
net.core.netdev_max_backlog = 3000000
net.ipv4.conf.eth0.proxy_arp = 1
net.ipv4.conf.eth1.proxy_arp = 1
net.ipv4.ip_forward = 0
done
<notice -- Feb 17 11:38:57.933526000> service boot.sysctl done
Starting udevd: done
Loading drivers, configuring devices:
<notice -- Feb 17 11:38:57.956211000> service boot.usr-mount done    <-------------------- /usr being mounted at boot
done
<notice -- Feb 17 11:39:00.38615000> service boot.udev done
<notice -- Feb 17 11:39:00.39151000> service boot.loadmodules start
<notice -- Feb 17 11:39:00.40231000> service boot.rootfsck start
Loading required kernel modules
done
<notice -- Feb 17 11:39:00.61188000> service boot.loadmodules done
Activating swap-devices in /etc/fstab... done
root file system (/) is NOT being checked.
<notice -- Feb 17 11:39:00.107218000> service boot.rootfsck done
<notice -- Feb 17 11:39:00.111265000> service boot.cgroup start
<notice -- Feb 17 11:39:00.120248000> service boot.clock start
mounting cgroup file systems...
cpuset debug cpu cpuacct devices freezer blkio net_prio done
<notice -- Feb 17 11:39:00.257108000> service boot.cgroup done
Set System Time to the current Hardware Clock done
<notice -- Feb 17 19:39:00.334317000> service boot.clock done
<notice -- Feb 17 19:39:00.334838000> service boot.device-mapper start
<notice -- Feb 17 19:39:00.340342000> service boot.localfs start    << localfs supposedly starts before devmapper finishes
Activating device mapper...done
<notice -- Feb 17 19:39:00.352882000> service boot.device-mapper done
Waiting for /dev/Home+Space/Home /dev/Home+Space/Share /dev/Home+Space/Home.diff /dev/Media/Media /dev/Home+Space/Squid_Cache /dev/Backups/Backups /dev/Home+Space/Media_Back .............................. timeout!
Checking file systems...
fsck from util-linux 2.20.1
donedone
Mounting local file systems...    <<< try to mount local fs
mount: /dev/sdc6 already mounted on /usr
/dev/sdc2 on /var type xfs (rw,noatime,nodiratime,swalloc,largeio,logbsize=256k)
/dev/sdc3 on /boot type xfs (rw,noatime,nodiratime,swalloc,largeio,allocsize=1m)
/var/rtmp on /tmp type none (rw,bind)
mount: special device /dev/Home+Space/Home does not exist
mount: special device /dev/Home+Space/Share does not exist
mount: special device /dev/Home+Space/Home.diff does not exist
mount: special device /dev/Media/Media does not exist
mount: special device /dev/Home+Space/Squid_Cache does not exist
mount: special device /dev/Backups/Backups does not exist
mount: mount point /backups/Media does not exist
mount: special device /home/share does not exist
/Media on /Media type none (rw,bind)
failed
<notice -- Feb 17 19:39:32.829649000> service boot.localfs done    << localfs finish
<notice -- Feb 17 19:39:32.830172000> service boot.cleanup start
<notice -- Feb 17 19:39:32.842134000> service boot.klog start
<notice -- Feb 17 19:39:32.845642000> service boot.localnet start
<notice -- Feb 17 19:39:32.846799000> service boot.lvm start    << !?!?! now boot.lvm??
<notice -- Feb 17 19:39:32.852691000> service setserial start
<notice -- Feb 17 19:39:32.913597000> service setserial done
<notice -- Feb 17 19:39:33.81793000> service boot.isapnp start
<notice -- Feb 17 19:39:33.107182000> service boot.isapnp done
<notice -- Feb 17 19:39:33.418268000> service boot.localnet done
<notice -- Feb 17 19:39:33.442066000> service boot.cleanup done
<notice -- Feb 17 19:39:33.442210000> service boot.sysstat start
<notice -- Feb 17 19:39:33.612498000> killproc: kill(165,29)
adding: boot-130217.1939 (deflated 79%)
test of /var/log/boot.msg.zip OK
done
<notice -- Feb 17 19:39:33.613730000> service boot.klog done
Running sadc done
Cannot open font file lat1-08.psfu.gz
/etc/init.d/boot.sysstat: line 55: /bin/unicode-start: No such file or directory
FATAL: Module bond0 not found.
Waiting for udev to settle...       <<<< wait for udev to settle...
Scanning for LVM volume groups...   <<< now lvm scans for groups....
Reading all physical volumes. This may take a while...
Found volume group "Media" using metadata type lvm2
Found volume group "Backups" using metadata type lvm2
Found volume group "Home+Space" using metadata type lvm2
Activating LVM volume groups...
1 logical volume(s) in volume group "Media" now active
<notice -- Feb 17 19:39:33.819432000> service boot.sysstat done
1 logical volume(s) in volume group "Backups" now active
20 logical volume(s) in volume group "Home+Space" now active
done
<notice -- Feb 17 19:39:37.442276000> service boot.lvm done    <<< boot.lvm done
<notice -- Feb 17 19:39:37.449508000> service boot.swap start
Activating remaining swap-devices in /etc/fstab...
done
<notice -- Feb 17 19:39:37.475149000> service boot.swap done
<notice -- Feb 17 19:39:37.490474000> service boot.ldconfig start
<notice -- Feb 17 19:39:37.503860000> service boot.ldconfig done
System Boot Control: The system has been set up
System Boot Control: Running /etc/init.d/boot.local
Cannot open font file lat1-08.psfu.gz
/etc/init.d/boot.local: line 29: unicode-start: command not found
failed
INIT: Entering runlevel:
Master Resource Control: previous runlevel: N, switching to runlevel: 3

Reproducible: Always

Steps to Reproduce:
1. Slow down lvm somehow -- I suggest opening multiple active snapshots (~5-6 should do it). I had 1 active snapshot going at the time.

Actual Results:
lvm local file systems don't get mounted

Expected Results:
systemd should pay attention to prerequisites at least as well as the SysV bootup scripts do -- it should be better; it's more advanced.

Note -- my system boots off of its hard disk, not a RAM disk. Needed drivers are built in.

Additional note -- for /usr to be mounted at boot, 'mount' and any libraries it needs must reside in /bin and not be moved.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=805420#c1

L. A. Walsh <suse@tlinx.org> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
        Component         |Basesystem     |Basesystem
          Version         |RC 1           |Final
          Product         |openSUSE 12.3  |openSUSE 12.1
 Target Milestone         |---            |Final

--- Comment #1 from L. A. Walsh <suse@tlinx.org> 2013-02-26 08:09:36 PST ---
As near as I can tell, the systemd rpms are 12.1 rpms. As for what boot process is used -- what started this was that my boot.local script was no longer being called, and I couldn't toggle it on because it didn't exist as a service in systemd:

chkconfig boot.local on
insserv: Note: sysvinit service boot.local is shadowed by systemd local.service,
Forwarding request to '/bin/systemctl --root / enable local.service'.
Operation failed: No such file or directory
From there, I came to suspect that systemd doesn't handle the actual hardware boot, since it doesn't seem to handle dependencies reliably. It appears that in 12.[123] the systemd boot scripts have been moved to the initrd -- which would be one strong reason why I was told we *had* to have an initrd: so the SysV-equivalent boot scripts could be run before systemd started.
If this isn't the case, then why, for example, doesn't lvm start (and finish) before local filesystems are mounted?
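[Editor's sketch] The "Operation failed: No such file or directory" above is what systemctl reports when the unit it was asked to enable has no unit file. A minimal compatibility unit of the kind insserv was forwarding to might look like the following; the contents and the name local.service are illustrative assumptions, not the unit openSUSE ships (which is rc-local.service):

```ini
# Hypothetical /etc/systemd/system/local.service -- runs the legacy
# boot.local script once during startup. All names here are assumed.
[Unit]
Description=/etc/init.d/boot.local compatibility
After=network.target

[Service]
Type=oneshot
ExecStart=/etc/init.d/boot.local
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

With such a file in place, `systemctl enable local.service` would have something to act on instead of failing.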
https://bugzilla.novell.com/show_bug.cgi?id=805420#c2

Frederic Crozat <fcrozat@suse.com> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
               CC         |               |fcrozat@suse.com

--- Comment #2 from Frederic Crozat <fcrozat@suse.com> 2013-03-11 16:46:10 UTC ---

Please reboot with "systemd.log_level=debug systemd.log_target=kmsg" on the kernel command line and attach the dmesg output to this bug report. Please also attach /etc/fstab and the output of:

systemctl status lvm.service
systemctl show lvm.service
systemctl status rc-local.service
https://bugzilla.novell.com/show_bug.cgi?id=805420#c3

L. A. Walsh <suse@tlinx.org> changed:

           What           |Removed                    |Added
----------------------------------------------------------------------------
          Summary         |systemd doesn't order      |12.1 boot doesn't correctly
                          |dependencies -- allowing   |order dependencies:
                          |localfs to exec before lvm |specifically localfs has no
                          |finishes finding volumes   |dependency on lvm

--- Comment #3 from L. A. Walsh <suse@tlinx.org> 2013-03-11 11:14:56 PDT ---

I traced this down some more, and it looks like the problem is in the /etc/init.d/boot.localfs script, where boot.lvm (among others) is listed under Should-Start instead of Required-Start. Apparently this allows the boot-time software to run boot.localfs in parallel with the *start* of boot.lvm (whereas boot.lvm should be finished before boot.localfs runs). It looks like the Should-Start entries may have been designed to start and finish before boot.localfs runs, but instead they are simply being scheduled for 'start'.

The device-mapper is part of the lvm package; it starts only 0.07s before localfs tries to start:

<notice -- Feb 17 19:39:00.334838000> service boot.device-mapper start
<notice -- Feb 17 19:39:00.340342000> service boot.localfs start
Activating device mapper...done
<notice -- Feb 17 19:39:00.352882000> service boot.device-mapper done
Waiting for /dev/Home+Space/Home /dev/Home+Space/Share /dev/Home+Space/Home.diff /dev/Media/Media /dev/Home+Space/Squid_Cache /dev/Backups/Backups /dev/Home+Space/Media_Back .............................. timeout!

==== trimmed beginning of lines: ...
<39:32.829649000> service boot.localfs done
<39:32.830172000> service boot.cleanup start
<39:32.842134000> service boot.klog start
<39:32.845642000> service boot.localnet start
<39:32.846799000> service boot.lvm start
<39:37.442276000> service boot.lvm done

Noticed that boot.lvm isn't even started until after boot.localfs is done.
and that boot.lvm takes almost 5 seconds by itself to finish.
=====================

I conflated the problem with systemd because, whenever I tried to change or insert boot scripts, I got the message that the boot scripts were shadowed by systemd and that the requests for changes were being handled by it. It turns out, AFAICT, that that message is bogus.

-----------------
Carrying this forward -- since systemd does replace these scripts, it needs to be sure it waits until all the local disks are mounted before it proceeds. systemd should not assume that all disks are mounted when it starts unless it *checks* for them (i.e., it has targets that can check for these things, and they should be used). It shouldn't assume that something else handled everything before it was loaded -- else the same problems can occur.

Note: I've seen lvm take 15-20 seconds to perform volume building when it had multiple (3-4) snapshots going on the same volume, and more than 30 seconds when more than 5-6 snapshots were active. The time seems to go up geometrically or exponentially, I'm not sure which. I avoid that because of the problems it causes, but even 1 active snapshot -- which falls well within normal usage -- can cause a 5-second delay...
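[Editor's sketch] The Should-Start vs. Required-Start distinction described above lives in the LSB INIT INFO header of the init script: insserv/startpar treat Required-Start as a hard ordering dependency, while Should-Start is best-effort. A sketch of the header change that forces boot.lvm to finish first -- the surrounding header contents are assumed here, not quoted from the shipped boot.localfs:

```shell
### BEGIN INIT INFO
# Provides:          boot.localfs
## Before (assumed): boot.lvm only loosely ordered via Should-Start,
## so boot.localfs may run while boot.lvm is still scanning volumes:
# Should-Start:      boot.md boot.lvm boot.crypto
## After: boot.lvm must start -- and finish -- before boot.localfs:
# Required-Start:    boot.udev boot.lvm
# Description:       check and mount local filesystems except /
### END INIT INFO
```

After editing the header, the dependency graph has to be regenerated (e.g. by re-running insserv) for the change to take effect.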
https://bugzilla.novell.com/show_bug.cgi?id=805420#c4

--- Comment #4 from L. A. Walsh <suse@tlinx.org> 2013-03-11 11:24:03 PDT ---

Oops, it sent the result before I was done...

I put boot.lvm on localfs's Required-Start line, and the problem went away.

Also, regarding the mount of /usr before systemd runs full out, I did this:

/etc/systemd/system> ll default.target.wants/
systemd-readahead-collect.service -> /lib/systemd/system/systemd-readahead-collect.service
systemd-readahead-replay.service -> /lib/systemd/system/systemd-readahead-replay.service
usr-mount.service -> /lib/systemd/system/usr-mount.service

===============
more /lib/systemd/system/usr-mount.service
# This file is linda's
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

[Unit]
Description=mount /usr on root early in systemd startup
DefaultDependencies=no
Conflicts=shutdown.target
Before=sysinit.target shutdown.target
ConditionPathExists=/bin

[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-usr
RemainAfterExit=no
StandardOutput=syslog+console
TimeoutSec=5

[Install]
WantedBy=default.target
=======================================================

Such a target in a systemd-only boot environment (and the same for any other needed boot modules -- as I'm told not all of them are being integrated into systemd, which is not good) should allow it to boot. Basically, anything that systemd doesn't do won't happen if the user boots from their hard disk instead of a RAM disk. Hopefully some people will get that such is wanted. My system boot time is about 20-25 seconds total -- and that's for something I use as a server, not a laptop.
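[Editor's sketch] The unit file above expresses ordering with Before=; for filesystem prerequisites specifically, native systemd units can instead declare mount dependencies explicitly with RequiresMountsFor=, which makes systemd pull in and order after the corresponding .mount units rather than assuming the paths are available. A sketch -- the unit name, paths, and command are illustrative, not from the openSUSE packages:

```ini
# Hypothetical unit that must not run until /home and /backups are
# mounted; systemd adds Requires=/After= on home.mount and backups.mount
# automatically from the RequiresMountsFor= line.
[Unit]
Description=example service gated on local filesystems
RequiresMountsFor=/home /backups

[Service]
Type=oneshot
ExecStart=/usr/bin/true
```

This is the declarative equivalent of putting boot.lvm on Required-Start: the service cannot start until the listed mount points are actually there.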
https://bugzilla.novell.com/show_bug.cgi?id=805420#c5

Frederic Crozat <fcrozat@suse.com> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
           Status         |NEW            |NEEDINFO
     InfoProvider         |               |suse@tlinx.org

--- Comment #5 from Frederic Crozat <fcrozat@suse.com> 2013-03-12 09:54:27 UTC ---

Please answer my questions from comment 2.
https://bugzilla.novell.com/show_bug.cgi?id=805420#c6

L. A. Walsh <suse@tlinx.org> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
           Status         |NEEDINFO       |NEW
     InfoProvider         |suse@tlinx.org |

--- Comment #6 from L. A. Walsh <suse@tlinx.org> 2013-03-12 22:57:28 PDT ---

Upon rebooting my system, I ran into difficulties which, if related, are only related from the standpoint of the overall incompatibilities being introduced in 12.3. I'm beginning to see a trend: other interlocking processes are not being started in the correct order, as if startpar had been modified to ignore dependency info... but I need to look at startpar to find out how this could happen.

Right now my fstab is missing a few entries, having been restored from a backup, but it otherwise matches, apart from some new partitions and all the snapshots (which don't get mounted from fstab, but via my snapper.pl script):

# /etc/fstab - ishtar
# local
#1234567890123456789012345678901234567890123456789012345678901234567890123456789
# colnums/tsops 21
# what                        where              type      options                                                                    bck ord
proc                          /proc              proc      defaults                                                                   0 0
sysfs                         /sys               sysfs     defaults                                                                   0 0
debugfs                       /sys/kernel/debug  debugfs   defaults                                                                   0 0
devpts                        /dev/pts           devpts    mode=0620,gid=5                                                            0 0
#hugetlbfs                    /dev/hugetlbfs     hugetlbfs mode=1777,rw,nosuid,nodev,noexec,noatime,nodiratime,nofail                 0 0
#LABEL=Swap                   swap               swap      defaults
/dev/sdc5                     swap               swap      defaults                                                                   0 0
/dev/Home+Space/Sys           /                  xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,allocsize=64k    0 0
#/dev/sdc6                    /usr               xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,allocsize=64k    0 0
/dev/Home+Space/Sysvar        /var               xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,allocsize=128k   0 0
/dev/Home+Space/Sysboot       /boot              xfs       defaults,noatime,nodiratime,swalloc,largeio,allocsize=2m                   0 0
/var/rtmp                     /tmp               none      rbind                                                                      0 0
/dev/Home+Space/Home          /home              xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,inode64,nobarrier,allocsize=128k 0 0
/dev/Home+Space/Share         /Share             xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,inode64,nobarrier,allocsize=1024k 0 0
/dev/Home+Space/Home.diff     /home.diff         xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,nobarrier        0 0
/dev/Media/Media              /Media             xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,allocsize=32m    0 0
/dev/Home+Space/Squid_Cache   /var/cache/squid   xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,nobarrier,allocsize=64k 0 0
/dev/Backups/Backups          /backups           xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,inode64,allocsize=128m 0 0
/dev/Home+Space/Media_Back    /backups/Media     xfs       defaults,noatime,nodiratime,swalloc,largeio,logbsize=256k,allocsize=128m   0 0
/home/share                   /usr/share         none      rbind

---------------------------
Ishtar:/# systemctl status lvm.service
Failed to get D-Bus connection: No connection to service manager.
Ishtar:/# ^C
Ishtar:/# ps -ef | grep dbus
207    3514     1  0 18:43 ?      00:00:00 /bin/dbus-daemon --system
law    3748     1  0 17:52 ?      00:00:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
root   4625  4313  0 18:46 ttyp0  00:00:00 grep --color=auto dbus
Ishtar:/# systemctl show lvm.service
Failed to get D-Bus connection: No connection to service manager.
Ishtar:/# systemctl status rc-local.service
Failed to get D-Bus connection: No connection to service manager.
------------------------------

If I reboot right now, I may have problems getting back up. So far, I've had different things fail on each boot -- usually the network not coming up in time for network services. This is a new symptom as of yesterday, so I haven't got a clue yet as to why it's happening.

Most of these problems have come from trying to upgrade my system to 12.2 for the sysadmin functions -- originally to get VMs working, since the vm repo I was pointed at was only 12.2-compatible. That is, these problems are related to the increasing number of incompatibilities between 'next-suse-version's, especially with regard to minor versions (12.1->12.2->12.3 vs. 10.x=>11.x=>12.x)...

I don't know what more information you want... does systemd need to be running in order to query its state? If so, why would chkconfig say the boot scripts are shadowed by systemd, and try to contact it, if it was not?
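[Editor's sketch] "Failed to get D-Bus connection" is what systemctl reports when it cannot reach a systemd service manager -- typically because PID 1 is not systemd at all. A quick, generic way to check which init is actually running (not specific to openSUSE):

```shell
# Print the name of PID 1: 'systemd' means systemctl should be able to
# reach the service manager over D-Bus; 'init' suggests a SysV boot, in
# which case the D-Bus connection failure above is the expected symptom.
cat /proc/1/comm
```

This would also answer the question above: yes, systemd must be PID 1 (or at least running as the service manager) for systemctl queries to work, which is consistent with comment 7 below.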
https://bugzilla.novell.com/show_bug.cgi?id=805420#c7

Frederic Crozat <fcrozat@suse.com> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
           Status         |NEW            |NEEDINFO
     InfoProvider         |               |suse@tlinx.org

--- Comment #7 from Frederic Crozat <fcrozat@suse.com> 2013-03-13 09:21:48 UTC ---

(In reply to comment #2)
> please reboot with "systemd.log_level=debug systemd.log_target=kmsg" on
> kernel command line and attach dmesg output on this bug report.
I still need that. And use "add an attachment" for the output.

And yes, systemd must be running when I ask for all that information.
https://bugzilla.novell.com/show_bug.cgi?id=805420#c8

L. A. Walsh <suse@tlinx.org> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
           Status         |NEEDINFO       |RESOLVED
               CC         |               |suse@tlinx.org
     InfoProvider         |suse@tlinx.org |
       Resolution         |               |WONTFIX

--- Comment #8 from L. A. Walsh <suse@tlinx.org> 2013-04-01 13:20:30 PDT ---

I'm not going to try to reproduce the setup from 12.1, as I'm trying to fix boot in Factory to boot from hard disk. Work on 12.1 is a waste at this point, so I'm not going to spend any more time trying to analyze, reproduce, or fix this issue. If I run into it again, I'll reopen it. Part of the problem may be a purposeful addition that avoids telling "telinit" that the runlevel needs to change and that it needs to run its scripts, but that's a guess based on some source viewing...

Any fixes I make need to address problems in Factory to repair/undo/compensate for recent changes in what used to be the standard SUSE boot procedure for the past 10+ years... *sigh*...