[Bug 1003342] New: New generator nfs-system-generator let system not boot
http://bugzilla.suse.com/show_bug.cgi?id=1003342

            Bug ID: 1003342
           Summary: New generator nfs-system-generator let system not boot
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 42.1
          Hardware: All
                OS: openSUSE 42.1
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Basesystem
          Assignee: bnc-team-screening@forge.provo.novell.com
          Reporter: werner@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

The system hangs during boot. After removing /usr/lib/systemd/system-generators/nfs-server-generator from the mounted and chrooted system partition in a rescue system, the system is able to boot again.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c2
Neil Brown
> What exactly should the nfs-server-generator do?

If you give it three directories (they can all be /tmp) it will create a directory "nfs-server.service.d" in the first directory, and a file "order-with-mounts.conf" in that directory. This serves as a "drop-in" for systemd to extend the nfs-server.service unit file.

It will read /etc/fstab and /etc/exports (and /etc/exports.d/*) and write out two sorts of directives:

 1/ "RequiresMountsFor=DIR" for every directory that is exported by /etc/exports
 2/ "Before=DIR.mount" for every nfs (or nfs4) mount point listed in /etc/fstab.

This ensures that nfsd is started after all the filesystems it exports are mounted, and before any NFS filesystems are mounted.

Presumably the hang that you notice is caused by one of those causing systemd to wait for something that will never happen.

Could you please run

   /usr/lib/systemd/system-generators/nfs-server-generator /tmp /tmp /tmp

and then attach /etc/fstab, /etc/exports and /tmp/nfs-server.service.d/order-with-mounts.conf to this bug.
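
The two sorts of directives described above can be illustrated with a rough Python sketch. This is not the nfs-utils code (which is written in C); the function names are hypothetical, the "[Unit]" section header is an assumption about what a drop-in needs, and the parsing is deliberately simplified: it ignores /etc/exports.d/*, quoted paths and continuation lines, and the mount unit naming skips the full systemd escaping rules.

```python
def exported_dirs(exports_text):
    """Return the directory at the start of each /etc/exports entry."""
    dirs = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            dirs.append(line.split()[0])
    return dirs


def nfs_mount_points(fstab_text):
    """Return the mount points of fstab entries whose type is nfs or nfs4."""
    points = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def mount_unit_name(path):
    """Simplified unit name: /mnt/data -> mnt-data.mount.

    Assumption: real systemd escaping also rewrites characters such as
    "-" inside path components; that is not handled here.
    """
    trimmed = path.strip("/")
    return (trimmed.replace("/", "-") if trimmed else "-") + ".mount"


def drop_in(exports_text, fstab_text):
    """Build the text of a drop-in like order-with-mounts.conf."""
    lines = ["[Unit]"]
    lines += ["RequiresMountsFor=" + d for d in exported_dirs(exports_text)]
    lines += ["Before=" + mount_unit_name(p) for p in nfs_mount_points(fstab_text)]
    return "\n".join(lines) + "\n"
```

Given an exports file that lists /usr/src and an fstab with no NFS entries, this sketch would emit only a single RequiresMountsFor=/usr/src line, which matches the kind of harmless output the generator is expected to produce on such a system.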
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c7
--- Comment #7 from Neil Brown
> Why this strange path ... IMHO it should be
> /run/systemd/system/nfs-server.service.d/order-with-mounts.conf

True, and it would be if you had passed "/run/systemd/system" as the argument to nfs-server-generator. I think systemd passes /run/systemd/generator. /tmp was just for testing.

There is nothing unusual in your files. The only effect of having nfs-server-generator present is that nfs-server.service won't start until /usr/src is mounted, and that would have been the case anyway. So it really should have no net effect.

How certain are you that this program is the cause of the failure to boot? Did you try putting the program back and observe the failure return?

Presumably the boot is getting past the initrd/dracut stage? Can you get logs of what systemd thought was happening (no, I don't know offhand how to do that).

Very confusing...
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c8
Thomas Blume
> How certain are you that this program is the cause of the failure to boot? Did you try putting the program back and observe the failure return?
> Presumably the boot is getting past the initrd/dracut stage? Can you get logs of what systemd thought was happening (no, i don't know off hand how to do that).
> Very confusing...

I discovered the same issue on my machine. At first, I blamed it on my fancy system root setup (LVM on top of softraid spanning the local disk and a multipathed iSCSI disk). But this is completely different from Werner's machine, which has only a local disk. Actually the only common thing is that we are both using the NFS server on Leap 42.1 and that deleting /usr/lib/systemd/system-generators/nfs-server-generator made the system boot up.

One more question would be why this generator is in the nfs-client package, whereas it obviously is something for the NFS server.
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c24
--- Comment #24 from Dr. Werner Fink
> I think you need xlog_syslog(0); rather than xlog_stderr(1);

Hmmm ... OK, it should call xlog_syslog(0) as well as xlog_stderr(1), as otherwise we never see any error.
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c26
--- Comment #26 from Neil Brown
> OK, it should xlog_syslog(0) as well as xlog_stderr(1) as otherwise we never see any error.

I don't think that is true. support/nfs/xlog.c contains

   static int log_stderr = 1;
   static int log_syslog = 1;

so logging to both stderr and syslog is enabled by default (though some errors only go to syslog if stderr is not enabled; FATAL/ERROR/WARNING/NOTICE will go to both).

So we just need to disable syslog, not enable stderr.

Thanks.
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c27
--- Comment #27 from Neil Brown
> He would change the nfs-server unit. Instead of network.target network-online.target.

I disagree. We really do want nfsd to start even if the network isn't online yet. There could be a problem because nfs-server.service runs exportfs, and that will fail if DNS lookup isn't working. I think the correct fix for that would be to change exportfs to work correctly when DNS is not available; I don't think there is a reason why it shouldn't work.

> One thing I still do not understand - on my system nfs-server.service was disabled at that time.

The problem is being caused by a generator. systemd generators are run unconditionally, so the nfs-server generator is run whether nfs-server.service is enabled or not (so changing nfs-server to start a bit later wouldn't stop the generator from misbehaving).

I'm confident that we have found the correct resolution in changing the generator not to use DNS or syslog.

Thanks.
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c28
--- Comment #28 from Thomas Blume
> > One thing I still do not understand - on my system nfs-server.service was disabled at that time.
>
> The problem is being caused by a generator. systemd generators are run unconditionally so the nfs-server generator is run whether nfs-server.service is enabled or not (so changing nfs-server to start a bit later wouldn't stop the generator from misbehaving).

I'm wondering whether we should introduce a default timeout for generators as a safety measure. It's not really safe if an arbitrary generator can hang the system boot.
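
To make the timeout idea concrete, here is a minimal Python sketch of running a single helper program under a deadline. It only illustrates the concept; it is not how systemd invokes generators, and the function name run_generator and the timeout value are hypothetical.

```python
import subprocess
import sys


def run_generator(argv, timeout_s):
    """Run one command with a deadline.

    Returns True if it exited with status 0 in time, and False if it
    failed or had to be killed because it exceeded timeout_s seconds.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the deadline passes.
        result = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # A stand-in for a generator that never returns, such as one
    # stuck waiting on DNS or syslog during early boot.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_generator(hung, 1.0))
```

With a guard of this kind, a generator blocked on a service that is not yet available would be abandoned after the deadline instead of stalling the whole boot.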
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c29
Neil Brown
> I'm wondering whether we should introduce a default timeout for generators as a safety measure.

Possibly a good idea. Something to take up with the systemd developers.

I have created an nfs-utils package which fixes these issues. It is in home:neilbrown:branches:SUSE:SLE-12-SP1:Update/nfs-utils and I have submitted a maintenance request for SLE-12-SP1. The same update should go to Leap:42.1

Maintenance: the fix for bsc#994468 can cause the system to hang on boot, depending on particular details of configuration. So the next nfs-utils update might need to be a slightly higher priority.

I need to submit an update for 12-SP2 as well.
http://bugzilla.suse.com/show_bug.cgi?id=1003342#c31
Leonardo Chiquitto
> Maintenance: the fix for bsc#994468 can cause the system to hang on boot, depending on particular details of configuration. So the next nfs-utils update might need to be a slightly higher priority.

Thanks for the summary. Will be submitted to QAM next week.