[Bug 965564] New: make[2]: fork: Resource temporarily unavailable
http://bugzilla.opensuse.org/show_bug.cgi?id=965564
Bug ID: 965564
Summary: make[2]: fork: Resource temporarily unavailable
Classification: openSUSE
Product: openSUSE.org
Version: unspecified
Hardware: Other
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: BuildService
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: dimstar@opensuse.org
QA Contact: adrian@suse.com
Found By: ---
Blocker: ---
Since the introduction of 'lamb' there is a high number of builds 'randomly' failing with the error message:
make[2]: fork: Resource temporarily unavailable
I have seen this on packages with as low memory usage as 600MB in their previous builds but also on larger packages...
The latest one seen this morning: https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:J...
Build was running on:
2016-02-05 20:06:01 CET  kconfig  meta change  unchanged  4m 7s   cloud113:4
2016-02-08 01:49:48 CET  kconfig  meta change  failed     1m 15s  lamb09:5
2016-02-08 08:20:31 CET  kconfig  new build    failed     1m 15s  lamb06:5
A retrigger, paired with some luck, allows the build to pass - but is of course frustrating.
--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c1
Adrian Schröter
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c2
--- Comment #2 from Ruediger Oertel
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c3
Aleksa Sarai
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c4
--- Comment #4 from Dominique Leuenberger
> can we print an "ulimit -a" at the start of the build section somehow ? it sounds like something might overwrite the number of max processes to a low value and too many processes are still running inside the VM.
> are we installing/configuring /etc/security/limits.conf inside the distro maybe ?
I created a local (local to the staging project) modification on openSUSE:Factory:Staging:C:DVD/kdesignerplugin. This package had the error before (after a short 2-minute build); can we somehow force this to end up on lamb?
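As a rough illustration of the request above, the limits could be dumped at the top of the build section so they show up in the build log. This is only a sketch: `print_limits` is a hypothetical helper name, not something the OBS build script is known to provide, and it assumes the build section runs under a POSIX-like shell.

```shell
# Hypothetical helper (not part of the OBS build script): print the
# resource limits of the current shell so they end up in the build log.
print_limits() {
    echo "=== ulimit -a ==="
    # "ulimit -a" lists all limits, including the max number of processes.
    ulimit -a
}

print_limits
```

Comparing this output between a build on a cloud worker and one on a lamb worker would show whether the process limit itself differs, or whether the restriction comes from somewhere else.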
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c5
--- Comment #5 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c6
--- Comment #6 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c7
--- Comment #7 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c8
Adrian Schröter
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c9
Ludwig Nussel
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c10
--- Comment #10 from Ludwig Nussel
Ludwig Nussel
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c11
--- Comment #11 from Aleksa Sarai
systemd sets up /sys/fs/cgroup/pids/init.scope specifically for pid 1. So it could be argued that the bug is that systemd doesn't fully clean up after itself when switching to the real root, assuming pid 1 there is also systemd.
I'm not sure if you'll be able to convince the systemd guys that this is a bug. However, here's a simple workaround to stick in the start of the build scripts:
% echo $$ > /sys/fs/cgroup/pids/cgroup.procs
Since the root cgroup doesn't allow for pids limits, attaching to the root cgroup should solve your problems. We could also increase the ulimits here if appropriate.
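Before applying the workaround above, it can help to see which pids cgroup the build shell actually sits in and what limit applies there. The following is a minimal sketch, assuming the cgroup v1 layout discussed in this thread (pids controller mounted at /sys/fs/cgroup/pids); `pids_cgroup_path` is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the "pids" cgroup recorded in a
# /proc/<pid>/cgroup style file (cgroup v1 lines look like "ID:controllers:path").
pids_cgroup_path() {
    file="${1:-/proc/self/cgroup}"
    [ -r "$file" ] || return 0
    awk -F: '{
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++)
            if (ctrl[i] == "pids") {
                sub(/^[^:]*:[^:]*:/, "")   # keep everything after the second colon
                print
                exit
            }
    }' "$file"
}

# Show the limit that applies to this shell, if any. The root cgroup has
# no pids.max file, so nothing is printed there.
path="$(pids_cgroup_path)"
limit_file="/sys/fs/cgroup/pids${path}/pids.max"
if [ -n "$path" ] && [ -r "$limit_file" ]; then
    echo "pids cgroup ${path}: pids.max=$(cat "$limit_file")"
fi
```

If this prints a path such as /init.scope together with a small pids.max, the shell is still confined by the cgroup that was set up before the switch to the real root.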
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c12
--- Comment #12 from Jan Engelhardt
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c13
Thomas Blume
The upstream discussion about that feature is here: https://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html
Commits:
https://github.com/systemd/systemd/commit/0af20ea2ee2af2bcf2258e7a8e1a13181a6a75d6
https://github.com/systemd/systemd/commit/9ded9cd14cc03c67291b10a5c42ce5094ba0912f
systemd 228 has these defaults (/etc/systemd/system.conf):
#DefaultTasksMax=512
I guess this is the bottleneck. Can't you just set it to a higher value for the build machines?
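If raising the default is the chosen route, the setting belongs in the [Manager] section of system.conf. A minimal sketch of writing it as a drop-in follows. The helper name `write_tasksmax_dropin`, the file name `90-tasksmax.conf`, and the value 8192 are assumptions for illustration only, and whether the build workers' initrd would pick up a drop-in rather than system.conf itself is not established in this thread.

```shell
# Hypothetical helper: write a drop-in that overrides DefaultTasksMax.
# $1 = target directory, $2 = new value (for example 8192).
write_tasksmax_dropin() {
    dir="$1"
    value="$2"
    mkdir -p "$dir" || return 1
    printf '[Manager]\nDefaultTasksMax=%s\n' "$value" > "$dir/90-tasksmax.conf"
}

# Example (as root, and only on a machine where this is really wanted):
#   write_tasksmax_dropin /etc/systemd/system.conf.d 8192
```

The helper takes the directory as a parameter so it can be tried against a scratch directory first, without touching the live configuration.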
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c14
--- Comment #14 from Adrian Schröter
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c15
--- Comment #15 from Ludwig Nussel
> systemd 228 has these defaults (/etc/systemd/system.conf):
> #DefaultTasksMax=512
> I guess this is the bottleneck. Can't you just set it to a higher value for the build machines?
Still feels like a workaround. Shouldn't systemd clean up after itself and undo the changes it did to cgroups?
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c17
--- Comment #17 from Thomas Blume
> (In reply to Thomas Blume from comment #13)
> > systemd 228 has these defaults (/etc/systemd/system.conf):
> > #DefaultTasksMax=512
> > I guess this is the bottleneck. Can't you just set it to a higher value for the build machines?
> Still feels like a workaround. Shouldn't systemd clean up after itself and undo the changes it did to cgroups?
Hm, systemd only resets RLIMIT_NOFILE at reexecute:
-->--
        /* Reset the RLIMIT_NOFILE to the kernel default, so
         * that the new systemd can pass the kernel default to
         * its child processes */
        if (saved_rlimit_nofile.rlim_cur > 0)
                (void) setrlimit(RLIMIT_NOFILE, &saved_rlimit_nofile);
--<--
Maybe it should also reset RLIMIT_NPROC?
But I'm unsure whether this would have an effect on the reported behaviour, unless:
DefaultTasksAccounting=no
is set in system.conf. See the systemd.resource-control manpage for details.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c18
--- Comment #18 from Dominique Leuenberger
> Maybe it should also reset RLIMIT_NPROC?
> But I'm unsure whether this would have an effect on the reported behaviour, unless:
Probably it won't, as the qemu / build VM is started with:
init=/.build/build (so systemd is not even re-executed)
So in this case any remnants of systemd's limits are just confusing, as systemd is not pid1
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c19
--- Comment #19 from Thomas Blume
> (In reply to Thomas Blume from comment #17)
> > Maybe it should also reset RLIMIT_NPROC?
> > But I'm unsure whether this would have an effect on the reported behaviour, unless:
> Probably it won't, as the qemu / build VM is started with:
> init=/.build/build (so systemd is not even re-executed)
> So in this case any remnants of systemd's limits are just confusing, as systemd is not pid1
dracut in hostonly mode copies /etc/systemd/system.conf into the initrd. So a solution would be to provide an adapted system.conf before the initrd is built.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c20
--- Comment #20 from Thomas Blume
> dracut in hostonly mode copies /etc/systemd/system.conf into the initrd. So a solution would be to provide an adapted system.conf before the initrd is built.
To be more precise, this is rather a workaround. I agree that systemd should do a proper cleanup when it gets shut down.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c21
--- Comment #21 from Dr. Werner Fink
> So in this case any remnants of systemd's limits are just confusing, as systemd is not pid1
The design of systemd and any other init program is that it has pid 1. Otherwise it cannot wipe out any zombie of a dead daemon process.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c22
--- Comment #22 from Dominique Leuenberger
> (In reply to Dominique Leuenberger from comment #18)
> > So in this case any remnants of systemd's limits are just confusing, as systemd is not pid1
> The design of systemd and any other init program is that it has pid 1. Otherwise it cannot wipe out any zombie of a dead daemon process.
right.. and the init program in the build bot is called /.build/build - NOT systemd. Systemd just wrongly survives, having already been spawned out of the initrd and having set up limits which are not on the actual system. THAT's the issue and that's what we claim systemd should clean up.
From within an OBS worker:
[ 176s] 1 ttyS0 Ss+ 0:01 /bin/bash /.build/build
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c23
Franck Bui
> (In reply to Dr. Werner Fink from comment #21)
> > (In reply to Dominique Leuenberger from comment #18)
> > > So in this case any remnants of systemd's limits are just confusing, as systemd is not pid1
> > The design of systemd and any other init program is that it has pid 1. Otherwise it cannot wipe out any zombie of a dead daemon process.
> right.. and the init program in the build bot is called /.build/build - NOT systemd. Systemd just wrongly survives, having already been spawned out of the initrd and having set up limits which are not on the actual system. THAT's the issue and that's what we claim systemd should clean up.
Then don't use systemd at all. PID1 is not supposed to be started and replaced by another init system later. Also, I would suggest teaching your init system to do some basic initialisation when starting instead of relying entirely on an undefined state.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c24
--- Comment #24 from Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c25
--- Comment #25 from Ludwig Nussel
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c26
--- Comment #26 from Ludwig Nussel
> Systemd just wrongly survives as being spawned out of initrd
I'm not sure that's accurate; what we know is that the cgroups survive.
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c27
--- Comment #27 from Dr. Werner Fink
From NEWS of current git repository:
* There's a new system.conf setting DefaultTasksMax= to control the default TasksMax= setting for services and scopes running on the system. (TasksMax= is the primary setting that exposes the "pids" cgroup controller on systemd and was introduced in the previous systemd release.) The setting now defaults to 512, which means services that are not explicitly configured otherwise will only be able to create 512 processes or threads at maximum, from this version on. Note that this means that thread- or process-heavy services might need to be reconfigured to set TasksMax= to a higher value. It is sufficient to set TasksMax= in these specific unit files to a higher value, or even "infinity". Similar, there's now a logind.conf setting UserTasksMax= that defaults to 4096 and limits the total number of processes or tasks each user may own concurrently. nspawn containers also have the TasksMax= value set by default now, to 8192. Note that all of this only has an effect if the "pids" cgroup controller is enabled in the kernel. The general benefit of these changes should be a more robust and safer system, that provides a certain amount of per-service fork() bomb protection.
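The effect of such a limit on a build can be illustrated with simple arithmetic: once the number of tasks in a cgroup reaches its pids.max, further fork() calls fail with EAGAIN, which make reports as "Resource temporarily unavailable". The sketch below computes the remaining headroom from the two kernel files. `pids_headroom` is a hypothetical helper name and the numbers in the example are made up.

```shell
# Hypothetical helper: given the contents of pids.current ($1) and
# pids.max ($2), print how many more tasks may be created.
pids_headroom() {
    if [ "$2" = "max" ]; then
        # The kernel writes the literal string "max" when no limit is set.
        echo "unlimited"
    else
        echo $(( $2 - $1 ))
    fi
}

# Example: 500 tasks already running under a limit of 512.
pids_headroom 500 512   # prints 12
```

With only a dozen tasks of headroom, a parallel make that spawns compiler and shell processes would hit the limit almost immediately, which would be consistent with failures after a minute or two of build time.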
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c30
--- Comment #30 from Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=965564#c31
--- Comment #31 from Adrian Schröter
Stefan Fent