[Bug 995258] New: linux 4.7.0 cannot fork when out of main memory -- not using available swap
http://bugzilla.novell.com/show_bug.cgi?id=995258

Bug ID: 995258
Summary: linux 4.7.0 cannot fork when out of main memory -- not using available swap
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: rcoe@wi.rr.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Running Tumbleweed latest with Linux kernel 4.7.0.

When available main memory is at the limit, forks fail and other out-of-memory errors occur. The system is not using the available swap partition.

swapon -s
Filename     Type        Size      Used   Priority
/dev/sda2    partition   2206716   832    -1

Not sure why the system is not using the allocated swap partition. I've reproduced this twice.

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=995258#c1
Takashi Iwai
http://bugzilla.novell.com/show_bug.cgi?id=995258#c2
--- Comment #2 from Michal Hocko
> Michal, this was already addressed in the recent 4.7.x, right?

The fix is not in Linus's tree yet, and thus not in stable, but it is already in mmotm. You can try to test with http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz
http://bugzilla.novell.com/show_bug.cgi?id=995258#c3
--- Comment #3 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c4
--- Comment #4 from Michal Hocko
> Does this patch address issues when the OOM killer is not invoked?

I suspect it won't help.

> In my case the user allocation request for memory is being denied and existing processes are not being killed.

Could you provide the full kernel log?
http://bugzilla.novell.com/show_bug.cgi?id=995258#c5
--- Comment #5 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258
Michal Hocko
http://bugzilla.novell.com/show_bug.cgi?id=995258#c6
--- Comment #6 from Michal Hocko
http://bugzilla.novell.com/show_bug.cgi?id=995258#c7
--- Comment #7 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c8
--- Comment #8 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c9
--- Comment #9 from Michal Hocko
> Yeah I wish it was an overcommit. It's not using virtual memory at all. See the first enclosure with 2 GB of swap space not being used?

Swap space will not help you if a fork has to copy a large address space to the child. Try echo 1 > /proc/sys/vm/overcommit_memory to see whether it helps. It will disable the overcommit checks altogether.
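
For readers who want to check the overcommit state before changing it, here is a minimal Python sketch. It is an illustration, not something posted in the bug: the function names parse_meminfo and commit_headroom_kb are made up for this example. It relies only on the documented meaning of /proc/sys/vm/overcommit_memory (0 heuristic, 1 always allow, 2 strict) and on the CommitLimit and Committed_AS fields of /proc/meminfo. Note that the kernel enforces CommitLimit only in mode 2; in the default mode 0 the figure is informational.

```python
# Sketch only: helper names are hypothetical, the /proc fields are real.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "never (strict)"}


def parse_meminfo(text):
    """Return {field: integer value} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(meminfo):
    """CommitLimit minus Committed_AS, in kB (negative means over the limit)."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]
```

On a Linux machine the inputs could be read with pathlib.Path("/proc/meminfo").read_text() and int(pathlib.Path("/proc/sys/vm/overcommit_memory").read_text()), then passed to the helpers above.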
http://bugzilla.novell.com/show_bug.cgi?id=995258#c10
--- Comment #10 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c11
--- Comment #11 from Rich Coe
> Try echo 1 > /proc/sys/vm/overcommit_memory to see whether it helps.

Was that a default on the 4.0 and 4.1 openSUSE kernels? That would explain a lot about why it's different now.
http://bugzilla.novell.com/show_bug.cgi?id=995258#c12
--- Comment #12 from Michal Hocko
> Looks like clone() is returning EAGAIN even though there is plenty of memory, thread slots, and process slots.

EAGAIN would also be returned when there are too many processes running. Does systemd throttle the number of processes with the pids cgroup controller, causing this?

(In reply to Rich Coe from comment #11)
> > Try echo 1 > /proc/sys/vm/overcommit_memory to see whether it helps.
> Was that a default on the 4.0 and 4.1 openSUSE kernels? That would explain a lot about why it's different now.

No, it wasn't. I just wanted to rule out overcommit issues.
http://bugzilla.novell.com/show_bug.cgi?id=995258#c13
--- Comment #13 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c14
--- Comment #14 from Michal Hocko
> I didn't set up anything in systemd, cgroup or otherwise.

Well, systemd tends to do many things behind your back, so I would double-check the pids controller configuration for the cgroup your process belongs to. /proc/<pid>/cgroup lists all the cgroups the given pid is attached to. If there is a pids controller, then check the respective cgroup and its max pid count setup. We have seen issues like that in the past.
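
The lookup described above can be sketched in Python. This is a hedged illustration rather than a tool from the thread: the function name pids_cgroup_path is hypothetical, and the parser assumes the cgroup v1 line format "hierarchy-ID:controller-list:path". On a cgroup v2 system the single "0::/path" entry names no controllers, so the function returns None there.

```python
# Sketch only: pids_cgroup_path is a made-up helper name.
def pids_cgroup_path(cgroup_text):
    """Return the cgroup path of the pids controller, or None if absent.

    cgroup_text is the content of /proc/<pid>/cgroup (cgroup v1 layout).
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)  # the path itself may contain ':'
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        if "pids" in controllers.split(","):
            return path
    return None
```

Given the returned path, the limit files would be expected under the pids hierarchy mount point, which is commonly /sys/fs/cgroup/pids on systemd machines but is an assumption here.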
http://bugzilla.novell.com/show_bug.cgi?id=995258#c15
--- Comment #15 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c16
--- Comment #16 from Michal Hocko
> Here's the cgroup for one of my bash processes:
> cat /proc/4188/cgroup
> 11:pids:/user.slice/user-X.slice/session-1.scope
> 10:hugetlb:/
> 9:memory:/
> 8:devices:/user.slice
> 7:perf_event:/
> 6:cpuset:/
> 5:freezer:/
> 4:blkio:/
> 3:net_cls,net_prio:/
> 2:cpu,cpuacct:/
> 1:name=systemd:/user.slice/user-X.slice/session-1.scope

> The pid limit is currently 4096 which seems reasonable.
> : cat /sys/fs/cgroup/pids/user.slice/user-X.slice/pids.current
> 469
> : cat /sys/fs/cgroup/pids/user.slice/user-X.slice/pids.max
> 4096

I would double-check when the fork is failing...

> I still think it's weird that system swap isn't being used at all.

Why would system swap matter at all for this failure? This is not about lack of memory AFAICS.
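
Since the pids controller is hierarchical, a fork fails with EAGAIN when the limit of any ancestor cgroup is reached, not only the limit of the cgroup the process sits in. The following Python sketch shows how one might compute the remaining headroom at each level at the moment the fork fails. The helper names cgroup_chain and pids_headroom are hypothetical and not taken from the thread; the only facts relied on are that pids.current holds an integer and that pids.max holds either an integer or the literal string "max".

```python
# Sketch only: helper names are made up for illustration.
def cgroup_chain(path):
    """List a cgroup path and its ancestors, innermost first (root excluded)."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def pids_headroom(current_text, max_text):
    """Return how many more tasks fit, or None when pids.max is 'max'."""
    current = int(current_text.strip())
    limit = max_text.strip()
    if limit == "max":
        return None  # no limit set at this level
    return int(limit) - current
```

With the numbers quoted above, pids_headroom("469", "4096") leaves 3627 free slots at the user slice level. The same check would need to be repeated for every entry of cgroup_chain(), including session-1.scope, while the failure is occurring.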
http://bugzilla.novell.com/show_bug.cgi?id=995258#c18
--- Comment #18 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c19
--- Comment #19 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c21
--- Comment #21 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c22
--- Comment #22 from Rich Coe
http://bugzilla.novell.com/show_bug.cgi?id=995258#c24
--- Comment #24 from Rich Coe