[Bug 757783] New: clone() with CLONE_NEWPID leaks kernel memory
https://bugzilla.novell.com/show_bug.cgi?id=757783
https://bugzilla.novell.com/show_bug.cgi?id=757783#c0
Summary: clone() with CLONE_NEWPID leaks kernel memory
Classification: openSUSE
Product: openSUSE 12.1
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: ccrssaa@karelia.ru
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120312 Firefox/11.0 SeaMonkey/2.8
A server running vsftpd started to die in agony, with kswapd eating 100% CPU, after upgrading to openSUSE 12.1. It turned out that vsftpd isolates each process using CLONE_NEWPID, and the 3.1.9-1.4 kernel does not free pid_namespace slabs.
Reproducible: Always
Steps to Reproduce:
1.
test.c:
#include
https://bugzilla.novell.com/show_bug.cgi?id=757783#c1
--- Comment #1 from Vadim Ponomarev
https://bugzilla.novell.com/show_bug.cgi?id=757783#c2
--- Comment #2 from Vadim Ponomarev
https://bugzilla.novell.com/show_bug.cgi?id=757783#c3
Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c4
Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c5
--- Comment #5 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c6
--- Comment #6 from Vadim Ponomarev
This might be the mainline fix ...
- dentry->d_op = &pid_dentry_operations;
+ d_set_d_op(dentry, &pid_dentry_operations);
This does not solve the problem with pid_namespace (3.1.9-1.4-desktop). net_namespace drops to zero shortly after the clone loop (the same behaviour as without the patch), but pid_namespace leaks.
-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=757783#c7
--- Comment #7 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c8
--- Comment #8 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c9
--- Comment #9 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c10
--- Comment #10 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c11
--- Comment #11 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c12
Jeff Mahoney
cat /proc/slabinfo | grep namespa
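The slab counts discussed in this thread come from /proc/slabinfo. Below is a minimal C sketch of pulling one cache's object count out of a single slabinfo row. It assumes the usual "name active_objs num_objs objsize ..." column order, and it assumes that watch.pl reports the active_objs column (the attachment itself is not reproduced here). The helper name `slab_active_objs` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` (one row of /proc/slabinfo) describes the
 * cache named `cache`, store its active_objs column in *active and return 1.
 * Returns 0 for other caches, header lines, or malformed rows.
 * Assumes the column order "name active_objs num_objs objsize ...".
 */
static int slab_active_objs(const char *line, const char *cache,
                            unsigned long *active)
{
    char name[64];
    unsigned long objs;

    assert(line != NULL && cache != NULL && active != NULL);
    if (sscanf(line, "%63s %lu", name, &objs) != 2)
        return 0;                 /* header ("# name ...") or garbage */
    if (strcmp(name, cache) != 0)
        return 0;                 /* a different cache */
    *active = objs;
    return 1;
}
```

Feeding every line of /proc/slabinfo through this helper once for "pid_namespace" and once for "mnt_cache" gives the two numbers printed in the watch.pl output.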
https://bugzilla.novell.com/show_bug.cgi?id=757783#c13
--- Comment #13 from Vadim Ponomarev
https://bugzilla.novell.com/show_bug.cgi?id=757783#c14
--- Comment #14 from Vadim Ponomarev
https://bugzilla.novell.com/show_bug.cgi?id=757783#c15
--- Comment #15 from Vadim Ponomarev
https://bugzilla.novell.com/show_bug.cgi?id=757783#c16
--- Comment #16 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c17
--- Comment #17 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c18
Vadim Ponomarev
Does it go away after a while? No (~1.5 hours since 10:46:41 UTC).
Or does it stay at this level? Yes.
Does it increase if you call the reproducer several times? Yes.
If it does not go down or increases, please reopen. OK.
IMHO this looks like two different bugs. The first was "does not release pid_namespace slabs at all" (fixed in 905ad269c55fc62bee3da29f7b1d1efeba8aa1e1). The second is "leaks some": it exists in oS 12.1 with the 905ad269c55fc62bee3da29f7b1d1efeba8aa1e1 patch applied, exists in oS 11.4 too, and is somehow related to the presence of a SIGCHLD handler with waitpid(). Is it possible to fork a new bug from this one, or should I report a new one?
https://bugzilla.novell.com/show_bug.cgi?id=757783#c19
--- Comment #19 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c20
--- Comment #20 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c21
--- Comment #21 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c22
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c23
--- Comment #23 from Vadim Ponomarev
https://bugzilla.novell.com/show_bug.cgi?id=757783#c24
Vadim Ponomarev
Test fix confirmed. mnt_cache drops back to baseline with that patch applied.
I've applied it to 12.1 and SP2.
I've also verified that this is the last of the kern_mount_data leaks.
Vadim, can you confirm that with both patches applied the pid_namespace slab drops back to baseline (usually 0) within 30 seconds?
Kernel 3.1.10-1.9-desktop with 905ad269c55fc62bee3da29f7b1d1efeba8aa1e1 and 6f686574cccc2ef66fb38e41f19cedd81e7b4504 applied.
The test program is vsftpd-1.c from the first attachment (https://bugzilla.novell.com/attachment.cgi?id=487333). Note that, unlike test.c in the initial report, vsftpd-1.c has a SIGCHLD handler in an attempt to simulate real vsftpd behaviour.
watch.pl is the second attachment (https://bugzilla.novell.com/attachment.cgi?id=487338), just for convenience.
1) Test program compiled without the SIGCHLD handler (gcc -DWITH_SIGCHLD=0 vsftpd-1.c): everything is freed.
./watch.pl
Mon Apr 23 00:54:09 2012 pid_namespace=0 mnt_cache=39
Mon Apr 23 00:54:56 2012 pid_namespace=24 mnt_cache=75
Mon Apr 23 00:54:57 2012 pid_namespace=57 mnt_cache=120
Mon Apr 23 00:54:58 2012 pid_namespace=93 mnt_cache=150
Mon Apr 23 00:54:59 2012 pid_namespace=102 mnt_cache=165
Mon Apr 23 00:55:01 2012 pid_namespace=82 mnt_cache=165
Mon Apr 23 00:55:03 2012 pid_namespace=72 mnt_cache=117
Mon Apr 23 00:55:05 2012 pid_namespace=42 mnt_cache=69
Mon Apr 23 00:55:07 2012 pid_namespace=32 mnt_cache=54
Mon Apr 23 00:55:09 2012 pid_namespace=5 mnt_cache=46
Mon Apr 23 00:55:11 2012 pid_namespace=2 mnt_cache=42
Mon Apr 23 00:55:13 2012 pid_namespace=1 mnt_cache=40
Mon Apr 23 00:55:15 2012 pid_namespace=0 mnt_cache=39
2) Test program compiled with the SIGCHLD handler (gcc vsftpd-1.c): pid_namespace and mnt_cache leaked.
./watch.pl (first run, from another terminal)
Mon Apr 23 01:02:52 2012 pid_namespace=0 mnt_cache=39
Mon Apr 23 01:03:01 2012 pid_namespace=21 mnt_cache=75
Mon Apr 23 01:03:02 2012 pid_namespace=27 mnt_cache=90
Mon Apr 23 01:03:03 2012 pid_namespace=33 mnt_cache=105
Mon Apr 23 01:03:07 2012 pid_namespace=23 mnt_cache=57
Mon Apr 23 01:03:09 2012 pid_namespace=13 mnt_cache=53
Mon Apr 23 01:03:11 2012 pid_namespace=11 mnt_cache=51
Mon Apr 23 01:03:13 2012 pid_namespace=10 mnt_cache=50
Mon Apr 23 01:03:15 2012 pid_namespace=10 mnt_cache=49
(10 out of 100 slabs leaked in the first run)
(test program run again from another terminal)
Mon Apr 23 01:10:49 2012 pid_namespace=22 mnt_cache=105
Mon Apr 23 01:10:50 2012 pid_namespace=27 mnt_cache=105
Mon Apr 23 01:10:51 2012 pid_namespace=36 mnt_cache=120
Mon Apr 23 01:10:52 2012 pid_namespace=39 mnt_cache=120
Mon Apr 23 01:10:55 2012 pid_namespace=29 mnt_cache=91
Mon Apr 23 01:10:57 2012 pid_namespace=22 mnt_cache=64
Mon Apr 23 01:10:59 2012 pid_namespace=21 mnt_cache=62
Mon Apr 23 01:11:01 2012 pid_namespace=21 mnt_cache=61
Mon Apr 23 01:11:02 2012 pid_namespace=21 mnt_cache=60
(11 out of 100 slabs leaked in the second run)
pid_namespace stays at 21 and mnt_cache at 60 forever, instead of shrinking back to the initial 0 and 39. Please look at comment 18. IMHO this is a different bug (a race with the signal code?).
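The "leaked" figures above are the count after it stops shrinking minus the count before the run. For the second run, the baseline is where the first run ended. Below is a small C sketch of that arithmetic. The helper name `leaked_objects` is hypothetical, and it assumes the last sample is taken after the counts have settled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given watch.pl-style samples for one slab cache,
 * where samples[0] is the baseline before the run and samples[n - 1] is
 * taken after the count has stopped shrinking, return how many objects
 * never came back (negative if the count ended below the baseline).
 */
static long leaked_objects(const long *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    return samples[n - 1] - samples[0];
}
```

With the first-run pid_namespace samples (0 ... 10), this gives 10. With the second run measured from the first run's final value (10 ... 21), it gives 11. Both match the figures quoted in the output.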
My testing used the SIGCHLD clone flag and was successful.
Please check the case where not only is the SIGCHLD clone flag set, but the handler is also enabled using sigaction().
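"The handler is enabled using sigaction()" means roughly the following: a SIGCHLD handler installed with sigaction() that reaps exited children with waitpid(WNOHANG). This is a generic C sketch, not the code of vsftpd-1.c or of vsftpd itself. The names `install_sigchld_handler`, `wait_until_reaped` and `reaped_children` are hypothetical, and the SA_RESTART | SA_NOCLDSTOP flags are an illustrative choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Children reaped so far; sig_atomic_t so the handler can update it. */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every exited child without blocking. */
static void sigchld_handler(int sig)
{
    int saved_errno = errno;            /* waitpid() may clobber errno */
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on error. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted syscalls; no SIGCHLD for stopped children. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Sleep in 1 s steps until `want` children are reaped or `max_secs` pass. */
static int wait_until_reaped(int want, unsigned max_secs)
{
    assert(want > 0);
    for (unsigned i = 0; i < max_secs && reaped_children < want; i++)
        sleep(1);                       /* returns early when SIGCHLD arrives */
    return reaped_children >= want;
}
```

Because the handler can run at any point while the parent is inside fork(), this setup differs from a plain "SIGCHLD as the clone exit signal" test.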
https://bugzilla.novell.com/show_bug.cgi?id=757783#c25
--- Comment #25 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c26
--- Comment #26 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c27
--- Comment #27 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=757783#c30
--- Comment #30 from Vadim Ponomarev
Ok. I can reproduce this but not on the scale you're seeing. I see exactly one pid ns leaked for each run. It doesn't leak w/o SIGCHLD.
Reproduced that "exactly one pid ns for each run" on 3.1.10-1.9-default. The results from comment 24 were obtained on -desktop. CPU speed doesn't seem to matter: results from an i3 550 and a C2D 6420 look pretty similar, ~10-11 leaked pid ns per test run. Only -desktop vs -default makes the difference.
https://bugzilla.novell.com/show_bug.cgi?id=757783#c31
--- Comment #31 from Mike Galbraith
(In reply to comment #25)
Ok. I can reproduce this but not on the scale you're seeing. I see exactly one pid ns leaked for each run. It doesn't leak w/o SIGCHLD.
reproduced that "exactly one pid ns for each run" on 3.1.10-1.9-default
results from comment 24 were obtained on -desktop
seems that cpu speed doesn't matter (results from i3 550 and c2d 6420 are looking pretty similar, ~10-11 leaked pid ns per test run), only -desktop/-default makes the difference
Mainline with voluntary preempt leaks heavily here. It does not leak at all if you ensure that the parent exits before its children, so reparenting is innocent. If you ensure the parent stays around, it leaks madly. user/net namespaces do not leak.
https://bugzilla.novell.com/show_bug.cgi?id=757783#c32
--- Comment #32 from Mike Galbraith
https://bugzilla.novell.com/show_bug.cgi?id=757783#c33
--- Comment #33 from Vadim Ponomarev
Created an attachment (id=488769) --> (http://bugzilla.novell.com/attachment.cgi?id=488769) leak fix
After finally convincing ftrace to capture the _whole_ event, it turns out one leak is simple: a SIGCHLD received during fork() makes fork() fail, and proc was mounted but is not unmounted on the cleanup path.
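This first leak is a missed step on an error-unwinding path. The sketch below is a user-space C model of that shape, not kernel code and not the attached fix. A namespace object's reference count is held once for creation and raised once more when proc is mounted; the trace in this bug shows proc_set_super taking a get_pid_ns and proc_kill_sb dropping it. The fork can then fail after both steps. All names (`struct fake_pid_ns`, `model_fork`, `get_ns`, `put_ns`) are hypothetical, and the refcounting is deliberately simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pid namespace: only its reference count. */
struct fake_pid_ns {
    int count;
};

static void get_ns(struct fake_pid_ns *ns) { ns->count++; }
static void put_ns(struct fake_pid_ns *ns) { assert(ns->count > 0); ns->count--; }

/*
 * Model of a fork with CLONE_NEWPID: creating the namespace holds one
 * reference, and mounting proc for it takes another.  If a signal is
 * pending, the fork fails and must undo both steps.  `unmount_on_error`
 * selects the buggy (false) or fixed (true) cleanup.
 * Returns 0 on success, -1 on failure; ns->count holds the final count.
 */
static int model_fork(struct fake_pid_ns *ns, bool signal_pending,
                      bool unmount_on_error)
{
    ns->count = 1;               /* namespace created */
    get_ns(ns);                  /* proc mounted: 1 -> 2 */

    if (signal_pending)
        goto bad_fork;           /* e.g. SIGCHLD arrived during fork() */

    return 0;                    /* success: released later, at exit */

bad_fork:
    if (unmount_on_error)
        put_ns(ns);              /* undo the proc mount */
    put_ns(ns);                  /* undo namespace creation */
    return -1;
}
```

With `unmount_on_error` false, a failed fork leaves the count at 1 forever, which is the "proc was mounted but not unmounted" leak in miniature. With it true, the count returns to 0.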
There's at least one more (not so simple) leak though. The final put_pid() in softirq context occasionally just goes missing for some as yet unknown reason.
Tried 3.1.10-1.9-desktop, -default and -xen (as both a dom0 and a domU kernel) with all three patches applied, using vsftpd and a "netcat -z 127.0.0.1 21" loop. There is no leak with -desktop and -xen dom0. pid_ns leaks with -default and -xen domU.
It seems there are some nasty open issues in the pid namespace code as well. Oleg sent me this link:
Heh. Please look at http://marc.info/?l=linux-kernel&m=127687751003902 and the whole thread; there are a lot more problems here.
Sad. BTW, I wonder why nobody reported this issue a long time ago. Is no one using vsftpd on oS nowadays?
https://bugzilla.novell.com/show_bug.cgi?id=757783#c34
--- Comment #34 from Mike Galbraith
pid_ns leaks with -default and -xen domU
Likely this one.
vsftpd-14507 [003] .... 1467.046189: proc_set_super: get_pid_ns: 0xffff8801dc560998 count:1->2
vsftpd-14507 [003] .... 1467.046201: create_pid_namespace: create_pid_namespace: 0xffff8801dc560998
vsftpd-14507 [003] .... 1467.046206: alloc_pid: get_pid_ns: 0xffff8801dc560998 count:2->3
vsftpd-14521 [003] .... 1467.052481: switch_task_namespaces: exiting: 0xffff8801dc560998 count:3
vsftpd-14521 [003] .... 1467.073823: free_nsproxy: put_pid_ns: 0xffff8801dc560998 count:3->2
vsftpd-14507 [003] .... 1467.173657: put_pid: namespace: 0xffff8801dc560998 pid count:2->1 pid_ns count:2
vsftpd-14507 [003] .... 1467.173677: proc_kill_sb: put_pid_ns: 0xffff8801dc560998 count:2->1
<idle>-0 [003] ..s. 1467.213562: put_pid: namespace: 0xffff8801dc560998 pid count:6->5 pid_ns count:1
While we wait for RCU destruction, someone grabs references to the pid, foiling the grand destruction plan... Sometimes, like this one, the plan is foiled permanently.
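The trace can be read as a reference-count ledger for the pid namespace at 0xffff8801dc560998. Below is a small C sketch that replays only the `get_pid_ns`/`put_pid_ns` events, whose "count:A->B" field is the namespace count, and returns the final count. The `put_pid` lines print a pid count first, so they are skipped. The helper name is hypothetical, and the parsing assumes exactly the line format shown in the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: walk ftrace lines and return the pid_ns reference
 * count after the last get_pid_ns/put_pid_ns event, or -1 if none is found.
 * Assumes those events print "count:BEFORE->AFTER" after the event name.
 */
static int final_pid_ns_count(const char *const *lines, size_t n)
{
    int last = -1;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *ev = strstr(lines[i], "get_pid_ns:");
        if (ev == NULL)
            ev = strstr(lines[i], "put_pid_ns:");
        if (ev == NULL)
            continue;                      /* not a pid_ns refcount event */

        const char *cnt = strstr(ev, "count:");
        int before, after;
        if (cnt != NULL && sscanf(cnt, "count:%d->%d", &before, &after) == 2)
            last = after;
    }
    return last;
}
```

Replaying the eight trace lines yields 1. The count never reaches 0, so the namespace is never freed, which matches the "foiled permanently" case described above.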
btw wonder why nobody reported this issue long time ago no one is using vsftpd on oS nowadays ?
Or folks have truckloads of ram, and don't notice a bit going missing.
https://bugzilla.novell.com/show_bug.cgi?id=757783#c35
--- Comment #35 from Mike Galbraith
https://bugzilla.novell.com/show_bug.cgi?id=757783#c36
--- Comment #36 from Mike Galbraith
https://bugzilla.novell.com/show_bug.cgi?id=757783#c37
--- Comment #37 from Mike Galbraith
Created an attachment (id=489513) --> (http://bugzilla.novell.com/attachment.cgi?id=489513) trace etc
BTW, the "leak" does happen without SIGCHLD; it is merely MUCH less likely.
https://bugzilla.novell.com/show_bug.cgi?id=757783#c38
--- Comment #38 from Mike Galbraith
https://bugzilla.novell.com/show_bug.cgi?id=757783#c39
--- Comment #39 from Mike Galbraith
https://bugzilla.novell.com/show_bug.cgi?id=757783#c42
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c43
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c44
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c45
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c46
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c47
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c48
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c49
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c50
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c51
--- Comment #51 from Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c52
--- Comment #52 from Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c53
--- Comment #53 from Marcus Meissner
https://bugzilla.novell.com/show_bug.cgi?id=757783#c54
--- Comment #54 from Mike Galbraith
Michael, regarding your last comment ... any news?
I think it's all done upstream. I'll have to look to see what the final outcome was. Fires are burning though...
https://bugzilla.novell.com/show_bug.cgi?id=757783#c55
--- Comment #55 from Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=757783#c56
Marcus Meissner
SMASH SMASH