[Bug 1165351] New: systemd libpam-systemd immediate memory exhaustion triggers oom upon ssh
http://bugzilla.suse.com/show_bug.cgi?id=1165351

           Bug ID: 1165351
          Summary: systemd libpam-systemd immediate memory exhaustion
                   triggers oom upon ssh
   Classification: openSUSE
          Product: openSUSE Tumbleweed
          Version: Current
         Hardware: Other
               OS: Other
           Status: NEW
         Severity: Major
         Priority: P5 - None
        Component: Basesystem
         Assignee: screening-team-bugs@suse.de
         Reporter: luis.chamberlain@suse.com
       QA Contact: qa-bugs@suse.de
         Found By: ---
          Blocker: ---

I have a fresh install of openSUSE Tumbleweed on two HP ProLiant DL380 Gen9
machines, currently at version 20200222, both with 30 GiB of RAM and running
systemd-244-2.1.x86_64. On one of them I noticed that whenever I ssh in as a
regular user (but not as root) the out-of-memory killer is triggered. That is,
30 GiB of RAM get eaten up. The system now has the debug flag set in
/etc/pam.d/common-session, and "sudo journalctl -f" reveals:

Feb 29 16:31:18 chivo sshd[16511]: Accepted publickey for mcgrof from 10.100.208.217 port 44016 ssh2: RSA SHA256:tOilLVI1jM/Pvg5uJVQbeXGEAU0mma6Jxma9c5eVX0w
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): pam-systemd initializing
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): Asking logind to create session: uid=1000 pid=16511 service=sshd type=tty class=user desktop= seat= vtnr=0 tty= display= remote=yes remote_user= remote_host=10.100.208.217
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): Session limits: memory_max=n/a tasks_max=n/a cpu_weight=n/a io_weight=n/a runtime_max_sec=n/a
Feb 29 16:31:18 chivo systemd-logind[2184]: New session 62 of user mcgrof.
Feb 29 16:31:18 chivo systemd[1]: Starting User Manager for UID 1000...

-- <oom trace> ---

Feb 29 16:31:34 chivo kernel: oom_reaper: reaped process 16513 ((systemd)), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Feb 29 16:31:34 chivo systemd[1]: user@1000.service: Failed with result 'protocol'.
Feb 29 16:31:34 chivo systemd[1]: Failed to start User Manager for UID 1000.
Feb 29 16:31:34 chivo systemd[1]: Started Session 62 of user mcgrof.
Feb 29 16:31:34 chivo sshd[16511]: pam_systemd(sshd:session): Reply from logind: id=62 object_path=/org/freedesktop/login1/session/_362 runtime_path=/run/user/1000 session_fd=7 seat= vtnr=0 original_uid=1000
Feb 29 16:31:34 chivo sshd[16511]: pam_unix(sshd:session): session opened for user mcgrof(uid=1000) by (uid=0)

Here is what a vmstat log looks like at one-second intervals: first with the
system idle, then I create a new ssh session; it takes a while to eat up all
the memory, then the OOM killer takes over and kills the pam systemd service,
and I am able to log in.
The blank lines are put in place below to highlight when the ssh session is
just delaying and the oom is about to hit:

mcgrof@chivo ~ $ sudo vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1 0 248704 31981364 136 160708 2 248 35 252 31 63 0 1 99 0 0
 0 0 248704 31981056 136 160856 0 0 0 0 131 101 0 0 100 0 0
 0 0 248704 31981056 136 160856 0 0 0 0 82 92 0 0 100 0 0
 0 0 248704 31980552 136 160824 52 0 384 0 108 157 0 0 100 0 0
 0 0 248704 31980472 136 161196 0 0 0 232 164 280 0 0 100 0 0
 0 0 248704 31980200 136 161344 128 0 228 0 155 240 0 0 100 0 0
 0 0 248704 31980200 136 161412 0 0 0 0 160 106 0 0 100 0 0
 0 0 248704 31979628 136 161484 0 0 0 0 95 108 0 0 100 0 0
 0 0 248704 31979628 136 161484 0 0 0 0 114 146 0 0 100 0 0
 0 0 248704 31979628 136 161484 0 0 0 0 79 84 0 0 100 0 0

 1 0 248704 31932744 136 161484 0 0 16 0 120 133 0 0 100 0 0
 0 0 248704 31929664 136 161504 0 0 0 0 116 104 0 0 100 0 0
 0 0 248704 31929640 136 161508 0 0 0 0 125 162 0 0 100 0 0
 1 0 248704 28031352 136 163092 416 0 1940 0 574 515 3 2 95 0 0
 1 0 248704 22681916 136 163092 0 0 64 340 473 354 3 2 95 0 0
 1 0 248704 17326052 136 163172 0 0 0 0 364 116 3 2 95 0 0
 1 0 248704 11968392 136 163172 0 0 0 0 364 124 3 2 95 0 0
 1 0 248704 6612528 136 163172 0 0 0 0 374 110 3 2 95 0 0
 1 0 248704 1257168 136 163172 0 0 0 0 366 114 3 2 95 0 0

 1 1 565632 215044 136 82292 192 316912 1864 316912 1678 1135 1 2 91 6 0
 2 1 954816 207848 136 76672 264 389708 1564 389708 30002 12569 0 1 89 9 0
 3 2 353004 24513544 136 78252 140 139752 8480 139752 38917 113910 0 16 83 1 0
 0 0 350444 32073372 136 122312 3040 24 42184 68 4774 14383 0 1 98 1 0
 0 0 350444 32073372 136 122344 8 0 24 0 134 159 0 0 100 0 0

Running $(sudo htop --sort-key PERCENT_MEM -d 1) reveals the same, but in a
much more fun way.

FWIW the system is using yp, and so I have:

/etc/yp.conf
domain suse.de server <some-private-ip>
domain suse.de server <some-other-ip>

However, leaving this blank, running $(netconfig update -f), and then ssh'ing
in reveals that it still triggers.

Related public issues:

https://github.com/systemd/systemd/issues/11502
https://bugzilla.redhat.com/show_bug.cgi?id=1666690
https://github.com/systemd/systemd/commit/084eeb865ca63887098e0945fb4e93c852...

Red Hat's bug reveals it was a bad backport of the fix for CVE-2018-16864,
and that in itself introduced another CVE, CVE-2019-3815.

I can reproduce this with a freshly created user named test. The user root
does not trigger the issue. What is odd, however, is that this issue is not
reproducible on the other twin system.

-- You are receiving this mail because: You are on the CC list for the bug.
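(For reference, the pam_systemd(sshd:session) debug output quoted above comes
from enabling the debug option on the pam_systemd line, i.e. something like
the following in /etc/pam.d/common-session:

session optional pam_systemd.so debug

The same line shows up later in the workaround diff in comment #48.)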
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c2 Anthony Iliopoulos <ailiopoulos@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ailiopoulos@suse.com --- Comment #2 from Anthony Iliopoulos <ailiopoulos@suse.com> --- see also: https://github.com/systemd/systemd/issues/9656 (this appears to be the same issue, but there's no conclusion) -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 Luis Chamberlain <luis.chamberlain@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |fbui@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c3 Luis Chamberlain <luis.chamberlain@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |meissner@suse.com --- Comment #3 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Luis Chamberlain from comment #0)
Related public issues:
https://github.com/systemd/systemd/issues/11502 https://bugzilla.redhat.com/show_bug.cgi?id=1666690 https://github.com/systemd/systemd/commit/ 084eeb865ca63887098e0945fb4e93c852b91b0f
Red Hat's bug reveals it was a bad backport of the fix for CVE-2018-16864 and so that itself introduced another CVE, CVE-2019-3815.
Our fix for CVE-2018-16864 was addressed via bsc#1120323 so Cc'ing Franck Bui. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c4 --- Comment #4 from Luis Chamberlain <luis.chamberlain@suse.com> --- Created attachment 831637 --> http://bugzilla.suse.com/attachment.cgi?id=831637&action=edit Image while system is idle Image while system is idle -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c5 --- Comment #5 from Luis Chamberlain <luis.chamberlain@suse.com> --- Created attachment 831638 --> http://bugzilla.suse.com/attachment.cgi?id=831638&action=edit systemd eating memory for breakfast Here is an image which reveals how systemd eats up memory, 30 GiB worth, in no time. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c6 --- Comment #6 from Luis Chamberlain <luis.chamberlain@suse.com> --- If one runs htop as root with:

sudo htop --sort-key PERCENT_MEM -d 1

one can easily hit "s" to run strace on the (systemd) process that gets spawned and eats up the memory. I have tried that but came up empty. Similarly, running strace on systemd-journald doesn't tell me anything particularly revealing or show anything stupid it may be doing. -- You are receiving this mail because: You are on the CC list for the bug.
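(For anyone retrying this: one way to capture what the child is doing is to
attach strace to the freshly forked (systemd) process the moment it shows up,
e.g. something along the lines of

sudo strace -f -tt -e trace=memory -p <pid-of-(systemd)-child> -o /tmp/user-manager.strace

with the PID taken from htop; the window is short since the OOM killer removes
the process within seconds.)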
http://bugzilla.suse.com/show_bug.cgi?id=1165351 Marcus Meissner <meissner@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|screening-team-bugs@suse.de |systemd-maintainers@suse.de -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c7 --- Comment #7 from Franck Bui <fbui@suse.com> --- After looking at the affected system, it seems to be an issue related to NSS, and more specifically to the use of NIS. Here is the backtrace of the process (child of PID1) just before it gets killed by the OOM killer:
#0  0x00007f9545019217 in pthread_cond_init@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f95439c4f9e in clnt_dg_create () from /lib64/libtirpc.so.3
#2  0x00007f95439c527a in clnt_tli_create () from /lib64/libtirpc.so.3
#3  0x00007f95439c5641 in getclnthandle () from /lib64/libtirpc.so.3
#4  0x00007f95439c640a in __rpcb_findaddr_timed () from /lib64/libtirpc.so.3
#5  0x00007f95439c7c11 in clnt_tp_create_timed () from /lib64/libtirpc.so.3
#6  0x00007f95439c7dc9 in clnt_create_timed () from /lib64/libtirpc.so.3
#7  0x00007f95439e5d8b in yp_bind_client_create_v3 () from /usr/lib64/libnsl.so.2
#8  0x00007f95439e61db in __yp_bind.part.0 () from /usr/lib64/libnsl.so.2
#9  0x00007f95439e7160 in yp_all () from /usr/lib64/libnsl.so.2
#10 0x00007f95439f8121 in _nss_nis_initgroups_dyn () from /lib64/libnss_nis.so.2
#11 0x00007f9543a0551b in getgrent_next_nss () from /lib64/libnss_compat.so.2
#12 0x00007f9543a05a45 in _nss_compat_initgroups_dyn () from /lib64/libnss_compat.so.2
#13 0x00007f9545669876 in internal_getgrouplist () from /lib64/libc.so.6
#14 0x00007f9545669bbb in initgroups () from /lib64/libc.so.6
#15 0x000055811e7e4097 in get_supplementary_groups (group=<optimized out>, ngids=<synthetic pointer>, supplementary_gids=<synthetic pointer>, gid=100, user=0x55811fd9c7b0 "mcgrof", c=0x55811fd34ab8)
It seems that something went wrong when the child of PID1 called initgroups(3), as the call never returned.

It can be reproduced easily by starting any service that sets the User= option. For example, starting the following test service leads to the same memory exhaustion:

# systemctl cat test
# /etc/systemd/system/test.service
[Unit]
Description=test

[Service]
User=1000
Type=oneshot
ExecStart=/usr/bin/sleep infinity

-- You are receiving this mail because: You are on the CC list for the bug.
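A minimal sequence to exercise this reproducer, assuming the unit file above
has been saved as /etc/systemd/system/test.service, would be:

# systemctl daemon-reload
# systemctl start test.service
# vmstat 1        <-- watch free memory collapse until the OOM killer fires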
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c8 --- Comment #8 from Franck Bui <fbui@suse.com> --- Also, removing the NIS entry "+:::" in /etc/group works around the issue. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c9 --- Comment #9 from Franck Bui <fbui@suse.com> --- I can't reproduce the issue with getent(1). Indeed, running:

# getent initgroups mcgrof
# getent group

works as expected. So it seems that the problem only happens in the context of PID1. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c10 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |IN_PROGRESS CC| |kukuk@suse.com Flags| |needinfo?(kukuk@suse.com) --- Comment #10 from Franck Bui <fbui@suse.com> --- Thorsten, since you're the libnss_nis maintainer, does that ring a bell, or can you think of any reason why initgroups(3) would loop forever and consume all memory when it's called in the PID1 context? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c11 Thorsten Kukuk <kukuk@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(kukuk@suse.com) | --- Comment #11 from Thorsten Kukuk <kukuk@suse.com> --- (In reply to Franck Bui from comment #10)
Thorsten, since you're the libnss_nis maintainer, does that ring a bell, or can you think of any reason why initgroups(3) would loop forever and consume all memory when it's called in the PID1 context?
From the backtrace it's in libtirpc, not libnss_nis.
I'm not aware of anything specific to the PID1 context. The usual reason for such a crash is that NIS is itself needed to contact the NIS server, i.e. a typical endless recursion. Maybe systemd as PID1 does not allow access to everything needed? What does nsswitch.conf look like on the host? Does removing NIS from hosts, services, protocols and netgroup help? -- You are receiving this mail because: You are on the CC list for the bug.
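(For readers without the attachment: a NIS-enabled configuration of the kind
being discussed typically looks roughly like the following; the actual file
from the affected host is attached later in comment #13.)

passwd:    compat
group:     compat
hosts:     files dns nis
services:  files nis
netgroup:  files nis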
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c12 --- Comment #12 from Franck Bui <fbui@suse.com> --- (In reply to Thorsten Kukuk from comment #11)
I'm not aware of anything specific to the PID1 context. The usual reason for such a crash is that NIS is itself needed to contact the NIS server, i.e. a typical endless recursion.
Well, the lib is not supposed to crash, so there's definitely a bug here.
Maybe systemd as PID1 does not allow access to everything needed?
I'm not aware of such restrictions.
What does nsswitch.conf look like on the host?
I'll attach it.
Does removing NIS from hosts, services, protocols and netgroup help?
Nope, it doesn't, most likely because "group" uses "compat". BTW, you can have a look at the system, see (private) comment #1. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c13 --- Comment #13 from Franck Bui <fbui@suse.com> --- Created attachment 831792 --> http://bugzilla.suse.com/attachment.cgi?id=831792&action=edit nsswitch.conf -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c14 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |thomas.blume@suse.com --- Comment #14 from Franck Bui <fbui@suse.com> --- After some more debugging I finally found that the bug is essentially the same as the one reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1705641 A more interesting discussion can be found here https://github.com/systemd/systemd/pull/13359 but unfortunately it led to nowhere. Adding Thomas in Cc as he's the libtirpc maintainer. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c15 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(kukuk@suse.com) --- Comment #15 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Franck Bui from comment #14)
After some more debugging I finally found that the bug is essentially the same as the one reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1705641
A more interesting discussion can be found here https://github.com/systemd/systemd/pull/13359 but unfortunately it led to nowhere.
Adding Thomas in Cc as he's the libtirpc maintainer.
Indeed:

    return (tbsize = (int)rl.rlim_max);

seems to be a bit excessive. But I'm not sure how else to determine a good value. Thorsten, would changing this to the soft limit be appropriate, e.g.:

    return (tbsize = (int)rl.rlim_cur);

?

Btw. the RedHat bugzilla indicates that enabling nscd would also fix the excessive memory consumption. What do you think? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c16 --- Comment #16 from Franck Bui <fbui@suse.com> --- (In reply to Thomas Blume from comment #15)
return (tbsize = (int)rl.rlim_max);
seems to be a bit excessive. But I'm not sure how else to determine a good value. Thorsten, would changing this to the soft limit be appropriate, e.g.:
return (tbsize = (int)rl.rlim_cur);
?
Indeed, that was my thinking too, and it's actually how glibc implements getdtablesize(3). It looks like there might be a bug here, therefore I would suggest sending a patch upstream that makes the code use the soft limit.

But that won't be sufficient, unfortunately, because PID1 initializes the rlimits of its child *after* calling initgroups(3). Therefore the soft limit will still be the one used by PID1, which is still equal to the hard limit.

I'll send a patch upstream to attempt to fix it. -- You are receiving this mail because: You are on the CC list for the bug.
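To make the failure mode concrete, here is a small self-contained sketch
(simplified, not the actual libtirpc source; names and layout are illustrative
only) of the pattern being discussed: the table size is derived from the
RLIMIT_NOFILE *hard* limit, and a per-fd array of condition variables is then
set up in one go. Run in a process whose hard limit is as large as fs.nr_open,
the single calloc() below already asks for tens of GiB:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Simplified stand-in for libtirpc's __rpc_dtbsize(): size the table from
 * the RLIMIT_NOFILE *hard* limit and cache the result. */
static int rpc_dtbsize(void)
{
	static int tbsize;
	struct rlimit rl;

	if (tbsize)
		return tbsize;
	if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
		return (tbsize = (int)rl.rlim_max);
	return (tbsize = 1024);		/* historical fallback */
}

int main(void)
{
	int n = rpc_dtbsize();

	printf("dtbsize = %d, per-fd table would need %zu bytes\n",
	       n, (size_t)n * sizeof(pthread_cond_t));

	/* The client-creation path allocates and initializes one condition
	 * variable per *possible* fd, i.e. n entries up front. */
	pthread_cond_t *vc_cv = calloc((size_t)n, sizeof(pthread_cond_t));
	if (vc_cv == NULL) {
		perror("calloc");
		return 1;
	}
	for (int i = 0; i < n; i++)
		pthread_cond_init(&vc_cv[i], NULL);

	free(vc_cv);
	return 0;
}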
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c17 Thorsten Kukuk <kukuk@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(kukuk@suse.com) | --- Comment #17 from Thorsten Kukuk <kukuk@suse.com> --- (In reply to Thomas Blume from comment #15)
(In reply to Franck Bui from comment #14)
After some more debugging I finally found that the bug is essentially the same as the one reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1705641
A more interesting discussion can be found here https://github.com/systemd/systemd/pull/13359 but unfortunately it led to nowhere.
Adding Thomas in Cc as he's the libtirpc maintainer.
Indeed:
return (tbsize = (int)rl.rlim_max);
seems to be a bit excessive. But I'm not sure how else to determine a good value. Thorsten, would changing this to the soft limit be appropriate, e.g.:
return (tbsize = (int)rl.rlim_cur);
?
As far as I understand the code, no, this will not work. For speed they are doing something which is very fast, but also very memory consuming: they keep a table for all possible file descriptors to look up the data, instead of going through a list of the existing file descriptors. If you set the limit lower than what is possible, you could get an out-of-array access. The code was written at a time when 1024 file handles was the maximum possible, not for numbers as big as today's.

I would discuss this on the tirpc mailing list. The whole code needs to be rewritten to be less memory consuming.
Btw. the RedHat bugzilla indicates that enabling nscd would also fix the excessive memory consumption.
No, I tried that already yesterday: if the nscd cache is full (somebody ran "getent group" before), it works. If the nscd cache is empty, I saw the same crash. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c18 --- Comment #18 from Thorsten Kukuk <kukuk@suse.com> --- (In reply to Franck Bui from comment #16)
Indeed, that was my thinking too, and it's actually how glibc implements getdtablesize(3). It looks like there might be a bug here, therefore I would suggest sending a patch upstream that makes the code use the soft limit.
glibc is using sunrpc, not ti-rpc. You cannot compare those functions, as the code calling this function is different and uses it differently, with the problem that connections could be ignored if the fd number does not fit. A common problem since the day Linux started supporting more than 1024 file descriptors per process.

But you are right that TI-RPC needs to be adjusted; as written, the code calling __rpc_dtbsize() needs to be rewritten. Best to not allocate that huge array, but in any case it needs some sanity checks for whether the file descriptor is inside the table or outside. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c19 --- Comment #19 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thorsten Kukuk from comment #17)
(In reply to Thomas Blume from comment #15)
(In reply to Franck Bui from comment #14)
After some more debugging I finally found that the bug is essentially the same as the one reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1705641
A more interesting discussion can be found here https://github.com/systemd/systemd/pull/13359 but unfortunately it led to nowhere.
Adding Thomas in Cc as he's the libtirpc maintainer.
Indeed:
return (tbsize = (int)rl.rlim_max);
seems to be a bit excessive. But I'm not sure how else to determine a good value. Thorsten, would changing this to the soft limit be appropriate, e.g.:
return (tbsize = (int)rl.rlim_cur);
?
As far as I understand the code, no, this will not work. For speed they are doing something which is very fast, but also very memory consuming: they keep a table for all possible file descriptors to look up the data, instead of going through a list of the existing file descriptors.
Let us think proactively as well: if this was done, was it inspired by another codebase? Who else might be following a similar practice?
If you set the limit lower than what is possible, you could get an out-of-array access. The code was written at a time when 1024 file handles was the maximum possible, not for numbers as big as today's.
chivo:~/:[0]# cat /proc/sys/fs/file-max
9223372036854775807
I would discuss this on the tirpc mailing list. The whole code needs to be rewritten to be less memory consuming.
And probably backported... down to what release? :)

A puzzle I don't see yet resolved is that this issue triggers only on one system, and not for another twin system with the same software/hardware. Any ideas on that? I.e., I am not sure if this issue is easily reproducible on all systems. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c20 --- Comment #20 from Thorsten Kukuk <kukuk@suse.com> --- (In reply to Luis Chamberlain from comment #19)
(In reply to Thorsten Kukuk from comment #17)
As far as I understand the code, no, this will not work. For speed they are doing something which is very fast, but also very memory consuming: they keep a table for all possible file descriptors to look up the data, instead of going through a list of the existing file descriptors.
Let us think proactively as well: if this was done, was it inspired by another codebase? Who else might be following a similar practice?
Of course, this was common best practice 25 years ago if you needed performance.
If you set the limit lower than what is possible, you could get an out-of-array access. The code was written at a time when 1024 file handles was the maximum possible, not for numbers as big as today's.
chivo:~/:[0]# cat /proc/sys/fs/file-max
9223372036854775807
That's why this is no longer good practice today and needs to be rewritten. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c21 --- Comment #21 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #19)
A puzzle I don't see yet resolved is that this issue triggers only on one system, and not for another twin system with the same software/hardware. Any ideas on that?
The issue won't show up if nscd is used and the nscd cache is full. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c22 --- Comment #22 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #21)
(In reply to Luis Chamberlain from comment #19)
A puzzle I don't see yet resolved is that this issue triggers only on one system, and not for another twin system with the same software/hardware. Any ideas on that?
The issue won't show up if nscd is used and the nscd cache is full.
Chivo is the name of the affected system, tuctuc is the name of the system on which this issue has not appeared yet. Both have the same release of Tumbleweed installed. nscd was disabled on both systems upon a fresh install, and nscd remains disabled on tuctuc. Any other theory? -- You are receiving this mail because: You are on the CC list for the bug.
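(The nscd state can be double-checked on both machines with standard commands,
e.g.:

systemctl is-enabled nscd.service
systemctl is-active nscd.service
)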
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c23 --- Comment #23 from Luis Chamberlain <luis.chamberlain@suse.com> --- Also, if the culprit is libtirpc function __rpc_dtbsize() shouldn't we be able to proof of concept this easily? And wouldn't this issue be more visible and easier to reproduce elsewhere? Other binaries also depend on libtirpc, but I am not sure whether they use __rpc_dtbsize():

/bin/:
rpcinfo

/usr/bin/:
cifsdd dbwrap_tool guestmount libguestfs-test-tool mailq net newaliases nmblookup oLschema2ldif pdbedit profiles regdiff regpatch regshell regtree rpcclient rsmtp rsmtp samba-regedit sharesec smbcacls

/usr/sbin/:
automount eventlogadm exim libvirtd nmbd rpc.gssd rpc.mountd rpc.statd rpc.svcgssd sendmail showmount sm-notify smbd virtinterfaced virtlockd virtlogd virtlxcd virtnetworkd virtnodedevd virtnwfilterd virtproxyd virtqemud virtsecretd virtstoraged virtxend yp_dump_binding ypbind yppoll ypset yptest

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c24 --- Comment #24 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thorsten Kukuk from comment #20)
(In reply to Luis Chamberlain from comment #19)
(In reply to Thorsten Kukuk from comment #17)
If you set the limit lower than what is possible, you could get an out-of-array access. The code was written at a time when 1024 file handles was the maximum possible, not for numbers as big as today's.
chivo:~/:[0]# cat /proc/sys/fs/file-max
9223372036854775807
That's why this is no longer good practice today and needs to be rewritten.
But the above is file-max, I think __rpc_dtbsize() gets the RLIMIT_NOFILE, which is different. FWIW:

mcgrof@chivo ~ $ sudo cat /proc/sys/fs/nr_open
1073741816
mcgrof@chivo ~ $ python -c "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))"
(1024, 524288)

-- You are receiving this mail because: You are on the CC list for the bug.
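Assuming the table really is sized from PID1's own hard limit (which on this
system appears to match fs.nr_open) rather than from the 524288 a login shell
sees, the numbers line up roughly with the OOM reports:

python -c "print(1073741816 * 48 / 2**30)"   # ~48 GiB if each slot is a 48-byte pthread_cond_t

which is the same order of magnitude as the ~54 GB total-vm reported for the
killed (systemd) process.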
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c25 --- Comment #25 from Franck Bui <fbui@suse.com> --- (In reply to Luis Chamberlain from comment #23)
Also, if the culprit is libtirpc function __rpc_dtbsize() shouldn't we be able to proof of concept this easily? And wouldn't this issue be more visible and easier to reproduce elsewhere?
See previous comments, it's due to the fact __rpc_dtbsize() is called with rlimits inherited from PID1. The rlimits of started service are initialized *after* the call to initgroups(3). -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c26 --- Comment #26 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Thorsten Kukuk from comment #17)
they keep a table for all possible file descriptors to look up the data, instead of going through a list of the existing file descriptors. If you set the limit lower than what is possible, you could get an out-of-array access. The code was written at a time when 1024 file handles was the maximum possible, not for numbers as big as today's.
I would discuss this on the tirpc mailing list. The whole code needs to be rewritten to be less memory consuming.
Looking for some more background I've found this:

https://www.shrubbery.net/solaris9ab/SUNWdev/ONCDG/p27.html

-->
The __rpc_dtbsize() function calls the getrlimit() function to determine the maximum value that the system can assign to a newly created file descriptor. The result is cached for efficiency.
--<

If that's the purpose I would assume that:

    return (tbsize = (int)rl.rlim_max);

is correct. So maybe we just need to subtract something from dtbsize in case it has an insane value? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c27 --- Comment #27 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Franck Bui from comment #25)
(In reply to Luis Chamberlain from comment #23)
Also, if the culprit is libtirpc function __rpc_dtbsize() shouldn't we be able to proof of concept this easily? And wouldn't this issue be more visible and easier to reproduce elsewhere?
See previous comments, it's due to the fact __rpc_dtbsize() is called with rlimits inherited from PID1.
OK.
The rlimits of started service are initialized *after* the call to initgroups(3).
Are you saying that there is a race between the use of __rpc_dtbsize() and its initialized values? What explains that this is not reproducible on a twin system, with the same software? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c28 --- Comment #28 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Thomas Blume from comment #26)
Looking for some more background I've found this:
https://www.shrubbery.net/solaris9ab/SUNWdev/ONCDG/p27.html
--> The __rpc_dtbsize() function calls the getrlimit() function to determine the maximum value that the system can assign to a newly created file descriptor. The result is cached for efficiency. --<
If that's the purpose I would assume that:
return (tbsize = (int)rl.rlim_max);
is correct. So maybe we just need to subtract something from dtbsize in case it has an insane value?
There is already a patch to address an integer overflow in dtbsize, see: https://bugzilla.redhat.com/show_bug.cgi?id=1600284 (libtirpc git commit 56b780e61ed4bae8b728a600fc5ac8052d0d3582).

-->
+	if ( (size_t) dtbsize > SIZE_MAX/sizeof(cond_t)) {
+		mutex_unlock(&clnt_fd_lock);
+		thr_sigsetmask(SIG_SETMASK, &(mask), NULL);
+		ce->cf_stat = RPC_SYSTEMERROR;
+		ce->cf_error.re_errno = EOVERFLOW;
+		goto err;
+	}
--<

Maybe we could add a variable for the available memory to this calculation? I guess an out-of-array access becomes irrelevant when the system runs out of memory.

Thorsten, I'm just collecting ideas before discussing this upstream; it would be good if you could give your opinion. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c29 --- Comment #29 from Thorsten Kukuk <kukuk@suse.com> --- (In reply to Thomas Blume from comment #28)
Thorsten, I'm just collecting ideas before discussing this upstream; it would be good if you could give your opinion.
The above are all insane hacks to work around a bad design, and they create more problems than they solve. In my opinion, there are only two possible solutions:

1. Limit the number of possible fd's and add a check everywhere in the tirpc code so that we don't access data beyond the array. Quick and easily doable, but it does not solve the insane amount of memory usage, and it additionally adds the problem that connections could get lost if the file descriptor number gets too high. So the same problem as we had with glibc. So we minimize the original problem by adding new problems.

2. I seldom agree with Poettering, but here I do: we need to fix the tirpc implementation. Don't allocate an array for all possible fd numbers, but instead use a linked list or something similar. Yes, this will be slower, but if I look at the usage, I don't think this is a real problem on today's CPUs.

-- You are receiving this mail because: You are on the CC list for the bug.
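Purely as an illustration of what option 2 could look like (a sketch only;
names such as fd_lock_get are made up and do not resemble the actual libtirpc
data structures): keep the per-fd lock state in a small list that only has
entries for descriptors actually in use, so memory scales with the number of
live RPC handles rather than with the rlimit:

#include <pthread.h>
#include <stdlib.h>

/* Sketch: per-fd lock state allocated on demand and kept in a singly
 * linked list, instead of an array sized for every possible fd. */
struct fd_lock {
	int             fd;
	int             active;   /* a call is in flight on this fd */
	pthread_cond_t  cv;
	struct fd_lock *next;
};

static struct fd_lock *fd_locks;                 /* protected by fd_locks_mutex */
static pthread_mutex_t fd_locks_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Look up (or lazily create) the lock state for fd; the walk is O(number
 * of live RPC descriptors), which stays tiny, instead of keeping a table
 * with rlim_max entries. */
static struct fd_lock *fd_lock_get(int fd)
{
	struct fd_lock *l;

	pthread_mutex_lock(&fd_locks_mutex);
	for (l = fd_locks; l != NULL; l = l->next)
		if (l->fd == fd)
			break;
	if (l == NULL) {
		l = calloc(1, sizeof(*l));
		if (l != NULL) {
			l->fd = fd;
			pthread_cond_init(&l->cv, NULL);
			l->next = fd_locks;
			fd_locks = l;
		}
	}
	pthread_mutex_unlock(&fd_locks_mutex);
	return l;
}

int main(void)
{
	return fd_lock_get(0) ? 0 : 1;
}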
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c30 --- Comment #30 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Thorsten Kukuk from comment #29)
2. I seldom agree with Poettering, but here I do: we need to fix the tirpc implementation. Don't allocate an array for all possible fd numbers, but instead use a linked list or something similar. Yes, this will be slower, but if I look at the usage, I don't think this is a real problem on today's CPUs.
Thanks Thorsten, interestingly, even the original patch (from 2007!) states that the implementation is silly:

-->
+/*
+ * This machinery implements per-fd locks for MT-safety. It is not
+ * sufficient to do per-CLIENT handle locks for MT-safety because a
+ * user may create more than one CLIENT handle with the same fd behind
+ * it. Therfore, we allocate an array of flags (vc_fd_locks), protected
+ * by the clnt_fd_lock mutex, and an array (vc_cv) of condition variables
+ * similarly protected. Vc_fd_lock[fd] == 1 => a call is active on some
+ * CLIENT handle created for that fd.
+ * The current implementation holds locks across the entire RPC and reply.
+ * Yes, this is silly, and as soon as this code is proven to work, this
+ * should be the first thing fixed. One step at a time.
+ */
--<

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c31 --- Comment #31 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Luis Chamberlain from comment #27)
(In reply to Franck Bui from comment #25)
(In reply to Luis Chamberlain from comment #23)
Also, if the culprit is libtirpc function __rpc_dtbsize() shouldn't we be able to proof of concept this easily? And wouldn't this issue be more visible and easier to reproduce elsewhere?
See previous comments, it's due to the fact __rpc_dtbsize() is called with rlimits inherited from PID1.
OK.
The rlimits of started service are initialized *after* the call to initgroups(3).
Are you saying that there is a race between the use of __rpc_dtbsize() and its initialized values?
What explains that this is not reproducible on a twin system, with the same software?
This goes unanswered. Why is this only occurring on *some* systems, not all? Why can we not easily reproduce with other test cases with 100% certainty?

Also, is there a workaround I can use in the meantime, to move on with life on this system? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|systemd-maintainers@suse.de |thomas.blume@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c32 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |luis.chamberlain@suse.com Flags| |needinfo?(luis.chamberlain@ | |suse.com) --- Comment #32 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #31)
What explains that this is not reproducible on a twin system, with the same software?
This goes unanswered. Why is this only occurring on *some* systems, not all? Why can we not easily reproduce with other test cases with 100% certainty?
I have no answer yet and would like to get access to tuctuc in order to compare it to chivo. Is that possible?
Also, is there a workaround I can use in the meantime, to move on with life on this system?
I guess the best for now is to reset LimitNOFILE to the old value, e.g.:

DefaultLimitNOFILE=1024

in /etc/systemd/system.conf -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c33 --- Comment #33 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #32)
(In reply to Luis Chamberlain from comment #31)
What explains that this is not reproducible on a twin system, with the same software?
This goes unanswered. Why is this only occurring on *some* systems, not all? Why can we not easily reproduce with other test cases with 100% certainty?
I have no answer yet and would like to get access to tuctuc in order to compare it to chivo. Is that possible?
Absolutely, but please note that tuctuc is actively being used by myself for development purposes right now, so I would prefer that no reboots are done. The credentials are the same, the IP address is the same as chivo but the last octet ends with 196. I've added the same user to the wheel group so it has sudo privileges.
Also, is there a workaround I can use in the meantime, to move on with life on this system?
I guess the best for now is to reset LimitNOFILE to the old value, e.g.:
DefaultLimitNOFILE=1024
in /etc/systemd/system.conf
I'll try this. How do I apply this without restarting the system? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c34 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(luis.chamberlain@ | |suse.com) | --- Comment #34 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #33)
Absolutely, but please note that tuctuc is actively being used by myself for development purposes right now, so I would prefer that no reboots are done. The credentials are the same, the IP address is the same as chivo but the last octet ends with 196. I've added the same user to the wheel group so it has sudo privileges.
ok, thanks
DefaultLimitNOFILE=1024
in /etc/systemd/system.conf
I'll try this. How do I apply this without restarting the system?
Please try:

systemctl daemon-reexec

If that doesn't work, I guess a reboot is needed, since it is the system daemon (i.e. PID 1) itself. -- You are receiving this mail because: You are on the CC list for the bug.
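To check whether the new value actually took effect after the re-exec,
something like:

systemctl show --property DefaultLimitNOFILE
grep 'open files' /proc/1/limits

should show it (the second command shows the limits PID 1 itself is running
with).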
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c35 --- Comment #35 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #34)
(In reply to Luis Chamberlain from comment #33)
DefaultLimitNOFILE=1024
in /etc/systemd/system.conf
I'll try this.
This is not a workaround -- the OOM still occurs. I had restarted the system, and the OOM still triggers *every* single time I ssh in.

mcgrof@chivo ~ $ sudo grep DefaultLimitNOFILE /etc/systemd/system.conf
#DefaultLimitNOFILE=1024:524288
DefaultLimitNOFILE=1024

I had logged in twice:

mcgrof@chivo ~ $ sudo journalctl -k | grep "Out of memory"
Mar 17 18:07:58 chivo kernel: Out of memory: Killed process 5995 ((systemd)) total-vm:54694764kB, anon-rss:31910968kB, file-rss:2172kB, shmem-rss:0kB, UID:0 pgtables:64580kB oom_score_adj:0
Mar 17 18:08:22 chivo kernel: Out of memory: Killed process 6060 ((systemd)) total-vm:54694764kB, anon-rss:31928396kB, file-rss:2144kB, shmem-rss:0kB, UID:0 pgtables:64588kB oom_score_adj:0
How do I apply this without restarting the system?
Please try:
systemctl daemon-reexec
If that doesn't work, I guess a reboot is needed, since it is the system daemon (i.e. PID 1) itself.
Thanks, to be safe I just rebooted. I'll keep the above in mind for the future. Is there a way to see which values are in effect at run time? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c36 --- Comment #36 from Luis Chamberlain <luis.chamberlain@suse.com> --- I had to reboot the tuctuc system, now the oom is triggering every time on the system upon an ssh login. I didn't do anything other than a reboot. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c37 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(luis.chamberlain@ | |suse.com) --- Comment #37 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #36)
I had to reboot the tuctuc system, now the oom is triggering every time on the system upon an ssh login. I didn't do anything other than a reboot.
I took a look at the logs on tuctuc and all ooms I see were killing the session of user1000. loginctl shows 2 sessions of user1000, but systemd-cgls doesn't show a user@1000.service nor an init scope for this user. On chivo it is almost the same: most ooms are bound to the user1000 session, only once was there an oom for unbound-anchor.service. user1000 on chivo also doesn't have a user@1000.service or an init scope.

I'm not sure whether this is the result of the oom kill or a special setup for this user.

Is user1000 supposed to be a normal user that goes via pam login?

However, it seems that the limit for the system daemon is fine and the problem is with the user daemon (systemd --user). In that case the workaround given was wrong indeed.

Please revert the changes in /etc/systemd/system.conf and edit /usr/lib/systemd/system/user-.slice.d/10-defaults.conf instead. There please set MemoryHigh and MemoryMax limits according to the documentation in the systemd.resource-control manpage.

Please report whether this fixes the ooms. -- You are receiving this mail because: You are on the CC list for the bug.
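For concreteness, the kind of setting meant here would look something like the
following (the numbers are only placeholders; judging by the systemd-analyze
output later in comment #42, values of roughly 3G/4G ended up being used):

# /usr/lib/systemd/system/user-.slice.d/10-defaults.conf
[Slice]
MemoryHigh=3G
MemoryMax=4G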
http://bugzilla.suse.com/show_bug.cgi?id=1165351 Thorsten Kukuk <kukuk@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC|kukuk@suse.com | -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c38 --- Comment #38 from Franck Bui <fbui@suse.com> --- (In reply to Luis Chamberlain from comment #31)
Also, is there a work around I can use in the meantime, to move on with life in the meantime with this system?
Sorry for the late reply. At least with systemd there's no simple workaround. As written in comment #25, the RLIMIT_NOFILE is directly inherited from PID1 when a service is being created, before initgroups(3) is called. Only afterwards are the service-specific rlimits initialized. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c39 Luis Chamberlain <luis.chamberlain@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(luis.chamberlain@ | |suse.com) | --- Comment #39 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #37)
(In reply to Luis Chamberlain from comment #36)
I had to reboot the tuctuc system, now the oom is triggering every time on the system upon an ssh login. I didn't do anything other than a reboot.
I took a look at the logs on tuctuc and all ooms I see were killing the session of user1000. loginctl shows 2 sessions of user1000, but systemd-cgls doesn't show a user@1000.service nor an init scope for this user. On chivo it is almost the same: most ooms are bound to the user1000 session, only once was there an oom for unbound-anchor.service. user1000 on chivo also doesn't have a user@1000.service or an init scope.
I'm not sure whether this is the result of the oom kill or a special setup for this user.
I haven't done anything custom.
Is user1000 supposed to be a normal user that goes via pam login?
Yes, no NIS stuff.
However, it seems that the limit for the system daemon is fine and the problem is with the user daemon (systemd --user). In that case the workaround given was wrong indeed.
Please revert the changes in /etc/systemd/system.conf and edit /usr/lib/systemd/system/user-.slice.d/10-defaults.conf
instead. There please set MemoryHigh and MemoryMax limits according to the documentation in the systemd.resource-control manpage.
Please report whether this fixes the ooms.
Nope, the full 30 GiB *and* then swap are eaten up and then the OOM triggers. I rebooted the system after making the changes and trying. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c40 --- Comment #40 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #39)
However, it seems that the limit for the system daemon is fine and the problem is with the user daemon (systemd --user). In that case the workaround given was wrong indeed.
Please revert the changes in /etc/systemd/system.conf and edit /usr/lib/systemd/system/user-.slice.d/10-defaults.conf
instead. There please set MemoryHigh and MemoryMax limits according to the documentation in the systemd.resource-control manpage.
Please report whether this fixes the ooms.
Nope, the full 30 GiB *and* then swap are eaten up and then the OOM triggers. I rebooted the system after making the changes and trying.
Unfortunately, it seems that the session cleanup was successful this time. I don't see a user@1000.slice on chivo anymore, so I cannot verify the settings. :( Could you please retry the login on chivo?

Even better would be if you create a local (non root) user for me so that I can test myself and don't need to nag you all the time. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c41 --- Comment #41 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #40)
(In reply to Luis Chamberlain from comment #39)
However, it seems that the limit for the system daemon is fine and the problem is with the user daemon (systemd --user). In that case the workaround given was wrong indeed.
Please revert the changes in /etc/systemd/system.conf and edit /usr/lib/systemd/system/user-.slice.d/10-defaults.conf
instead. There please set MemoryHigh and MemoryMax limits according to the documentation in the systemd.resource-control manpage.
Please report whether this fixes the ooms.
Nope, the full 30 GiB *and* then swap are eaten up and then the OOM triggers. I rebooted the system after making the changes and trying.
Unfortunately, it seems that the session cleanup was successful this time. I don't see a user@1000.slice on chivo anymore, so I cannot verify the settings. :( Could you please retry the login on chivo?
Yes, it still triggers.
Even better would be if you create a local (non root) user for me so that I can test myself and don't need to nag you all the time.
I did that early on, on February 29 I provided credentials on private comment #1. Let me know if you have issues logging in. The login will take a while, 20-30 seconds because the system is OOM'ing while you log in. It will succeed after the OOM and will let you know after. As I noted earlier as well the best way to visualize the OOM is to use htop: sudo htop --sort-key PERCENT_MEM -d 1 The user test is part of the wheel group and sudo is set with: %wheel ALL=(ALL) NOPASSWD: ALL The credentials apply to both systems, chivo and tuctuc. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c42 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(fbui@suse.com) --- Comment #42 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #41)
I did that early on, on February 29 I provided credentials on private comment #1. Let me know if you have issues logging in. The login will take a while, 20-30 seconds because the system is OOM'ing while you log in. It will succeed after the OOM and will let you know after. As I noted earlier as well the best way to visualize the OOM is to use htop:
sudo htop --sort-key PERCENT_MEM -d 1
The user test is part of the wheel group and sudo is set with:
%wheel ALL=(ALL) NOPASSWD: ALL
The credentials apply to both systems, chivo and tuctuc.
Oops, sorry, I've overlooked your testuser. Now it becomes interesting, systemd-analyze dump shows:

-->
-> Unit user@1001.service:
        Description: User Manager for UID 1001
        Instance: 1001
        Unit Load State: loaded
        Unit Active State: failed
        State Change Timestamp: Thu 2020-03-19 09:12:55 CET
        Inactive Exit Timestamp: Thu 2020-03-19 09:12:39 CET
        Active Enter Timestamp: n/a
        Active Exit Timestamp: n/a
        Inactive Enter Timestamp: Thu 2020-03-19 09:12:55 CET
        May GC: no
        Need Daemon Reload: no
        Transient: no
        Perpetual: no
        Garbage Collection Mode: inactive
        Slice: user-1001.slice
        [....]
        -> ExecStart:
                Command Line: /usr/lib/systemd/systemd --user
                        PID: 10907
                        Start Timestamp: Thu 2020-03-19 09:12:39 CET
                        Exit Timestamp: Thu 2020-03-19 09:12:55 CET
                        Exit Code: killed
                        Exit Status: 9
        CPUAccounting: no
        IOAccounting: no
        BlockIOAccounting: no
        MemoryAccounting: yes
        TasksAccounting: yes
        IPAccounting: no
        CPUWeight: 18446744073709551615
        StartupCPUWeight: 18446744073709551615
        CPUShares: 18446744073709551615
        StartupCPUShares: 18446744073709551615
        CPUQuotaPerSecSec: infinity
        CPUQuotaPeriodSec: infinity
        AllowedCPUs:
        AllowedMemoryNodes:
        IOWeight: 18446744073709551615
        StartupIOWeight: 18446744073709551615
        BlockIOWeight: 18446744073709551615
        StartupBlockIOWeight: 18446744073709551615
        DefaultMemoryMin: 0
        DefaultMemoryLow: 0
        MemoryMin: 0
        MemoryLow: 0
        MemoryHigh: 18446744073709551615
        MemoryMax: 18446744073709551615
        [...]
        TasksMax: 18446744073709551615
        [...]
--<

whereas the slice shows:

-->
-> Unit user-1001.slice:
        Description: User Slice of UID 1001
        [...]
        MemoryHigh: 3221225472
        MemoryMax: 4294967296
        [...]
        TasksMax: 10813
        [...]
--<

So, the systemd user instance started in user-1001.slice ignores the limits of the slice for memory as well as for TasksMax. I guess this is the root cause of the issue.

Franck, since the systemd user instance is started below the user-1001.slice I would expect that it respects the memory limit of the slice. Any idea why it doesn't? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c43 --- Comment #43 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #42)
So, the systemd user instance started in user-1001.slice ignores the limits of the slice for memory as well as for TasksMax. I guess this is the root cause of the issue.
Is the theory here still that the real root cause is the tirpc allocations for each possible file descriptor, and also that we are now observing the user instance ignore the user slice limits on memory, which *should* in theory contain bogus / buggy allocations such as the tirpc issue?

If so, then should we really open two bugs? One for libtirpc and one for systemd?
Franck, since the systemd user instance is started below the user-1001.slice I would expect that it respects the memory limit of the slice. Any idea why it doesn't?
Any idea if we can work around this by forcing this to a low limit somehow? And -- what explains why we didn't observe this all the time? I mean even though tuctuc no longer exhibits good behaviour, I'm sure any new install of Tumbleweed shouldn't be OOM'ing. So a new install should demo a case where this is not being observed. So -- *why* is this only observed on some systems but not all? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c44 --- Comment #44 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #43)
(In reply to Thomas Blume from comment #42)
So, the systemd user instance started in user-1001.slice ignores the limits of the slice for memory as well as for TasksMax. I guess this is the root cause of the issue.
Is the theory here still that the real root cause is the tirpc allocations for each possible file descriptor, and also that we are now observing the user instance ignore the user slice limits on memory, which *should* in theory contain bogus / buggy allocations such as the tirpc issue?
If so, then should we really open two bugs? One for libtirpc and one for systemd?
I'm not sure anymore whether there is really a bug in libtirpc. I've found that it was implemented that way in 2007 on purpose. So all current SLES versions carry a libtirpc with this code. It was never an issue with the correct limits in place. So, strictly speaking, the libtirpc code is not really good, but it is not a bug either. Improving this code would qualify for a feature request rather than a bug report.

On the other hand, systemd not respecting the limits of the user slice is clearly a bug. That must not happen.
Franck, since the systemd user instance is started below the user-1001.slice I would expect that it respects the memory limit of the slice. Any idea why it doesn't?
Any idea if we can work around this by forcing this to a low limit somehow?
And -- what explains why we didn't observe this all the time? I mean even though tuctuc no longer exhibits good behaviour, I'm sure any new install of Tumbleweed shouldn't be OOM'ing. So a new install should demo a case where this is not being observed. So -- *why* is this only observed on some systems but not all?
Yes, and even stranger is that it doesn't happen for the root user (user0). Since it also lives in the user slice, I would expect the same issue. I can see that even the systemd instance started for user0 doesn't respect the slice limits:

-->
-> Unit user@0.service:
        Description: User Manager for UID 0
        Instance: 0
        Unit Load State: loaded
        [...]
        Command Line: /usr/lib/systemd/systemd --user
                PID: 14739
                Start Timestamp: Thu 2020-03-19 18:04:05 CET
        Status Text: Startup finished in 107ms.
        CPUAccounting: no
        IOAccounting: no
        BlockIOAccounting: no
        MemoryAccounting: yes
        TasksAccounting: yes
        IPAccounting: no
        CPUWeight: 18446744073709551615
        StartupCPUWeight: 18446744073709551615
        CPUShares: 18446744073709551615
        StartupCPUShares: 18446744073709551615
        CPUQuotaPerSecSec: infinity
        CPUQuotaPeriodSec: infinity
        AllowedCPUs:
        AllowedMemoryNodes:
        IOWeight: 18446744073709551615
        StartupIOWeight: 18446744073709551615
        BlockIOWeight: 18446744073709551615
        StartupBlockIOWeight: 18446744073709551615
        DefaultMemoryMin: 0
        DefaultMemoryLow: 0
        MemoryMin: 0
        MemoryLow: 0
        MemoryHigh: 18446744073709551615
        MemoryMax: 18446744073709551615
        MemorySwapMax: 18446744073709551615
        MemoryLimit: 18446744073709551615
        TasksMax: 18446744073709551615
--<

Still, a login with this user doesn't create an oom. I currently have no explanation why. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c45 --- Comment #45 from Luis Chamberlain <luis.chamberlain@suse.com> --- FWIW -- I upgraded to Tumbleweed 20200314 on the tuctuc system and the issue is still present. I had to upgrade to some other development work on the system. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c46 --- Comment #46 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #45)
FWIW -- I upgraded to Tumbleweed 20200314 on the tuctuc system and the issue is still present. I had to upgrade to some other development work on the system.
I've tried on a qemu setup:

-->
g18:~ # cat /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20200314"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20200314"
--<

with a way lower memory:

-->
g18:~ # free -m
              total        used        free      shared  buff/cache   available
Mem:           1977         162        1475           0         339        1659
Swap:          1281           0        1281
--<

and couldn't reproduce the issue:

-->
Control group /:
-.slice
├─1021 bpfilter_umh
├─user.slice
│ ├─user-0.slice
│ │ ├─session-1.scope
│ │ │ ├─3023 sshd: root@pts/0
│ │ │ ├─3048 -bash
│ │ │ ├─4854 systemd-cgls
│ │ │ └─4855 less
│ │ └─user@0.service …
│ │ └─init.scope
│ │ ├─3029 /usr/lib/systemd/systemd --user
│ │ └─3030 (sd-pam)
│ └─user-1000.slice
│ ├─user@1000.service …
│ │ └─init.scope
│ │ ├─4610 /usr/lib/systemd/systemd --user
│ │ └─4611 (sd-pam)
│ └─session-2.scope
│ ├─4592 sshd: tslokal [priv]
│ ├─4617 sshd: tslokal@pts/1
│ └─4618 -bash
[...]

# ps -o pid,user,rss,vsz 4610
  PID USER       RSS    VSZ
 4610 tslokal  10400  16052
--<

You can see ps 4610 is consuming about 16MB, which is just fine for 2G of RAM.

Maybe it is bound to big memory machines? Could you give it a try rebooting one of your machines with reducing the physical memory size using the mem kernel parameter? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c47 --- Comment #47 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #46)
You can see ps 4610 is consuming about 16MB, which is just fine for 2G of RAM.
Maybe it is bound to big memory machines? Could you give it a try rebooting one of your machines with reducing the physical memory size using the mem kernel parameter?
I enforced mem=2G and the oom still triggers. I left the system booted with 2G, feel free to adjust as you see fit. You do this by editing /etc/default/grub and modifying the line that contains the existing mem=2G entry, which is GRUB_CMDLINE_LINUX_DEFAULT. To make it take effect on the next boot, run: sudo update-bootloader

mcgrof@chivo ~ $ free -h
       total   used   free  shared  buff/cache  available
Mem:   1.6Gi  638Mi  915Mi   0.0Ki       89Mi      885Mi
Swap:  1.0Gi   35Mi  988Mi

mcgrof@chivo ~ $ sudo dmesg | grep oom
[ 2465.363645] (systemd) invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
[ 2465.705200]  oom_kill_process.cold+0xb/0x10
[ 2468.573932] [ pid ]  uid  tgid  total_vm  rss  pgtables_bytes  swapents  oom_score_adj  name
[ 2471.181285] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/init.scope,task=(systemd),pid=2376,uid=0
[ 2471.299473] Out of memory: Killed process 2376 ((systemd)) total-vm:54694760kB, anon-rss:952556kB, file-rss:2200kB, shmem-rss:0kB, UID:0 pgtables:3904kB oom_score_adj:0
[ 2471.483078] oom_reaper: reaped process 2376 ((systemd)), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

mcgrof@chivo ~ $ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.5.4-1-default root=UUID=22cf9d5d-e128-41a2-8311-4f735a1d1352 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel unknown_nmi_panic resume=/dev/disk/by-id/scsi-3300143804052b801-part2 mem=2G mitigations=auto crashkernel=512M-2G:128M,2G-64G:256M,64G-:512M

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c48 --- Comment #48 from Luis Chamberlain <luis.chamberlain@suse.com> ---

Note: I have a workaround for now thanks to Anthony Iliopoulos, his suggestion was to disable pam_systemd from PAM:

--- /etc/pam.d/common-session-pc.old    2020-03-25 13:42:38.404526416 +0100
+++ /etc/pam.d/common-session-pc        2020-03-25 13:42:47.308353223 +0100
@@ -28,7 +28,7 @@
 # at the start and end of sessions of *any* kind (both interactive and
 # non-interactive
 #
-session optional        pam_systemd.so debug
+#session optional       pam_systemd.so debug
 session required        pam_limits.so
 session required        pam_unix.so     try_first_pass
 session optional        pam_umask.so

I'll also note that this issue is *still* present if you don't do the above but just disable NIS. I disabled NIS by commenting out /etc/yp.conf, stopping ypserv.service and then disabling it. The only effective way to work around this is disabling pam_systemd.so in /etc/pam.d/common-session-pc as noted above. But without a system to reproduce on, we have no way of debugging it further.

Thomas, I'll leave chivo with pam_systemd.so enabled, but I'll disable it on tuctuc. I'll keep using tuctuc for development, feel free to use chivo for any further testing / debugging.

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c49 --- Comment #49 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Luis Chamberlain from comment #48)
Note: I have a work around for now thanks to Anthony Iliopoulos, his suggestion was to disable systemd from pam:
--- /etc/pam.d/common-session-pc.old 2020-03-25 13:42:38.404526416 +0100 +++ /etc/pam.d/common-session-pc 2020-03-25 13:42:47.308353223 +0100 @@ -28,7 +28,7 @@ # at the start and end of sessions of *any* kind (both interactive and # non-interactive # -session optional pam_systemd.so debug +#session optional pam_systemd.so debug session required pam_limits.so session required pam_unix.so try_first_pass session optional pam_umask.so
I'll also note that this issue is *still* present if you don't do the above, but just disable NIS. I disabled NIS by commenting out /etc/yp.conf and stopping ypserv.service and then disabling it. The only effective way to work around this is disabling pam_Systemd.so from pam from commom-session-pc as noted above.
But without a system to reproduce, we have no way of debugging it further.
Thomas, I'll leave chivo with the pam_systemd.so enabled, but I'll disable it on tuctuc. I'll keep using tuctuc for development, feel free to use chivo for any further testing / debugging.
Odd, so no: I'm now able to reproduce even with pam_systemd.so disabled but ypbind.service enabled. The only way to really work around this is to disable pam_systemd.so in /etc/pam.d/common-session-pc as noted earlier, but also to run:

systemctl stop ypbind.service
systemctl disable ypbind.service

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c50 --- Comment #50 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Luis Chamberlain from comment #49)
(In reply to Luis Chamberlain from comment #48)
Note: I have a work around for now thanks to Anthony Iliopoulos, his suggestion was to disable systemd from pam:
--- /etc/pam.d/common-session-pc.old 2020-03-25 13:42:38.404526416 +0100 +++ /etc/pam.d/common-session-pc 2020-03-25 13:42:47.308353223 +0100 @@ -28,7 +28,7 @@ # at the start and end of sessions of *any* kind (both interactive and # non-interactive # -session optional pam_systemd.so debug +#session optional pam_systemd.so debug session required pam_limits.so session required pam_unix.so try_first_pass session optional pam_umask.so
I'll also note that this issue is *still* present if you don't do the above, but just disable NIS. I disabled NIS by commenting out /etc/yp.conf and stopping ypserv.service and then disabling it. The only effective way to work around this is disabling pam_Systemd.so from pam from commom-session-pc as noted above.
But without a system to reproduce, we have no way of debugging it further.
Thomas, I'll leave chivo with the pam_systemd.so enabled, but I'll disable it on tuctuc. I'll keep using tuctuc for development, feel free to use chivo for any further testing / debugging.
Odd, so no, I'm now able to reproduce with pam_systemd.so disabled but ypbind.service enabled. The only way to really work around this is to disable pam_systemd.so from /etc/pam.d/common-session-pc as noted earlier but also running:
systemctl stop ypbind.service systemctl disable ypbind.service
I confirm again: disabling ypbind.service but leaving pam_systemd.so enabled still triggers this on chivo. I am leaving the system there with 2G of RAM set at boot. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c51 --- Comment #51 from Franck Bui <fbui@suse.com> --- (In reply to Luis Chamberlain from comment #48)
Thomas, I'll leave chivo with the pam_systemd.so enabled,
Can I install a test package on chivo and reboot it ? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c52 --- Comment #52 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #48)
Note: I have a work around for now thanks to Anthony Iliopoulos, his suggestion was to disable systemd from pam:
--- /etc/pam.d/common-session-pc.old 2020-03-25 13:42:38.404526416 +0100 +++ /etc/pam.d/common-session-pc 2020-03-25 13:42:47.308353223 +0100 @@ -28,7 +28,7 @@ # at the start and end of sessions of *any* kind (both interactive and # non-interactive # -session optional pam_systemd.so debug +#session optional pam_systemd.so debug session required pam_limits.so session required pam_unix.so try_first_pass session optional pam_umask.so
I'll also note that this issue is *still* present if you don't do the above, but just disable NIS. I disabled NIS by commenting out /etc/yp.conf and stopping ypserv.service and then disabling it. The only effective way to work around this is disabling pam_Systemd.so from pam from commom-session-pc as noted above.
But without a system to reproduce, we have no way of debugging it further.
Thomas, I'll leave chivo with the pam_systemd.so enabled, but I'll disable it on tuctuc. I'll keep using tuctuc for development, feel free to use chivo for any further testing / debugging.
Thanks Luis, actually Franck found a workaround and I've just verified it on chivo. Starting nscd suppresses the oom at test user login:

-->
chivo:~/:[0]# systemctl start nscd
chivo:~/:[0]# systemctl status nscd
● nscd.service - Name Service Cache Daemon
   Loaded: loaded (/usr/lib/systemd/system/nscd.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-03-25 14:56:52 CET; 3min 38s ago
[...]
chivo:~/:[130]# systemd-cgls
Control group /:
-.slice
├─user.slice
│ ├─user-1001.slice
│ │ ├─session-18.scope
│ │ │ ├─24546 sshd: test [priv]
│ │ │ ├─24567 sshd: test@pts/1
│ │ │ └─24570 -bash
│ │ └─user@1001.service …
│ │   └─init.scope
│ │     ├─24557 /usr/lib/systemd/systemd --user
│ │     └─24560 (sd-pam)
[...]
chivo:~/:[0]# free -m
       total  used  free  shared  buff/cache  available
Mem:    1643   739   715       0         188        739
Swap:   1023    29   994
--<

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c53 --- Comment #53 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Franck Bui from comment #51)
(In reply to Luis Chamberlain from comment #48)
Thomas, I'll leave chivo with the pam_systemd.so enabled,
Can I install a test package on chivo and reboot it ?
Yes you can hammer on that system all you like until the issue can be root-caused. Do as you think is needed :) -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c54 --- Comment #54 from Franck Bui <fbui@suse.com> --- (In reply to Luis Chamberlain from comment #53)
Yes you can hammer on that system all you like until the issue can be root-caused. Do as you think is needed :)
Are you kidding ? This issue has been understood since comment #14... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c55 --- Comment #55 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Franck Bui from comment #54)
(In reply to Luis Chamberlain from comment #53)
Yes you can hammer on that system all you like until the issue can be root-caused. Do as you think is needed :)
Are you kidding ?
This issue has been understood since comment #14...
In comment #44 Thomas mentioned he was not sure if libtirpc was the real issue here. So I'll defer to the systemd folks to decide. I'm just happy to make the system available for further diagnosis. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c56 --- Comment #56 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #55)
(In reply to Franck Bui from comment #54)
(In reply to Luis Chamberlain from comment #53)
Yes you can hammer on that system all you like until the issue can be root-caused. Do as you think is needed :)
Are you kidding ?
This issue has been understood since comment #14...
On comment #44 Thomas mentioned he was not sure if libtirpc was the real issue here. So I'll defer to the systemd folks to decide. I'm just happy to make the system available to further diagnose.
I've created a debug version of libtirpc3 and did some testing. After some discussion with Franck, it turns out that, unlike any other daemon on the system, a systemd user daemon has neither user nor system slice limits. Instead it inherits the limits of the parent systemd daemon (i.e. PID 1), which for open files has the limit of /proc/sys/fs/nr_open, i.e. effectively no limit. I don't want to discuss here whether a systemd user daemon should respect slice limits, but I see that all other daemons do (at least the limits of system.slice). Therefore, I think there is no danger in limiting libtirpc to the LimitNOFILE of the system slice. If that weren't enough for libtirpc3, all daemons that use it and have the limits of the system slice would have a problem. Franck, do you see any problem for the systemd user daemon with this approach? -- You are receiving this mail because: You are on the CC list for the bug.
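To make the failure mode concrete, here is a minimal sketch in C of the kind of allocation pattern described above (this is not the actual libtirpc code; the struct and its size are invented for illustration). When a library sizes a per-descriptor table from the RLIMIT_NOFILE hard limit inherited from PID 1, and that limit has been raised to the nr_open value of 1073741816, the single allocation alone runs into the tens of GiB, which is consistent with the OOM traces earlier in this report:

-->
/* Hypothetical sketch, not libtirpc source: shows why sizing a table from
 * the inherited RLIMIT_NOFILE hard limit explodes once nr_open is huge.
 * Note: running this on an affected setup may itself exhaust memory. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

struct fd_slot {            /* invented per-descriptor bookkeeping entry */
    void *handle;
    int   flags;
};

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    /* With rlim_max == 1073741816 and a 16-byte slot this asks for ~16 GiB. */
    size_t n = (size_t)rl.rlim_max;
    printf("would allocate %zu slots (~%zu MiB)\n",
           n, (n * sizeof(struct fd_slot)) >> 20);

    struct fd_slot *table = calloc(n, sizeof(*table));
    if (!table) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    free(table);
    return 0;
}
--<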
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c57 --- Comment #57 from Thomas Blume <thomas.blume@suse.com> --- Created attachment 835243 --> http://bugzilla.suse.com/attachment.cgi?id=835243&action=edit 0001-execute-set-rlimits-before-using-libtirpc-which-uses.patch The attached patch fixed it on chivo. Luis, I have built a test package with the latest systemd and this patch at: https://download.opensuse.org/repositories/home:/tsaupe:/branches:/Base:/Sys... It would be good if you could test this package too (probably on your other machine or on a new installation). Franck, if you have no objections, I would send this patch upstream. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c58 --- Comment #58 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Thomas Blume from comment #57)
Created attachment 835243 [details] 0001-execute-set-rlimits-before-using-libtirpc-which-uses.patch
The attached patch fixed it on chivo.
Luis, I have built a testpackage with the latest systemd and this patch at:
https://download.opensuse.org/repositories/home:/tsaupe:/branches:/Base:/ System:/systemd-bsc1165351/openSUSE_Factory/
It would be good if you could test this package too (probably on your other machine or on a new installation).
I have tested this on the other failing system and confirm that this fixes the issue, thanks! -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c59 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(fbui@suse.com) | --- Comment #59 from Franck Bui <fbui@suse.com> --- (In reply to Thomas Blume from comment #57)
Franck, if you have no objections, I would send this patch upstream.
Sure, go ahead. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c60 --- Comment #60 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Franck Bui from comment #59)
(In reply to Thomas Blume from comment #57)
Franck, if you have no objections, I would send this patch upstream.
Sure, go ahead.
So I still am a bit perplexed about one thing. Why is this issue only being observed now? I checked the code lineage at least in systemd, and this seems to have existed for a long time now. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c61 --- Comment #61 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #60)
(In reply to Franck Bui from comment #59)
(In reply to Thomas Blume from comment #57)
Franck, if you have no objections, I would send this patch upstream.
Sure, go ahead.
So I still am a bit perplexed about one thing. Why is this issue only being observed now? I checked the code lineage at least in systemd, and this seems to have existed for a long time now.
IMHO, using the rlimits for the systemd user daemon in libtirpc has never worked, but that was hidden because the NOFILE limits for PID 1 were also lower. The issue became visible after increasing the NOFILE limits for PID 1. When the rlimits for the systemd user daemon are applied at the time of the libtirpc call, it starts working. This is what my patch does. However, it seems that upstream still doubts that this is a systemd issue, see: https://github.com/systemd/systemd/pull/15423 There is no question that the code in libtirpc also needs to be improved, but the trigger of this bug, at least in a standard Tumbleweed installation, is that libtirpc is using an rlimit that hasn't been applied yet. -- You are receiving this mail because: You are on the CC list for the bug.
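To illustrate the ordering idea behind the patch, here is a rough sketch (the function below is a placeholder, not systemd's actual code path): apply the child's RLIMIT_NOFILE before any call that can end up in libtirpc, such as initgroups(3) on a NIS-enabled system, so the library sees the intended limit instead of the one inherited from PID 1.

-->
/* Placeholder sketch of the ordering fix; names and values are invented. */
#include <sys/types.h>
#include <sys/resource.h>
#include <grp.h>

static int setup_user_session(const char *user, gid_t gid,
                              const struct rlimit *nofile)
{
    /* 1. Apply the per-service NOFILE limit first... */
    if (setrlimit(RLIMIT_NOFILE, nofile) != 0)
        return -1;

    /* 2. ...and only then resolve supplementary groups; with NIS configured
     *    this goes through libtirpc, which looks at RLIMIT_NOFILE. */
    if (initgroups(user, gid) != 0)
        return -1;

    return 0;
}

int main(void)
{
    /* Example: constrain a hypothetical "nobody" session to 4096 fds.
     * initgroups() needs root privileges to succeed. */
    struct rlimit rl = { .rlim_cur = 1024, .rlim_max = 4096 };
    return setup_user_session("nobody", 65534, &rl) == 0 ? 0 : 1;
}
--<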
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c62 Jaime Caamaño Ruiz <jcaamano@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jcaamano@suse.com --- Comment #62 from Jaime Caamaño Ruiz <jcaamano@suse.com> --- I took this off Thomas' hands. I have not seen it specifically mentioned here, apologies if it was already known, but this seems to be ultimately caused by the bump of nr_open introduced in systemd 240 [1]. systemd inherits its nofile limit (hard & soft) from nr_open after this bump. Before this change we would be running with an nr_open of 1024*1024=1048576, but from 240 on it is bumped to 1073741816. A possible workaround is to use the -Dbump-proc-sys-fs-nr-open build option provided by systemd to avoid this bump and remain on the previous value, until we have a permanent fix in libtirpc. Franck, what do you think of this? In the meantime, steved has applied a patch to use the soft limit instead of the hard limit, but this won't fix the issue for systemd PID 1. I will let him know. [1] https://github.com/systemd/systemd/blob/1742aae2aa8cd33897250d6fcfbe10928e43... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 Jaime Caamaño Ruiz <jcaamano@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(fbui@suse.com) -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c63 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(fbui@suse.com) | --- Comment #63 from Franck Bui <fbui@suse.com> --- (In reply to Jaime Caamaño Ruiz from comment #62)
In the meantime, steved has applied a patch to use the soft limit instead of the hard limit, but this wont fix the issue for systemd PID1. I will let him know.
Which issue for systemd are you talking about ? If the patch you mentioned is the one suggested since nearly the beginning of this report (comment #16) then this patch alone should be enough to fix this bug since AFAIK only libtirpc is affected when the nofile hard limit is bumped. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c64 --- Comment #64 from Jaime Caamaño Ruiz <jcaamano@suse.com> --- (In reply to Franck Bui from comment #63)
Which issue for systemd are you talking about ?
If the patch you mentioned is the one suggested since nearly the beginning of this report (comment #16) then this patch alone should be enough to fix this bug since AFAIK only libtirpc is affected when the nofile hard limit is bumped.
Yes, that is the patch. But note that the soft limit has also been bumped and also from comment #16:
But that won't be sufficient unfortunately because PID1 initializes the rlimits > of its child *after* calling initgroups(3). Therefore the soft limit will still > be the one used by PID1 which is still equal to the hard limit.
I understand that PID 1 calls initgroups(3) with its own soft limit, which has increased from 1048576 to 1073741816 since version 240. -- You are receiving this mail because: You are on the CC list for the bug.
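For reference, a simplified sketch of the libtirpc-side mitigation being discussed (an illustration under assumed names, not the exact upstream patch): size the internal table from the soft NOFILE limit rather than the hard one. As the comments above point out, this alone does not help the PID 1 / user manager case, because at the time of the call the soft limit is still equal to the bumped hard limit.

-->
/* Illustrative only; the function name and fallback value are invented. */
#include <stddef.h>
#include <sys/resource.h>

static size_t rpc_table_slots(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 1024;                 /* conservative fallback */

    return (size_t)rl.rlim_cur;      /* soft limit instead of rl.rlim_max */
}
--<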
http://bugzilla.suse.com/show_bug.cgi?id=1165351 Anthony Iliopoulos <ailiopoulos@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC|ailiopoulos@suse.com | -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c65 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(thomas.blume@suse | |.com) --- Comment #65 from Franck Bui <fbui@suse.com> --- (In reply to Jaime Caamaño Ruiz from comment #64)
I understand that PID1 calls initgroups(3) with it's own soft limit, which has increased from 1048576 to 1073741816 since version 240.
Ah right I already forgot... Thomas, so the only fix (or rather workaround) from libtirpc we can expect is the one that uses the soft limit ? Can you please provide the link where this was being discussed ? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c66 --- Comment #66 from Franck Bui <fbui@suse.com> --- (In reply to Jaime Caamaño Ruiz from comment #62)
possible workaround is to use the build option provided by systemd Dbump-proc-sys-fs-nr-open, to avoid this bump and remain on the previous value, until we have a permanent fix in libtirpc. Frank, what do you think of this?
Yes, that seems to be the only workaround we currently have to support systems running NIS (I'm wondering how much NIS is used these days though), since systemd upstream is not willing to move the call to initgroups() because libtirpc has to be fixed anyway... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c67 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(thomas.blume@suse | |.com) | --- Comment #67 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Franck Bui from comment #65)
(In reply to Jaime Caamaño Ruiz from comment #64)
I understand that PID1 calls initgroups(3) with it's own soft limit, which has increased from 1048576 to 1073741816 since version 240.
Ah right I already forgot...
Thomas, so the only fix (or rather workaround) from libtirpc we can expect is the one that uses the soft limit ?
Can you please provide the link where this was being discussed ?
Hi Franck, the discussion thread was here: https://sourceforge.net/p/libtirpc/mailman/message/36989375/ Jaime, are you OK with having the bug reassigned to you? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c68 --- Comment #68 from Franck Bui <fbui@suse.com> --- (In reply to Thomas Blume from comment #67)
https://sourceforge.net/p/libtirpc/mailman/message/36989375/
"At this point I would rather get NIS up and running now, as well as some other people seeing this problem. Patch coming." So the guy is not aware that this workaround won't work with systems running systemd (or any application that would raise its soft limit)... Anyway I'll disable the bump of /proc/sys/fs/nr_open as a temporary workaround although I fear it's going to stay forever. I already tested it one or two months ago and it did the trick. But someone should really try to push for a real fix in libtirpc, so please don't close this bug until the real work is done. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c69 --- Comment #69 from Jaime Caamaño Ruiz <jcaamano@suse.com> --- I submitted a patch with the linked list implementation [1], but I don't expect it to be merged any time soon, even in the best case. Franck, should we close this bug? I'll leave that to your call. BTW, do we have clear reproduction steps so I can try the patch out? [1] https://sourceforge.net/p/libtirpc/mailman/message/37024059/ -- You are receiving this mail because: You are on the CC list for the bug.
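For context, the direction of that linked-list change can be sketched roughly as follows (names invented for illustration; this is not the code that was actually submitted): track only the descriptors that are in use, so memory consumption scales with the number of open handles rather than with the RLIMIT_NOFILE value.

-->
/* Hypothetical sketch: a linked list replaces an rlimit-sized array. */
#include <stdlib.h>

struct fd_entry {
    int              fd;
    void            *data;
    struct fd_entry *next;
};

static struct fd_entry *fd_list;      /* grows with registered fds only */

static struct fd_entry *fd_register(int fd, void *data)
{
    struct fd_entry *e = malloc(sizeof(*e));
    if (!e)
        return NULL;
    e->fd = fd;
    e->data = data;
    e->next = fd_list;
    fd_list = e;
    return e;
}

static struct fd_entry *fd_lookup(int fd)
{
    for (struct fd_entry *e = fd_list; e; e = e->next)
        if (e->fd == fd)
            return e;
    return NULL;
}
--<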
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c70 --- Comment #70 from Jaime Caamaño Ruiz <jcaamano@suse.com> --- (In reply to Thomas Blume from comment #67)
Jaime, are you ok to reassign the bug to you?
I'm ok with that although maybe it is Franck who is going to really solve this. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c71 --- Comment #71 from Franck Bui <fbui@suse.com> --- (In reply to Jaime Caamaño Ruiz from comment #69)
I submitted a patch with the linked list implementation [1] but I don't expect it to be merged any time soon in the best of cases.
Great !
Franck should we close this bug on your call?
Nope, as I wrote the fix belongs to libtirpc, so until that project is fixed this bug should stay open. I simply submitted a workaround for convenience. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c72 --- Comment #72 from Luis Chamberlain <luis.chamberlain@suse.com> --- Is there a test case which can be written so that we can easily determine if a release has a proper fix merged? Does systemd have a set of tests run prior to release? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c73 --- Comment #73 from Luis Chamberlain <luis.chamberlain@suse.com> --- Do we have a fix merged on Tumbleweed yet? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1165351 http://bugzilla.suse.com/show_bug.cgi?id=1165351#c74 --- Comment #74 from Luis Chamberlain <luis.chamberlain@suse.com> --- FWIW I'm on 5.6.14-1-default now with libtirpc3-1.2.6-1.14.x86_64 and I don't see the memory hog issue. However, with NIS enabled and pam_systemd.so still enabled in /etc/pam.d/common-session-pc I see *huge* delays on login, even if it's not a NIS login, *and* even on sudo su - -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1165351 https://bugzilla.suse.com/show_bug.cgi?id=1165351#c75 --- Comment #75 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #73)
Do we have a fix merged on Tumbleweed yet?
It was accepted to staging 4 days ago: https://build.opensuse.org/request/show/809898 Might take a few more days until it gets released. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1165351 https://bugzilla.suse.com/show_bug.cgi?id=1165351#c76 --- Comment #76 from Luis Chamberlain <luis.chamberlain@suse.com> --- (In reply to Franck Bui from comment #71)
(In reply to Jaime Caamaño Ruiz from comment #69)
I submitted a patch with the linked list implementation [1] but I don't expect it to be merged any time soon in the best of cases.
Great !
Franck should we close this bug on your call?
Nope, as I wrote the fix belong to libtirpc, so until this project is fixed this bug should stay opened.
I simply submitted a workaround for convenience.
Any luck with this being fixed properly upstream? -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1165351 https://bugzilla.suse.com/show_bug.cgi?id=1165351#c77 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(jcaamano@suse.com | |) --- Comment #77 from Thomas Blume <thomas.blume@suse.com> --- (In reply to Luis Chamberlain from comment #76)
(In reply to Franck Bui from comment #71)
(In reply to Jaime Caamaño Ruiz from comment #69)
I submitted a patch with the linked list implementation [1] but I don't expect it to be merged any time soon in the best of cases.
Great !
Franck should we close this bug on your call?
Nope, as I wrote the fix belong to libtirpc, so until this project is fixed this bug should stay opened.
I simply submitted a workaround for convenience.
Any luck with this being fixed properly upstream?
Yeah, that has been fixed in the libtirpc code. Jaime, can you take over? -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1165351 https://bugzilla.suse.com/show_bug.cgi?id=1165351#c78 --- Comment #78 from Jaime Caamaño Ruiz <jcaamano@suse.com> --- Yes, this should be fixed in libtirpc 1.3.1, which is already included in Tumbleweed. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1165351 Jaime Caamaño Ruiz <jcaamano@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(jcaamano@suse.com | |) | -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1165351 https://bugzilla.suse.com/show_bug.cgi?id=1165351#c79 Thomas Blume <thomas.blume@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED --- Comment #79 from Thomas Blume <thomas.blume@suse.com> --- already fixed, closing -- You are receiving this mail because: You are on the CC list for the bug.