Bug ID | 1165351
---|---
Summary | systemd libpam-systemd immediate memory exhaustion triggers oom upon ssh |
Classification | openSUSE |
Product | openSUSE Tumbleweed |
Version | Current |
Hardware | Other |
OS | Other |
Status | NEW |
Severity | Major |
Priority | P5 - None |
Component | Basesystem |
Assignee | screening-team-bugs@suse.de |
Reporter | luis.chamberlain@suse.com |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |
I have a fresh install of openSUSE Tumbleweed on two HP ProLiant DL380 Gen9 servers, currently at version 20200222, both with 30 GiB of RAM, using systemd-244-2.1.x86_64. On one of them I noticed that whenever I ssh in as a regular user, but not as root, the out-of-memory killer is triggered. That is, 30 GiB of RAM get eaten up. The system now has the debug option set in /etc/pam.d/common-session, and sudo journalctl -f reveals:

Feb 29 16:31:18 chivo sshd[16511]: Accepted publickey for mcgrof from 10.100.208.217 port 44016 ssh2: RSA SHA256:tOilLVI1jM/Pvg5uJVQbeXGEAU0mma6Jxma9c5eVX0w
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): pam-systemd initializing
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): Asking logind to create session: uid=1000 pid=16511 service=sshd type=tty class=user desktop= seat= vtnr=0 tty= display= remote=yes remote_user= remote_host=10.100.208.217
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): Session limits: memory_max=n/a tasks_max=n/a cpu_weight=n/a io_weight=n/a runtime_max_sec=n/a
Feb 29 16:31:18 chivo systemd-logind[2184]: New session 62 of user mcgrof.
Feb 29 16:31:18 chivo systemd[1]: Starting User Manager for UID 1000...

-- <oom trace> ---

Feb 29 16:31:34 chivo kernel: oom_reaper: reaped process 16513 ((systemd)), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Feb 29 16:31:34 chivo systemd[1]: user@1000.service: Failed with result 'protocol'.
Feb 29 16:31:34 chivo systemd[1]: Failed to start User Manager for UID 1000.
Feb 29 16:31:34 chivo systemd[1]: Started Session 62 of user mcgrof.
Feb 29 16:31:34 chivo sshd[16511]: pam_systemd(sshd:session): Reply from logind: id=62 object_path=/org/freedesktop/login1/session/_362 runtime_path=/run/user/1000 session_fd=7 seat= vtnr=0 original_uid=1000
Feb 29 16:31:34 chivo sshd[16511]: pam_unix(sshd:session): session opened for user mcgrof(uid=1000) by (uid=0)

Here's what a vmstat log looks like, sampled every second: first the system is idle, then I create a new ssh session; it takes a while to eat up all the memory, then the oom killer takes over and kills the pam systemd service, and I am able to log in. The blank lines below highlight where the ssh session is just delaying and where the oom is about to hit:

mcgrof@chivo ~ $ sudo vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 248704 31981364 136 160708    2  248    35   252   31   63  0  1 99  0  0
 0  0 248704 31981056 136 160856    0    0     0     0  131  101  0  0 100 0  0
 0  0 248704 31981056 136 160856    0    0     0     0   82   92  0  0 100 0  0
 0  0 248704 31980552 136 160824   52    0   384     0  108  157  0  0 100 0  0
 0  0 248704 31980472 136 161196    0    0     0   232  164  280  0  0 100 0  0
 0  0 248704 31980200 136 161344  128    0   228     0  155  240  0  0 100 0  0
 0  0 248704 31980200 136 161412    0    0     0     0  160  106  0  0 100 0  0
 0  0 248704 31979628 136 161484    0    0     0     0   95  108  0  0 100 0  0
 0  0 248704 31979628 136 161484    0    0     0     0  114  146  0  0 100 0  0
 0  0 248704 31979628 136 161484    0    0     0     0   79   84  0  0 100 0  0

 1  0 248704 31932744 136 161484    0    0    16     0  120  133  0  0 100 0  0
 0  0 248704 31929664 136 161504    0    0     0     0  116  104  0  0 100 0  0
 0  0 248704 31929640 136 161508    0    0     0     0  125  162  0  0 100 0  0
 1  0 248704 28031352 136 163092  416    0  1940     0  574  515  3  2 95  0  0
 1  0 248704 22681916 136 163092    0    0    64   340  473  354  3  2 95  0  0
 1  0 248704 17326052 136 163172    0    0     0     0  364  116  3  2 95  0  0
 1  0 248704 11968392 136 163172    0    0     0     0  364  124  3  2 95  0  0
 1  0 248704  6612528 136 163172    0    0     0     0  374  110  3  2 95  0  0
 1  0 248704  1257168 136 163172    0    0     0     0  366  114  3  2 95  0  0

 1  1 565632   215044 136  82292  192 316912  1864 316912  1678  1135  1  2 91  6  0
 2  1 954816   207848 136  76672  264 389708  1564 389708 30002 12569  0  1 89  9  0
 3  2 353004 24513544 136  78252  140 139752  8480 139752 38917 113910 0 16 83  1  0
 0  0 350444 32073372 136 122312 3040   24 42184    68  4774 14383  0  1 98  1  0
 0  0 350444 32073372 136 122344    8    0    24     0   134  159  0  0 100 0  0

Running $(sudo htop --sort-key PERCENT_MEM -d 1) reveals the same, but in a much more fun way.

FWIW it is using yp, and so I have:

/etc/yp.conf
domain suse.de server <some-private-ip>
domain suse.de server <some-other-ip>

However, leaving this blank, running $(netconfig update -f), and then ssh'ing in reveals it still triggers.

Related public issues:

https://github.com/systemd/systemd/issues/11502
https://bugzilla.redhat.com/show_bug.cgi?id=1666690
https://github.com/systemd/systemd/commit/084eeb865ca63887098e0945fb4e93c852b91b0f

Red Hat's bug reveals it was a bad backport of the fix for CVE-2018-16864, which itself introduced another CVE, CVE-2019-3815.

I can reproduce the very same thing with a simple, shiny new user named test. The user root does not trigger the issue. What is odd, however, is that this issue is not reproducible on the other, twin system.
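For anyone eyeballing longer traces, the collapse in the vmstat output above is easy to flag mechanically. A small awk sketch (the 1 GiB/s threshold is arbitrary, and vmstat.log is a made-up file holding a slice of the trace above):

```shell
# Flag vmstat samples where free memory (column 4, in kB) drops by more
# than 1 GiB between consecutive one-second samples -- the signature of
# the user manager eating memory. Demo input is a slice of the trace.
cat > vmstat.log <<'EOF'
0 0 248704 31929640 136 161508 0 0 0 0 125 162 0 0 100 0 0
1 0 248704 28031352 136 163092 416 0 1940 0 574 515 3 2 95 0 0
1 0 248704 22681916 136 163092 0 0 64 340 473 354 3 2 95 0 0
EOF
awk 'NR > 1 && prev - $4 > 1048576 { print "drop at sample " NR ": " prev - $4 " kB" }
     { prev = $4 }' vmstat.log
```

On the live system the same one-liner can be fed from $(vmstat 1 | stdbuf -oL tail -n +3) to watch for the drop in real time.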
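For reference, the debug logging shown in the journal excerpt above was enabled by adding "debug" to the pam_systemd line. A sketch of the edit, demonstrated on a local copy rather than the real /etc/pam.d/common-session (the file common-session.demo and its contents are illustrative, not my exact config):

```shell
# Illustrative stand-in for /etc/pam.d/common-session
cat > common-session.demo <<'EOF'
session required        pam_limits.so
session optional        pam_systemd.so
session required        pam_unix.so     try_first_pass
EOF
# Append "debug" to the pam_systemd line only
sed -i 's/^\(session.*pam_systemd\.so\).*/\1 debug/' common-session.demo
grep pam_systemd common-session.demo
# On the live system, edit /etc/pam.d/common-session the same way
# (back it up first), then follow the log while ssh'ing in:
#   journalctl -f
```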