[Bug 1165502] systemd warnings: Failed to create session: Start job for unit user-0.slice failed with 'canceled'
http://bugzilla.suse.com/show_bug.cgi?id=1165502
http://bugzilla.suse.com/show_bug.cgi?id=1165502#c6

--- Comment #6 from Archie Cobbs <archie.cobbs@gmail.com> ---
(In reply to Franck Bui from comment #5)
> (In reply to Archie Cobbs from comment #4)
> > The warning only happens occasionally. I haven't been able to get it to happen by trying manually on the command line yet.
>
> So what did you try to show in comment #2 ?
Sorry for the confusion. What happens is that Nagios is running on another machine and it periodically logs into this machine via SSH to perform various checks. The only check that seems to be causing the error is the JVM check that runs sudo (see below for entire script). All of the actual error cases I've shown are just pulled from /var/log/warn. These all occurred when I wasn't watching. Most of the time the Nagios checks do not generate the warning, but occasionally they do and these instances are captured in /var/log/warn. I've not yet been able to make the problem occur in "real-time" from the console.
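For anyone who wants to gauge how often this happens, the occurrences captured in /var/log/warn can be counted with something like the following. This is only a sketch: the function name count_session_failures is made up for illustration, and the match string is simply taken from the bug title.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): count the
# "Failed to create session" warnings recorded in a log file.
# Usage: count_session_failures [LOGFILE]   (defaults to /var/log/warn)
count_session_failures() {
    logfile="${1:-/var/log/warn}"
    # grep -c prints the number of matching lines. It exits non-zero
    # when nothing matches, so "|| true" keeps the function usable
    # in scripts running under "set -e".
    grep -c 'Failed to create session' "$logfile" || true
}
```

Running count_session_failures with no argument (as a user allowed to read /var/log/warn) prints a single number, which can be compared against the number of Nagios check runs over the same period.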
> > Note each time 'pexpnagios' logs in it may be doing a different Nagios check. The Nagios check that generates the errors seems to always be one
>
> Can you tell us more about how 'pexpnagios' was created ? Can you share the relevant entry in /etc/passwd for this user for example ?
From /etc/passwd:
pexpnagios:x:1000:100::/home/pexpnagios:/bin/bash
From /etc/shadow:
pexpnagios:*:18183:0:99999:7:::

More info:

$ cd /home/pexpnagios/
$ tree -a
.
├── .bash_history
├── .bashrc
├── bin
├── .config
├── .fonts
├── .local
│   └── share
│       └── systemd
│           └── user -> ../../../.config/systemd/user
├── logwarn
│   ├── pcom.logwarn
│   └── syslog.logwarn
├── .profile
└── .ssh
    └── authorized_keys

8 directories, 7 files
$ cat .local/share/systemd/user
cat: .local/share/systemd/user: No such file or directory
$ cat .ssh/authorized_keys
# $Id: authorized_keys 8797 2018-09-13 19:02:55Z archie $
no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,no-user-rc,from="127.0.0.1,::1" ssh-rsa AAAA[...public key elided...] pexpnagios@ops

I have no idea what that ".local/share/systemd/user" file is or how it got there.
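One likely explanation for the "No such file or directory" from cat: the tree output shows .local/share/systemd/user as a symlink pointing into .config/systemd/user, while .config is listed with no children, so the link appears to be dangling. A minimal sketch for checking that, where is_dangling_symlink is a hypothetical helper name and not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if the given path is a
# symlink whose target does not exist.
is_dangling_symlink() {
    # -L is true when the path itself is a symlink.
    # -e follows the link, so it is false when the target is missing.
    [ -L "$1" ] && [ ! -e "$1" ]
}
```

From /home/pexpnagios, something like `is_dangling_symlink .local/share/systemd/user && echo dangling` would confirm or rule this out.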
> > This JVM Nagios check script runs "sudo /usr/bin/jstat" twice in rapid succession:
>
> Can you tell us more about the Nagios plugin you're using and also share (by attaching) the check script ?
See below.
> BTW did you make any modification in the pam stuff ?
I hope not. But to digress a bit, often when I do a "zypper dup" to upgrade to a newer version of openSUSE, there are RPM config file conflicts in /etc/pam.d. This is really annoying and it's never clear how to resolve these.

It appears that two things that don't know about each other are conflicting: (a) the /etc/pam.d/common-foo files are normal files owned by the "pam" RPM, but (b) the "pam-config" RPM turns them into symlinks. So any time the "pam" RPM is upgraded and changes any of the common-foo files, you get an RPM config file conflict.

Anyway, I don't know if that has anything to do with this. Here are the contents of those files:

$ cd /etc/pam.d/
$ ls -l
total 148
-rw-r--r-- 1 root root 167 Sep  5  2019 chage
-rw-r--r-- 1 root root 199 Sep  5  2019 chfn
-rw-r--r-- 1 root root 199 Sep  5  2019 chpasswd
-rw-r--r-- 1 root root 199 Sep  5  2019 chsh
lrwxrwxrwx 1 root root  17 May 23  2016 common-account -> common-account-pc
-rw-r--r-- 1 root root 392 Oct 25  2015 common-account.pam-config-backup
-rw-r--r-- 1 root root 451 Mar  2 16:53 common-account-pc
lrwxrwxrwx 1 root root  14 May 23  2016 common-auth -> common-auth-pc
-rw-r--r-- 1 root root 462 Oct 25  2015 common-auth.pam-config-backup
-rw-r--r-- 1 root root 536 Mar  2 16:53 common-auth-pc
lrwxrwxrwx 1 root root  18 May 23  2016 common-password -> common-password-pc
-rw-r--r-- 1 root root 510 Oct 25  2015 common-password.pam-config-backup
-rw-r--r-- 1 root root 422 Mar  2 16:53 common-password-pc
lrwxrwxrwx 1 root root  17 May 23  2016 common-session -> common-session-pc
-rw-r--r-- 1 root root 482 Oct 25  2015 common-session.pam-config-backup
-rw-r--r-- 1 root root 547 Mar  2 16:53 common-session-pc
-rw-r--r-- 1 root root 481 Dec  4 05:03 crond
-rw-r--r-- 1 root root 172 Sep  5  2019 groupadd
-rw-r--r-- 1 root root 172 Sep  5  2019 groupdel
-rw-r--r-- 1 root root 172 Sep  5  2019 groupmod
-rw-r--r-- 1 root root 370 Sep  5  2019 login
-rw-r--r-- 1 root root 172 Sep  5  2019 newusers
-rw-r--r-- 1 root root 251 Apr 27  2019 other
-rw-r--r-- 1 root root 133 Sep  5  2019 passwd
-rw-r--r-- 1 root root 165 Jul 30  2019 polkit-1
-rw-r--r-- 1 root root 336 Jul 10  2019 pure-ftpd
-rw-r--r-- 1 root root 492 Sep  5  2019 remote
-rw-r--r-- 1 root root 274 Sep  5  2019 runuser
-rw-r--r-- 1 root root 280 Sep  5  2019 runuser-l
-rw-r--r-- 1 root root 137 Apr 27  2019 screen
-rw-r--r-- 1 root root 165 Oct 22 05:13 smtp
-rw-r--r-- 1 root root 404 Dec  9 11:05 sshd
-rw-r--r-- 1 root root 277 Sep  5  2019 su
-rw-r--r-- 1 root root 249 Feb 19 03:10 sudo
-rw-r--r-- 1 root root 255 Feb 19 03:10 sudo-i
-rw-r--r-- 1 root root 329 Sep  5  2019 su-l
-rw-r--r-- 1 root root 220 Feb 25 10:01 systemd-user
-rw-r--r-- 1 root root 172 Sep  5  2019 useradd
-rw-r--r-- 1 root root 172 Sep  5  2019 userdel
-rw-r--r-- 1 root root 172 Sep  5  2019 usermod
-rw-r--r-- 1 root root 164 Mar  5  2019 vlock

[root@test.stv.pexp] 834 cat common-account
#%PAM-1.0
#
# This file is autogenerated by pam-config. All changes
# will be overwritten.
#
# Account-related modules common to all services
#
# This file is included from other service-specific PAM config files,
# and should contain a list of the account modules that define
# the central access policy for use on the system. The default is to
# only deny service to users whose accounts are expired.
#
account required pam_unix.so try_first_pass

[root@test.stv.pexp] 835 cat common-auth
#%PAM-1.0
#
# This file is autogenerated by pam-config. All changes
# will be overwritten.
#
# Authentication-related modules common to all services
#
# This file is included from other service-specific PAM config files,
# and should contain a list of the authentication modules that define
# the central authentication scheme for use on the system
# (e.g., /etc/shadow, LDAP, Kerberos, etc.). The default is to use the
# traditional Unix authentication mechanisms.
#
auth required pam_env.so
auth required pam_unix.so try_first_pass

[root@test.stv.pexp] 836 cat common-password
#%PAM-1.0
#
# This file is autogenerated by pam-config. All changes
# will be overwritten.
#
# Password-related modules common to all services
#
# This file is included from other service-specific PAM config files,
# and should contain a list of modules that define the services to be
# used to change user passwords.
#
password requisite pam_cracklib.so
password required pam_unix.so use_authtok nullok try_first_pass

[root@test.stv.pexp] 837 cat common-session
#%PAM-1.0
#
# This file is autogenerated by pam-config. All changes
# will be overwritten.
#
# Session-related modules common to all services
#
# This file is included from other service-specific PAM config files,
# and should contain a list of modules that define tasks to be performed
# at the start and end of sessions of *any* kind (both interactive and
# non-interactive
#
session optional pam_systemd.so
session required pam_limits.so
session required pam_unix.so try_first_pass
session optional pam_umask.so
session optional pam_env.so

This is our homebrew check_jvm Nagios check:

======== cut here ============
cat /usr/lib/nagios/plugins/check_jvm
#!/bin/sh
#
# Checks JVM stats
#

# Set constants
SUWRAPPER="/usr/bin/sudo"
JSTAT="${SUWRAPPER} /usr/bin/jstat"

# Slurp in common functions
. /usr/lib/nagios/plugins/check_functions

# Usage message
usage()
{
    echo "Usage:" 1>&2
    echo " ${NAME} [-e] [-o] [-p]" 1>&2
    echo "Options:" 1>&2
    echo " -e Warn if eden space > this percentage" 1>&2
    echo " -o Warn when old gen space > this percentage" 1>&2
    echo " -p Warn when perm gen space > this percentage" 1>&2
    echo " -g Warn when GC time exceeds threshold" 1>&2
    echo " -u Warn when heap utilization > this percentage" 1>&2
    echo " -h Display this help message" 1>&2
}

# Check variables
EDEN_CHECK=
OLDG_CHECK=
PRMG_CHECK=
GCTH_CHECK=
HPTH_CHECK=
PID=`pidof java`

# Parse flags passed in on the command line
while [ ${#} -gt 0 ]; do
    case "$1" in
    -e|--eden)
        shift
        EDEN_CHECK="$1"
        shift
        ;;
    -o|--old)
        shift
        OLDG_CHECK="$1"
        shift
        ;;
    -p|--perm)
        shift
        PRMG_CHECK="$1"
        shift
        ;;
    -g|--gc)
        shift
        GCTH_CHECK="$1"
        shift
        ;;
    -u|--heap)
        shift
        HPTH_CHECK="$1"
        shift
        ;;
    -h|--help)
        usage
        exit
        ;;
    --)
        shift
        break
        ;;
    *)
        break
        ;;
    esac
done
case "${#}" in
0)
    break
    ;;
*)
    usage
    exit 1
    ;;
esac

# Sanity check
if [ -z "${PID}" ]; then
    nagios_exit OK "No java process found"
fi

# Check heap utilization, heap capacity, and GC time
JSTAT_GC=`${JSTAT} -gc -t ${PID} \
    | tail -1 | sed "s/^\s*//g"`
JSTAT_HP=`${JSTAT} -gccapacity -t ${PID} \
    | tail -1 | sed "s/^\s*//g"`

# Check Eden
EC=`echo "${JSTAT_GC}" | awk '{ printf("%.0f", $6) }'`
EU=`echo "${JSTAT_GC}" | awk '{ printf("%.0f", $7) }'`
EP=$(printf '%i %i' $EU $EC | awk '{ pc=100*$1/$2; i=int(pc); print (pc-i<0.5)?i:i+1 }')
if [ -n "${EDEN_CHECK}" ] && [ "${EP}" -gt ${EDEN_CHECK} ] ; then
    nagios_exit WARNING "Eden space at ${EP}%"
fi

# Check Old
OC=`echo "${JSTAT_GC}" | awk '{ printf("%.0f", $8) }'`
OU=`echo "${JSTAT_GC}" | awk '{ printf("%.0f", $9) }'`
OP=$(printf '%i %i' $OU $OC | awk '{ pc=100*$1/$2; i=int(pc); print (pc-i<0.5)?i:i+1 }')
if [ -n "${OLDG_CHECK}" ] && [ "${OP}" -gt "${OLDG_CHECK}" ] ; then
    nagios_exit WARNING "Old generation space at ${OP}%"
fi

# Check Perm
PC=`echo "${JSTAT_GC}" | awk '{ printf("%.0f", $10) }'`
PU=`echo "${JSTAT_GC}" | awk '{ printf("%.0f", $11) }'`
PP=$(printf '%i %i' $PU $PC | awk '{ pc=100*$1/$2; i=int(pc); print (pc-i<0.5)?i:i+1 }')
if [ -n "${PRMG_CHECK}" ] && [ "${PP}" -gt "${PRMG_CHECK}" ] ; then
    nagios_exit WARNING "Perm generation space at ${PP}%"
fi

# Check GC time
JVMT=`echo "${JSTAT_GC}" | awk '{ print $1 }'`
GCTM=`echo "${JSTAT_GC}" | awk '{ print $16 }'`
GCTS=`echo "scale=20; ${GCTM} / 1000" | bc`
DIFF=`echo "scale=20; ${GCTS} / ${JVMT}" | bc`
if [ -n "${GCTH_CHECK}" ] && [ `echo "scale=20; ${DIFF} > ${GCTH_CHECK}" | bc` -eq 1 ]; then
    nagios_exit WARNING "Time spent in GC has exceeded $(echo "scale=3; ${GCTH_CHECK} * 100" | bc)% of time since JVM started"
fi

# Check heap utilization
EGUT=`echo "${JSTAT_GC}" | awk '{ print $7 }'`
EGCP=`echo "${JSTAT_HP}" | awk '{ print $3 }'`
OGUT=`echo "${JSTAT_GC}" | awk '{ print $9 }'`
OGCP=`echo "${JSTAT_HP}" | awk '{ print $9 }'`
HPTU=`echo "scale=20; ${EGUT} + ${OGUT}" | bc`
HPTC=`echo "scale=20; ${EGCP} + ${OGCP}" | bc`
DIFF=`echo "scale=20; ${HPTU} / ${HPTC}" | bc`
if [ -n "${HPTH_CHECK}" ] && [ `echo "scale=20; ${DIFF} > ${HPTH_CHECK}" | bc` -eq 1 ]; then
    nagios_exit WARNING "Heap utilization at $(echo "scale=3; ${DIFF} * 100" | bc | awk '{printf "%f", $0}')% of max capacity"
fi

# Done
nagios_exit OK

-- 
You are receiving this mail because:
You are on the CC list for the bug.