[Bug 1202821] New: Containers lose permission to /dev/pts after some time
https://bugzilla.suse.com/show_bug.cgi?id=1202821

Bug ID: 1202821
Summary: Containers lose permission to /dev/pts after some time
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.4
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Containers
Assignee: containers-bugowner@suse.de
Reporter: fvogt@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

On openqaworker1, there are four (pretty much identical) containers set up as services which run openQA tests. After some uptime (~1 day?), xterm fails to start due to permission issues. When it happens, those issues can be triggered manually by executing e.g. su -P. After restarting a container, the permission issues disappear for a while. Sometimes a container has a different issue regarding cgroups, where it looks like the assigned cgroup somehow disappeared.

Here's some example output showing the symptoms. Containers 101 and 104 were recently restarted and work fine:

openqaworker1:~ # podman exec -i openqaworker1_container_101 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # podman exec -i openqaworker1_container_104 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P
su: failed to create pseudo-terminal: Operation not permitted
openqaworker1:~ # podman exec -i openqaworker1_container_103 su -P
Error: exec failed: unable to start container process: error adding pid 10265 to cgroups: failed to write 10265: openat2 /sys/fs/cgroup/unified/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope/cgroup.procs: no such file or directory: OCI runtime attempted to invoke a command that was not found

I didn't know where to begin looking, so I started at the bottom and used systemtap to trace where openat gets its error code from. I traced the permission issue down to the devices cgroup (v1).
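A quick way to capture the devices-cgroup state of all four containers at once is to script the check; a minimal sketch, assuming the v1 devices hierarchy is mounted at /sys/fs/cgroup/devices and that podman exposes the cgroup path under .State.CgroupPath (as the inspect output below suggests):

    # dump the devices allowlist of each worker container's cgroup
    for c in openqaworker1_container_10{1..4}; do
        cg=$(podman inspect "$c" --format '{{.State.CgroupPath}}')
        echo "== $c ($cg)"
        cat "/sys/fs/cgroup/devices${cg}/devices.list"
    done

In the good state the list still contains the broad entries (e.g. "c 136:* rwm" covering /dev/pts); in the broken state only the narrow /dev/char/* entries remain.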
Some information about the relevant cgroups: container 101 (restarted, works): openqaworker1:~ # podman inspect openqaworker1_container_101 | grep CgroupPath "CgroupPath": "/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope" openqaworker1:~ # cat /sys/fs/cgroup/devices/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope/devices.list c 10:200 rwm c 5:2 rwm c 5:0 rwm c 1:9 rwm c 1:8 rwm c 1:7 rwm c 1:5 rwm c 1:3 rwm b *:* m c *:* m c 136:* rwm openqaworker1:~ # podman inspect openqaworker1_container_101 | grep -i pid "Pid": 9426, "ConmonPid": 9413, "ConmonPidFile": "/var/run/containers/storage/btrfs-containers/955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812/userdata/conmon.pid", "PidFile": "", "PidMode": "private", "PidsLimit": 2048, openqaworker1:~ # cat /proc/9426/cgroup 13:blkio:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 12:perf_event:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 11:devices:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 10:pids:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 9:memory:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 8:rdma:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 7:misc:/ 6:cpuset:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 5:net_cls,net_prio:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 4:freezer:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 3:cpu,cpuacct:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 2:hugetlb:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 1:name=systemd:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope 0::/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope container 102 (permission issue, device entries missing): openqaworker1:~ # podman inspect openqaworker1_container_102 | grep CgroupPath "CgroupPath": "/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope" openqaworker1:~ # cat /sys/fs/cgroup/devices/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope/devices.list c 1:3 rwm c 1:5 rwm c 1:7 rwm c 1:8 rwm c 1:9 rwm c 5:0 rwm c 5:2 rwm c 10:200 rwm openqaworker1:~ # cat /proc/17525/cgroup 13:blkio:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 12:perf_event:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 11:devices:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 10:pids:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 9:memory:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 8:rdma:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 7:misc:/ 6:cpuset:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 
5:net_cls,net_prio:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 4:freezer:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 3:cpu,cpuacct:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 2:hugetlb:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 1:name=systemd:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope 0::/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope container 103 (cgroup missing?): openqaworker1:~ # podman inspect openqaworker1_container_103 | grep -i pid "Pid": 18167, "ConmonPid": 18154, "ConmonPidFile": "/var/run/containers/storage/btrfs-containers/6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0/userdata/conmon.pid", "PidFile": "", "PidMode": "private", "PidsLimit": 2048, openqaworker1:~ # cat /proc/18167/cgroup 13:blkio:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 12:perf_event:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 11:devices:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 10:pids:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 9:memory:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 8:rdma:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 7:misc:/ 6:cpuset:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 5:net_cls,net_prio:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 4:freezer:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 3:cpu,cpuacct:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 2:hugetlb:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope 1:name=systemd:/system.slice/container-openqaworker1_container_103.service 0::/system.slice/container-openqaworker1_container_103.service It's visible that in the case of container 102, the device cgroup entries "b *:* m" and "c *:* m" got removed somehow and for container 103, the cgroup changed completely (system.slice/*.service instead of machine.slice/libpod-*). bug 1178775 sounds similar. I guess systemd is somehow interfering with podman? -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1202821

Fabian Vogt <fvogt@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                CC |        |mkoutny@suse.com
             Flags |        |needinfo?(mkoutny@suse.com)
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c1

Michal Koutný <mkoutny@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Flags |needinfo?(mkoutny@suse.com) |

--- Comment #1 from Michal Koutný <mkoutny@suse.com> ---
(In reply to Fabian Vogt from comment #0)
On openqaworker1, there are four (pretty much identical) containers set up as services which run openQA tests.
What do you mean by containers set up as services? (I assume it's from project [1], right?)

Q1) Is there a difference in how the containers are started among the four? (I.e. from a user session vs. from a systemd service.)
Sometimes a container has a different issue regarding cgroups, where it looks like the assigned cgroup somehow disappeared.
What do you mean here? Is it the system.slice vs machine.slice discrepancy? Or anything else?
openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P su: failed to create pseudo-terminal: Operation not permitted
This looks suspiciously similar to bug 1178775, except that this is Leap 15.4 with systemd v249 (where that should already be fixed).
openqaworker1:~ # podman exec -i openqaworker1_container_103 su -P Error: exec failed: unable to start container process: error adding pid 10265 to cgroups: failed to write 10265: openat2
If process 10265 had terminated before it could be migrated to the scope's cgroup, we'd get ESRCH. This is ENOENT, so the cgroup doesn't exist, i.e. the *c1c0.scope doesn't exist from systemd's PoV.
It's visible that in the case of container 102, the device cgroup entries "b *:* m" and "c *:* m" got removed somehow and for container 103, the cgroup changed completely (system.slice/*.service instead of machine.slice/libpod-*).
Q2) What cgroup "driver" does podman use for these containers? (cgroupManager in podman lingo, cgroupfs vs systemd.)
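For reference, this can be queried directly; a minimal sketch (the Go-template field name is an assumption):

    podman info --format '{{.Host.CgroupManager}}'                      # host-wide default
    podman inspect openqaworker1_container_101 | grep -i cgroupmanager  # recorded per container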
bug 1178775 sounds similar. I guess systemd is somehow interfering with podman?
Or podman is interfering with systemd. :-)

1)
Q3) Would it be too bold to ask to switch the host to the unified mode (system.cgroup_unified_hierarchy=1 to kernel cmdline)? (Issues with device controller and maintaining parallel hierarchies with systemd (and container runtime) would likely be gone with just the unified hierarchy.)

[1] https://github.com/openSUSE/containers-systemd
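A quick way to verify which mode the host actually ended up in after such a change (a sketch; "cgroup2fs" indicates the unified hierarchy, "tmpfs" the hybrid/legacy layout):

    stat -fc %T /sys/fs/cgroup
    grep cgroup /proc/mounts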
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c2

--- Comment #2 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Michal Koutný from comment #1)
(In reply to Fabian Vogt from comment #0)
On openqaworker1, there are four (pretty much identical) containers set up as services which run openQA tests.
What do you mean by containers set up as services? (I assume it's from project [1], right?)
"podman generate-systemd" in this case.
Q1) Is there a difference in how the containers are started among the four? (I.e. from a user session vs. from a systemd service.)
All of them use practically identical systemd units and FWICT it's random which of the containers fails in which way...
Sometimes a container has a different issue regarding cgroups, where it looks like the assigned cgroup somehow disappeared.
What do you mean here? Is it the system.slice vs machine.slice discrepancy? Or anything else?
That the cgroup assigned to the container (resp. the other way around) disappeared, resulting in ENOENT.
openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P su: failed to create pseudo-terminal: Operation not permitted
This looks suspiciously similar to bug 1178775, except that this is Leap 15.4 with systemd v249 (where that should already be fixed).
openqaworker1:~ # podman exec -i openqaworker1_container_103 su -P Error: exec failed: unable to start container process: error adding pid 10265 to cgroups: failed to write 10265: openat2
If process 10265 had terminated before it could be migrated to the scope's cgroup, we'd get ESRCH. This is ENOENT, so the cgroup doesn't exist, i.e. the *c1c0.scope doesn't exist from systemd's PoV.
It's visible that in the case of container 102, the device cgroup entries "b *:* m" and "c *:* m" got removed somehow and for container 103, the cgroup changed completely (system.slice/*.service instead of machine.slice/libpod-*).
Q2) What cgroup "driver" does podman use for these containers? (cgroupManager in podman lingo, cgroupfs vs systemd.)
All of them have "CgroupManager": "systemd"
bug 1178775 sounds similar. I guess systemd is somehow interfering with podman?
Or podman is interfering with systemd. :-)
1)
Q3) Would it be too bold to ask to switch the host to the unified mode (system.cgroup_unified_hierarchy=1 to kernel cmdline)? (Issues with device controller and maintaining parallel hierarchies with systemd (and container runtime) would likely be gone with just the unified hierarchy.)
I tried that (without typo, "systemd.unified_cgroup_hierarchy=1"). The kernel parameter is used, but it looks like cgroupv1 is still used by systemd, at least for devices. Is that expected? openqaworker1:~ # cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-5.14.21-150400.24.18-default root=UUID=ff1922d2-d2e4-4860-9634-acac681dd0f9 resume=/dev/md0 nospec console=tty0 console=ttyS1,115200n resume=/dev/disk/by-uuid/10264dd9-ba3d-4ef1-8db9-0d74df0d43f1 splash=silent quiet showopts nospec spectre_v2=off pti=off systemd.cgroup_unified_hierarchy=1 openqaworker1:~ # ls /sys/fs/cgroup/devices/*.slice/ /sys/fs/cgroup/devices/machine.slice/: cgroup.clone_children cgroup.procs devices.allow devices.deny devices.list notify_on_release tasks /sys/fs/cgroup/devices/openqa.slice/: cgroup.clone_children cgroup.procs devices.allow devices.deny devices.list notify_on_release tasks /sys/fs/cgroup/devices/system.slice/: auditd.service cron.service mdmonitor.service postfix.service systemd-journald.service wickedd-auto4.service boot-grub2-i386\x2dpc.mount dbus.service notify_on_release rebootmgr.service systemd-logind.service wickedd-dhcp4.service boot-grub2-x86_64\x2defi.mount devices.allow nscd.service root.mount systemd-udevd.service wickedd-dhcp6.service cgroup.clone_children devices.deny openqa-worker-cacheservice-minion.service rsyslog.service system-getty.slice wickedd-nanny.service cgroup.procs devices.list openqa-worker-cacheservice.service smartd.service system-modprobe.slice wickedd.service chronyd.service firewalld.service opt.mount srv.mount system-serial\x2dgetty.slice \x2esnapshots.mount container-openqaworker1_container_101.service haveged.service os-autoinst-openvswitch.service sshd.service tasks container-openqaworker1_container_102.service home.mount ovsdb-server.service sysroot-etc.mount tmp.mount container-openqaworker1_container_103.service irqbalance.service ovs-vswitchd.service sysroot.mount usr-local.mount container-openqaworker1_container_104.service mcelog.service polkit.service sysroot-var.mount var-lib-openqa.mount /sys/fs/cgroup/devices/user.slice/: cgroup.clone_children cgroup.procs devices.allow devices.deny devices.list notify_on_release tasks Also, this is not the default, so IMO even if it works with unified it should still be fixed with cgv1 or the default changed.
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c3

--- Comment #3 from Michal Koutný <mkoutny@suse.com> ---
(In reply to Fabian Vogt from comment #2)
All of them use practically identical systemd units and FWICT it's random which of the containers fails in which way...
To deal with the randomness, I'd suggest enabling debug logging of systemd and capturing the journal logs when the first issue occurs (a few periods back, where I understand a period ~ a single container lifetime). Could you collect such data? (Or possibly just share the journal data that you have for the current instance (without debug level).)
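A sketch of one way to do that on systemd v249 (the signal-based toggle is the documented fallback if systemd-analyze is unavailable):

    # raise the manager's log level at runtime
    systemd-analyze log-level debug     # or: kill -SIGRTMIN+22 1 (SIGRTMIN+23 reverts to info)

    # after the failure window, collect PID 1's messages plus the container unit's journal
    journalctl -b -o short-precise _PID=1 > systemd-debug.log
    journalctl -b -u container-openqaworker1_container_102.service >> systemd-debug.log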
I tried that (without typo, "systemd.unified_cgroup_hierarchy=1").
Sorry about that, it hits me all the time. `man systemd` is correct.
The kernel parameter is used, but it looks like cgroupv1 is still used by systemd, at least for devices. Is that expected?
No, that's suspicious. The device controller functionality is replaced with BPF programs in unified mode. (Isn't it still the typo? ':-)) What does `grep cgroup /proc/mounts` say on such a system?
Also, this is not the default, so IMO even if it works with unified it should still be fixed with cgv1 or the default changed.
Understood.
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c4

--- Comment #4 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Michal Koutný from comment #3)
(In reply to Fabian Vogt from comment #2)
All of them use practically identical systemd units and FWICT it's random which of the containers fails in which way...
To deal with the randomness, I'd suggest enabling debug logging of systemd and capturing the journal logs when the first issue occurs (a few periods back, where I understand a period ~ a single container lifetime). Could you collect such data? (Or possibly just share the journal data that you have for the current instance (without debug level).)
I can try, but it's not easy to tell when it breaks as we only know that it's broken when a test starts (a couple times a day). So there's always a window of a few hours. We could try to set up a "podman exec ... su -P" loop or something. Should we focus on that or testing with cgroups v2?
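Such a loop could look roughly like this (a minimal sketch; interval and log path are arbitrary):

    while sleep 60; do
        for c in openqaworker1_container_10{1..4}; do
            podman exec -i "$c" su -P -c true >/dev/null 2>&1 \
                || echo "$(date -Is) $c: pts allocation failed" >> /var/log/pts-check.log
        done
    done

This would pin the journal timestamp of the first failure down to about a minute.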
I tried that (without typo, "systemd.unified_cgroup_hierarchy=1").
Sorry about that, it hits me all the time. `man systemd` is correct.
The kernel parameter is used, but it looks like cgroupv1 is still used by systemd, at least for devices. Is that expected?
No, that's suspicious. The device controller functionality is replaced with BPF programs with unified mode. (Isn't it still the typo? ':-))
I hope my last comment shows the correct name in /proc/cmdline...
What does `grep cgroup /proc/mounts` say on such a system?
Both hierarchies are mounted:

openqaworker1:~ # findmnt -R /sys
TARGET                              SOURCE     FSTYPE     OPTIONS
/sys                                sysfs      sysfs      rw,nosuid,nodev,noexec,relatime
├─/sys/kernel/security              securityfs securityfs rw,nosuid,nodev,noexec,relatime
├─/sys/fs/cgroup                    tmpfs      tmpfs      ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64
│ ├─/sys/fs/cgroup/unified          cgroup2    cgroup2    rw,nosuid,nodev,noexec,relatime,nsdelegate
│ ├─/sys/fs/cgroup/systemd          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
│ ├─/sys/fs/cgroup/cpuset           cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,cpuset
│ ├─/sys/fs/cgroup/cpu,cpuacct      cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ ├─/sys/fs/cgroup/freezer          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,freezer
│ ├─/sys/fs/cgroup/blkio            cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,blkio
│ ├─/sys/fs/cgroup/memory           cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,memory
│ ├─/sys/fs/cgroup/pids             cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,pids
│ ├─/sys/fs/cgroup/net_cls,net_prio cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
│ ├─/sys/fs/cgroup/perf_event       cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,perf_event
│ ├─/sys/fs/cgroup/hugetlb          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,hugetlb
│ ├─/sys/fs/cgroup/misc             cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,misc
│ ├─/sys/fs/cgroup/rdma             cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,rdma
│ └─/sys/fs/cgroup/devices          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,devices
├─/sys/fs/pstore                    pstore     pstore     rw,nosuid,nodev,noexec,relatime
├─/sys/fs/bpf                       none       bpf        rw,nosuid,nodev,noexec,relatime,mode=700
├─/sys/kernel/tracing               tracefs    tracefs    rw,nosuid,nodev,noexec,relatime
├─/sys/kernel/debug                 debugfs    debugfs    rw,nosuid,nodev,noexec,relatime
│ └─/sys/kernel/debug/tracing       tracefs    tracefs    rw,nosuid,nodev,noexec,relatime
├─/sys/fs/fuse/connections          fusectl    fusectl    rw,nosuid,nodev,noexec,relatime
└─/sys/kernel/config                configfs   configfs   rw,nosuid,nodev,noexec,relatime
Also, this is not the default, so IMO even if it works with unified it should still be fixed with cgv1 or the default changed.
Understood.
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c5

--- Comment #5 from Fabian Vogt <fvogt@suse.com> ---
I checked the man page. It's systemd.unified_cgroup_hierarchy, not systemd.cgroup_unified_hierarchy... I'll change that.
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c7

--- Comment #7 from Michal Koutný <mkoutny@suse.com> ---
(In reply to Fabian Vogt from comment #4)
Should we focus on that or testing with cgroups v2?
v2 should be a workaround if you need the worker up 'n running quickly (hopefully). For SP4, when no v2-specific feature [1] is required, we should fix this (as you wrote). So tracking it down with a reproducer is still helpful.

[1] FTR, I'd count unprivileged containers among those too (do the source services have User!=root?).
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c8

Michal Koutný <mkoutny@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
             Flags |        |needinfo?(dennis@glindhart.dk)

--- Comment #8 from Michal Koutný <mkoutny@suse.com> ---
(In reply to Dennis Glindhart from comment #6)
I had a similar problem with podman containers started via systemd-services after installation of some newer version of systemd (in Tumbleweed).
Tumbleweed uses v2 by default. Do you override this default? (Otherwise the device controller hierarchy would not exist at all.)
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c10

--- Comment #10 from Michal Koutný <mkoutny@suse.com> ---
Dennis, I'd suggest filing another bug for TW with details (what issue you saw, what service file). TY (Closing as dup never hurts.)
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c12

--- Comment #12 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Dennis Glindhart from comment #6)
I had a similar problem with podman containers started via systemd-services after installation of some newer version of systemd (in Tumbleweed).
I found I could reproduce the problem by executing *systemctl daemon-reload* - after which the problem would occur. I guess some cgroups were bound to the systemd daemon somehow.
Can that help getting a reliable reproduce?
Yep, even with unified cgroups v2!

openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # systemctl daemon-reload
openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P
su: failed to create pseudo-terminal: Operation not permitted
openqaworker1:~ #

(In reply to Michal Koutný from comment #7)
(In reply to Fabian Vogt from comment #4)
Should we focus on that or testing with cgroups v2?
v2 should be a workaround if you need the worker up 'n running quickly (hopefully).
For SP4 when no v2-specific feature [1] is required, we should fix that (as you wrote). So tracking it down with a reproducer is still helpful.
[1] FTR, I'd count unprivileged containers among those too (have the source services User!=root?).
Nope, started as root. The process inside the container runs as non-root though.
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c13

Michal Koutný <mkoutny@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                CC |        |systemd-maintainers@suse.de

--- Comment #13 from Michal Koutný <mkoutny@suse.com> ---
v2 case:

(I looked at the affected worker machine)

strace of `su -P`:
19830 ioctl(3, TIOCSPTLCK, [0] <unfinished ...>
19830 <... ioctl resumed>) = 0
19830 ioctl(3, TCGETS <unfinished ...>
19830 <... ioctl resumed>, {B38400 opost isig icanon echo ...}) = 0
19830 ioctl(3, TIOCGPTN <unfinished ...>
19830 <... ioctl resumed>, [4]) = 0
19830 stat("/dev/pts/4", <unfinished ...>
19830 <... stat resumed>{st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
19830 openat(AT_FDCWD, "/dev/pts/4", O_RDWR|O_NOCTTY <unfinished ...>
19830 <... openat resumed>) = -1 EPERM (Operation not permitted)
this command is run in the context of container .scope unit:
/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
That scope, among other things, specifies:
...c894430f72beb3eb.scope.d/50-DeviceAllow.conf
[Scope]
DeviceAllow=
DeviceAllow=/dev/char/10:200 rwm
DeviceAllow=/dev/char/5:2 rwm
DeviceAllow=/dev/char/5:0 rwm
DeviceAllow=/dev/char/1:9 rwm
DeviceAllow=/dev/char/1:8 rwm
DeviceAllow=/dev/char/1:7 rwm
DeviceAllow=/dev/char/1:5 rwm
DeviceAllow=/dev/char/1:3 rwm
and
...c894430f72beb3eb.scope.d/50-DevicePolicy.conf
# /run/systemd/transient/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f0>
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Scope]
DevicePolicy=strict
IOW, the unit is configured (by podman [1]) in such a way that it allows only the listed devices; 136:4 (/dev/pts/4) is not among them. The bug here is rather the inverse: the BPF rules are not properly applied until `systemctl daemon-reload` is invoked. (I guess it might be related to the fact that .scope creation runs "concurrently" with ExecStart= of the service.)

[1] The comment about `systemctl set-property` is slightly misleading, as it means the properties were defined via the DBus API.

v1 case:

I believe it's similar (wrt device access, not the non-existent cgroup). The device controller's strict rules aren't applied until something causes systemd to re-realize the cgroup settings (like daemon-reload), and then `su -P` fails.

---

So, you (containers/openqa) may want to check why libpod scopes have strict device policy and me (systemd, +cc systemd-maintainers) may want to check why device rules are not properly applied.
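For comparison only (not something proposed in this report): an allowance covering the whole pts range would be expressed towards systemd roughly like this, using the "char-pts" device-group specifier from systemd.resource-control(5); <scope> stands for the container's libpod-*.scope unit name:

    systemctl set-property --runtime <scope> 'DeviceAllow=char-pts rwm'

That is the kind of entry missing from the 50-DeviceAllow.conf drop-in above.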
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c14

Michal Koutný <mkoutny@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                CC |        |fvogt@suse.com
             Flags |        |needinfo?(fvogt@suse.com)

--- Comment #14 from Michal Koutný <mkoutny@suse.com> ---
When the system is in the state that allows `podman exec -i $cont su -P`, could you please collect `systemd-analyze dump`? (I'm interested in the sections of the respective libpod-*.scope, machine.slice and -.slice.)
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c15

Fabian Vogt <fvogt@suse.com> changed:

           What    |Removed                   |Added
----------------------------------------------------------------------------
             Flags |needinfo?(fvogt@suse.com) |

--- Comment #15 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Michal Koutný from comment #13)
v2 case:
(I looked at the affected worker machine)
strace of `su -P`:
19830 ioctl(3, TIOCSPTLCK, [0] <unfinished ...>
19830 <... ioctl resumed>) = 0
19830 ioctl(3, TCGETS <unfinished ...>
19830 <... ioctl resumed>, {B38400 opost isig icanon echo ...}) = 0
19830 ioctl(3, TIOCGPTN <unfinished ...>
19830 <... ioctl resumed>, [4]) = 0
19830 stat("/dev/pts/4", <unfinished ...>
19830 <... stat resumed>{st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
19830 openat(AT_FDCWD, "/dev/pts/4", O_RDWR|O_NOCTTY <unfinished ...>
19830 <... openat resumed>) = -1 EPERM (Operation not permitted)
this command is run in the context of container .scope unit:
/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
That scope, among other things, specifies:
...c894430f72beb3eb.scope.d/50-DeviceAllow.conf
[Scope]
DeviceAllow=
DeviceAllow=/dev/char/10:200 rwm
DeviceAllow=/dev/char/5:2 rwm
DeviceAllow=/dev/char/5:0 rwm
DeviceAllow=/dev/char/1:9 rwm
DeviceAllow=/dev/char/1:8 rwm
DeviceAllow=/dev/char/1:7 rwm
DeviceAllow=/dev/char/1:5 rwm
DeviceAllow=/dev/char/1:3 rwm
and
...c894430f72beb3eb.scope.d/50-DevicePolicy.conf
# /run/systemd/transient/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f0>
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Scope]
DevicePolicy=strict
IOW, the unit is configured (by podman [1]) in such a way that it allows only the listed devices; 136:4 (/dev/pts/4) is not among them.
I assume this libpod scope is created by podman's systemd cgroup controller?
The bug here is rather inverse, the BPF rules are not properly applied until `systemctl daemon-reload` is invoked.
The question is whether the bug is that the scope is too restrictive or that podman's own default is too lenient. I don't know where the default set of allowed device nodes is currently specified.
(I guess it might be related to the fact that .scope creation is run "concurrently" with ExecStart= of the service.)
The issue is reproducible even when using "podman start" manually instead of "systemctl start container-openqaworker1_container_101.service".
[1] The comment about `systemctl set-property` is slightly misleading as it means the properties were defined via DBus API.
v1 case:
I believe, it's similar (wrt device access, not non-existent cgroup). The device controller strict rules aren't applied until something causes systemd to re-realize cgroup settings (like daemon-reload) and then `su -P` fails.
---
So, you (containers/openqa) may want to check why libpod scopes have strict device policy and me (systemd, +cc systemd-maintainers) may want to check why device rules are not properly applied.
Yep, I'll try to have a look.

(In reply to Michal Koutný from comment #14)
When the system is in the state that allows `podman exec -i $cont su -P`, could you please collect `systemd-analyze dump`? (I'm interested in the sections of the respective libpod-*.scope, machine.slice and -.slice.)
Attachment incoming. Container 101 is working, the others are broken. FTR, you can easily get back into the working state with "systemctl restart container-openqaworker1_container_101.service".
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c16

--- Comment #16 from Fabian Vogt <fvogt@suse.com> ---
Created attachment 861197
  --> https://bugzilla.suse.com/attachment.cgi?id=861197&action=edit
systemd-analyze dump (container 101 working, others broken)
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c17

Fabian Vogt <fvogt@suse.com> changed:

           What       |Removed                     |Added
----------------------------------------------------------------------------
           Status     |NEW                         |RESOLVED
           Resolution |---                         |FIXED
           Assignee   |containers-bugowner@suse.de |fvogt@suse.com

--- Comment #17 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Michal Koutný from comment #13)
So, you (containers/openqa) may want to check why libpod scopes have strict device policy
While searching through the podman and runc code to figure that out, I saw that the latest commit in runc was "Merge pull request #3559 from kolyshkin/fix-dev-pts". Indeed, this was a recent regression in runc 1.1.3, which was fixed in 1.1.4, released just five days ago. I updated our runc package to that version and the issue is gone. Submitted to SLE-15:Update and TW.
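For reference, a sketch of how the fix can be verified on the worker (package and unit names as used earlier in this report):

    rpm -q runc        # expect 1.1.4 or newer
    systemctl restart container-openqaworker1_container_10{1..4}.service
    systemctl daemon-reload    # this used to break pts access afterwards
    podman exec -i openqaworker1_container_102 su -P -c true && echo "pts still OK"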
https://bugzilla.suse.com/show_bug.cgi?id=1202821
https://bugzilla.suse.com/show_bug.cgi?id=1202821#c18

--- Comment #18 from Michal Koutný <mkoutny@suse.com> ---
(In reply to Michal Koutný from comment #13)
me (systemd, +cc systemd-maintainers) may want to check why device rules are not properly applied.
From the dump:
-> Unit libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope:
...
CGroup realized: yes
CGroup realized mask: cpu cpuset io memory pids bpf-firewall bpf-devices bpf-foreign
...
DeviceAllow: /dev/char/10:200 rwm
DeviceAllow: /dev/char/5:2 rwm
DeviceAllow: /dev/char/5:0 rwm
DeviceAllow: /dev/char/1:9 rwm
DeviceAllow: /dev/char/1:8 rwm
DeviceAllow: /dev/char/1:7 rwm
DeviceAllow: /dev/char/1:5 rwm
DeviceAllow: /dev/char/1:3 rwm
This shows that systemd realized bpf-devices (i.e. BPF programs are attached) and the DeviceAllow list does not include the pts wildcard. /dev/pts devices should not be accessible at this moment. They are allowed though, because runc modifies the BPF predicates (thanks Fabian for checking with bpftool) but doesn't tell systemd about that. After `systemctl daemon-reload`, PID 1 just applies what it was told about; that's correct behavior.
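The bpftool check mentioned above can be repeated roughly like this (a sketch; the <id> placeholders need to be filled in, and on the hybrid layout shown earlier the cgroup path lives under /sys/fs/cgroup/unified instead):

    # list BPF programs attached to the container's cgroup; the "device" attach
    # type is the one enforcing DevicePolicy=/DeviceAllow=
    bpftool cgroup show /sys/fs/cgroup/machine.slice/libpod-<id>.scope

    # dump a specific device program to inspect the allow-list predicates runc rewrote
    bpftool prog dump xlated id <prog id>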
https://bugzilla.suse.com/show_bug.cgi?id=1202821 https://bugzilla.suse.com/show_bug.cgi?id=1202821#c21 --- Comment #21 from Swamp Workflow Management <swamp@suse.de> --- SUSE-RU-2022:3435-1: An update that has one recommended fix can now be installed. Category: recommended (important) Bug References: 1202821 CVE References: JIRA References: Sources used: openSUSE Leap Micro 5.2 (src): runc-1.1.4-150000.33.4 openSUSE Leap 15.4 (src): runc-1.1.4-150000.33.4 openSUSE Leap 15.3 (src): runc-1.1.4-150000.33.4 SUSE Linux Enterprise Module for Containers 15-SP4 (src): runc-1.1.4-150000.33.4 SUSE Linux Enterprise Module for Containers 15-SP3 (src): runc-1.1.4-150000.33.4 SUSE Linux Enterprise Micro 5.2 (src): runc-1.1.4-150000.33.4 SUSE Linux Enterprise Micro 5.1 (src): runc-1.1.4-150000.33.4 SUSE Enterprise Storage 7.1 (src): runc-1.1.4-150000.33.4 SUSE Enterprise Storage 7 (src): runc-1.1.4-150000.33.4 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1202821 https://bugzilla.suse.com/show_bug.cgi?id=1202821#c22 --- Comment #22 from Swamp Workflow Management <swamp@suse.de> --- SUSE-RU-2022:3927-1: An update that has two recommended fixes can now be installed. Category: recommended (moderate) Bug References: 1202021,1202821 CVE References: JIRA References: Sources used: openSUSE Leap Micro 5.2 (src): runc-1.1.4-150000.36.1 openSUSE Leap 15.4 (src): runc-1.1.4-150000.36.1 openSUSE Leap 15.3 (src): runc-1.1.4-150000.36.1 SUSE Manager Server 4.1 (src): runc-1.1.4-150000.36.1 SUSE Manager Retail Branch Server 4.1 (src): runc-1.1.4-150000.36.1 SUSE Manager Proxy 4.1 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server for SAP 15-SP2 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server for SAP 15-SP1 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server for SAP 15 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server 15-SP2-LTSS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server 15-SP2-BCL (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server 15-SP1-LTSS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server 15-SP1-BCL (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Server 15-LTSS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Module for Containers 15-SP4 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Module for Containers 15-SP3 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Micro 5.3 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Micro 5.2 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise Micro 5.1 (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise High Performance Computing 15-SP2-LTSS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise High Performance Computing 15-SP2-ESPOS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise High Performance Computing 15-SP1-LTSS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise High Performance Computing 15-SP1-ESPOS (src): runc-1.1.4-150000.36.1 SUSE Linux Enterprise High Performance Computing 15-LTSS (src): runc-1.1.4-150000.36.1 SUSE Enterprise Storage 7.1 (src): runc-1.1.4-150000.36.1 SUSE Enterprise Storage 7 (src): runc-1.1.4-150000.36.1 SUSE Enterprise Storage 6 (src): runc-1.1.4-150000.36.1 SUSE CaaS Platform 4.0 (src): runc-1.1.4-150000.36.1 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1202821 https://bugzilla.suse.com/show_bug.cgi?id=1202821#c23 --- Comment #23 from Swamp Workflow Management <swamp@suse.de> --- SUSE-RU-2022:3944-1: An update that has two recommended fixes can now be installed. Category: recommended (moderate) Bug References: 1202021,1202821 CVE References: JIRA References: Sources used: SUSE Linux Enterprise Module for Containers 12 (src): runc-1.1.4-16.24.1 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination. -- You are receiving this mail because: You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@suse.com