Comment #2 on bug 1202821
(In reply to Michal Koutný from comment #1)
> (In reply to Fabian Vogt from comment #0)
> > On openqaworker1, there are four (pretty much identical) containers set up
> > as services which run openQA tests.
> 
> What does it mean containers as services? (I assume it's from project [1],
> right?)

"podman generate-systemd" in this case.

> Q1) Is there difference how the containers are started among the four? (I.e.
> from user session vs from a systemd service.)

All of them use practically identical systemd units and FWICT it's random
which of the containers fails in which way...
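
For reference, a unit generated that way looks roughly like this (a sketch of
podman's usual output, not the actual file from this host; the PIDFile path is
a placeholder):

[Unit]
Description=Podman container-openqaworker1_container_102.service
Wants=network-online.target
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start openqaworker1_container_102
ExecStop=/usr/bin/podman stop -t 10 openqaworker1_container_102
PIDFile=/run/containers/storage/overlay-containers/<container-id>/userdata/conmon.pid
Type=forking

[Install]
WantedBy=default.target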

> > Sometimes a container has a different issue regarding cgroups, where it
> > looks like the assigned cgroup somehow disappeared.
> 
> What do you mean here? Is it the system.slice vs machine.slice discrepancy?
> Or anything else?

That the cgroup assigned to the container (or the container to the cgroup,
depending on how you look at it) disappeared, resulting in ENOENT.
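
(Concretely, a check along these lines

  podman inspect --format '{{.State.CgroupPath}}' openqaworker1_container_103
  ls /sys/fs/cgroup/devices/machine.slice/libpod-<id>.scope/

fails with ENOENT on the second step while the container is still running;
the inspect field name is from memory, and <id> stands for the full container
ID.)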

> > openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P
> > su: failed to create pseudo-terminal: Operation not permitted
> 
> This looks suspiciously similar to bug 1178775, except that this shouldn't
> happen on Leap 15.4 with systemd v249 (where that bug should be fixed).
>
> > openqaworker1:~ # podman exec -i openqaworker1_container_103 su -P
> > Error: exec failed: unable to start container process: error adding pid
> > 10265 to cgroups: failed to write 10265: openat2
> 
> If process 10265 had terminated before it could be migrated to the scope
> cgroup, we'd get ESRCH. This is ENOENT, so the cgroup doesn't exist, i.e.
> the *c1c0.scope doesn't exist from systemd's PoV.
> 
> > It's visible that in the case of container 102, the device cgroup entries "b
> > *:* m" and "c *:* m" got removed
> > somehow and for container 103, the cgroup changed completely
> > (system.slice/*.service instead of machine.slice/libpod-*).
> 
> Q2) What cgroup "driver" does podman use for these containers?
> (cgroupManager in podman lingo, cgroupfs vs systemd.)

All of them have "CgroupManager": "systemd"
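
(Checked with something like

  for c in 101 102 103 104; do
      podman inspect --format '{{.HostConfig.CgroupManager}}' openqaworker1_container_$c
  done

assuming that is the right inspect field; "podman info" also reports the
host-wide cgroupManager.)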

> > bug 1178775 sounds similar. I guess systemd is somehow interfering with
> > podman?
> 
> Or podman is interfering with systemd. :-)
> 
> Q3) Would it be too bold to ask to switch the host to the unified mode
> (system.cgroup_unified_hierarchy=1 to kernel cmdline)?
>   (Issues with device controller and maintaining parallel hierarchies with
> systemd (and container runtime) would likely be gone with just the unified
> hierarchy.)

I tried that (without the typo: "systemd.unified_cgroup_hierarchy=1"). The
kernel parameter is used, but it looks like cgroup v1 is still used by
systemd, at least for devices. Is that expected?

openqaworker1:~ # cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.14.21-150400.24.18-default
root=UUID=ff1922d2-d2e4-4860-9634-acac681dd0f9 resume=/dev/md0 nospec
console=tty0 console=ttyS1,115200n
resume=/dev/disk/by-uuid/10264dd9-ba3d-4ef1-8db9-0d74df0d43f1 splash=silent
quiet showopts nospec spectre_v2=off pti=off systemd.cgroup_unified_hierarchy=1
openqaworker1:~ # ls /sys/fs/cgroup/devices/*.slice/
/sys/fs/cgroup/devices/machine.slice/:
cgroup.clone_children  cgroup.procs  devices.allow  devices.deny  devices.list  notify_on_release  tasks

/sys/fs/cgroup/devices/openqa.slice/:
cgroup.clone_children  cgroup.procs  devices.allow  devices.deny  devices.list  notify_on_release  tasks

/sys/fs/cgroup/devices/system.slice/:
\x2esnapshots.mount
auditd.service
boot-grub2-i386\x2dpc.mount
boot-grub2-x86_64\x2defi.mount
cgroup.clone_children
cgroup.procs
chronyd.service
container-openqaworker1_container_101.service
container-openqaworker1_container_102.service
container-openqaworker1_container_103.service
container-openqaworker1_container_104.service
cron.service
dbus.service
devices.allow
devices.deny
devices.list
firewalld.service
haveged.service
home.mount
irqbalance.service
mcelog.service
mdmonitor.service
notify_on_release
nscd.service
openqa-worker-cacheservice-minion.service
openqa-worker-cacheservice.service
opt.mount
os-autoinst-openvswitch.service
ovs-vswitchd.service
ovsdb-server.service
polkit.service
postfix.service
rebootmgr.service
root.mount
rsyslog.service
smartd.service
srv.mount
sshd.service
system-getty.slice
system-modprobe.slice
system-serial\x2dgetty.slice
systemd-journald.service
systemd-logind.service
systemd-udevd.service
sysroot-etc.mount
sysroot-var.mount
sysroot.mount
tasks
tmp.mount
usr-local.mount
var-lib-openqa.mount
wickedd-auto4.service
wickedd-dhcp4.service
wickedd-dhcp6.service
wickedd-nanny.service
wickedd.service

/sys/fs/cgroup/devices/user.slice/:
cgroup.clone_children  cgroup.procs  devices.allow  devices.deny  devices.list  notify_on_release  tasks
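
(Independent of the cmdline, the active hierarchy mode can be double-checked
with

  stat -fc %T /sys/fs/cgroup/

which prints "cgroup2fs" in unified mode and "tmpfs" in legacy/hybrid mode;
given the devices hierarchy listing above, the host is clearly still on
legacy/hybrid.)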

Also, unified mode is not the default, so IMO even if it works with the
unified hierarchy, this should still be fixed for cgroup v1, or the default
should be changed.

> [1] https://github.com/openSUSE/containers-systemd.

