[Bug 1215636] New: virtlxcd dying constantly killing all containers with it
https://bugzilla.suse.com/show_bug.cgi?id=1215636

Bug ID: 1215636
Summary: virtlxcd dying constantly killing all containers with it
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Major
Priority: P5 - None
Component: Virtualization:Tools
Assignee: virt-bugs@suse.de
Reporter: m.szczepaniak.000@gmail.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Hello, for the last couple of updates I've been having an issue with lxd containers in libvirt/virt-manager. At random times, previously once per day and now even more often, they are killed for no apparent reason. There are no significant errors in the logs. While investigating I came across virtlxcd and noticed it has a 2 hour timeout, and every time the service dies it takes all containers with it. I don't know why it has a timeout, but I think that is intentional. What I don't think is intentional is killing all containers, so please advise.

--
You are receiving this mail because:
You are on the CC list for the bug.
Charles Arnold <carnold@suse.com> changed:

What     |Removed            |Added
----------------------------------------------------------------------------
Assignee |virt-bugs@suse.de  |jfehlig@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c1

James Fehlig <jfehlig@suse.com> changed:

What  |Removed |Added
----------------------------------------------------------------------------
CC    |        |m.szczepaniak.000@gmail.com
Flags |        |needinfo?(m.szczepaniak.000@gmail.com)

--- Comment #1 from James Fehlig <jfehlig@suse.com> ---
(In reply to Michał Szczepaniak from comment #0)
> Hello, for the last couple of updates I've been having an issue with lxd containers in libvirt/virt-manager. At random times, previously once per day and now even more often, they are killed for no apparent reason. There are no significant errors in the logs. While investigating I came across virtlxcd and noticed it has a 2 hour timeout, and every time the service dies it takes all containers with it. I don't know why it has a timeout, but I think that is intentional. What I don't think is intentional is killing all containers, so please advise.
The default timeout is 2 minutes, not 2 hours. Regardless, virtlxcd should not terminate while it is managing active "VMs"; the timeout should be inhibited in that case. I've stared at the inhibition code for quite some time and it looks correct. I'll need to reproduce the issue myself and poke around with gdb.

In the meantime, you can override the timeout with 'systemctl edit --full virtlxcd.service' and remove the '--timeout 120' from VIRTLXCD_ARGS. This will prevent the daemon from terminating. Can you check if your containers are fine when virtlxcd is started without the timeout option?
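To preview the effect of that edit before changing anything, a small filter can be used. This is only a sketch: `strip_timeout` is a hypothetical helper, not part of libvirt, and it assumes the unit file sets the option on a line containing `VIRTLXCD_ARGS=` (for example `Environment=VIRTLXCD_ARGS="--timeout 120"`), which may differ on your system.

```shell
# Hypothetical helper (not part of libvirt): read a unit file on stdin and
# print it with any "--timeout N" removed from lines that set VIRTLXCD_ARGS.
# Assumption: the option appears on a line containing "VIRTLXCD_ARGS=".
strip_timeout() {
    sed -E '/VIRTLXCD_ARGS=/ s/[[:space:]]*--timeout[[:space:]]+[0-9]+//'
}

# Example (prints the modified unit to stdout, changes nothing on disk):
#   systemctl cat virtlxcd.service | strip_timeout
```

The filter only writes to stdout, so the actual change still has to be made through 'systemctl edit --full virtlxcd.service' as described above.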
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c2

--- Comment #2 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
Ah, 2 minutes, yes. I had noticed that it does not always kill the containers. I will try without the timeout, sure.

Still no change: the containers die about once per day, sometimes a couple of times per day.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c3

--- Comment #3 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
One more piece of information (because there's never enough information). I have the following in the kernel cmdline:

splash=silent quiet elevator=noop cgroup_enable=memory systemd.unified_cgroup_hierarchy=0 isolcpus=1,2,3,4,5,7,8,9,10,11 mitigations=auto

I'm including this because previously I had systemd.unified_cgroup_hierarchy=1, and recently I had to switch it off and enable the memory cgroup or the containers wouldn't start. So maybe there are other cgroups I need to enable?
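For anyone comparing their own host against this report, the sketch below lists which of the three option families mentioned in this comment are present on a kernel command line. `thread_cmdline_opts` is a hypothetical helper name; it reads /proc/cmdline unless a string is passed as its first argument.

```shell
# Hypothetical helper: print the kernel command-line options that match the
# ones discussed in this report (cgroup_enable, systemd.unified_cgroup_hierarchy,
# isolcpus). Reads /proc/cmdline unless a command line is passed as $1.
thread_cmdline_opts() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Intentionally unquoted: split the command line on whitespace.
    for opt in $cmdline; do
        case $opt in
            cgroup_enable=*|systemd.unified_cgroup_hierarchy=*|isolcpus=*)
                printf '%s\n' "$opt" ;;
        esac
    done
}

# Example: thread_cmdline_opts            # inspect the running kernel
```

An empty result means none of these options are set on the command line being inspected.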
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c4

--- Comment #4 from James Fehlig <jfehlig@suse.com> ---
(In reply to Michał Szczepaniak from comment #3)
> cgroup_enable=memory systemd.unified_cgroup_hierarchy=0 isolcpus=1,2,3,4,5,7,8,9,10,11
I don't have any of these in the kernel command line of my Tumbleweed host.
> I'm including this because previously I had systemd.unified_cgroup_hierarchy=1, and recently I had to switch it off and enable the memory cgroup or the containers wouldn't start. So maybe there are other cgroups I need to enable?
Interesting. Can you start your containers if you remove the above kernel options, but enable DefaultMemoryAccounting as described in the following bug comment?

https://bugzilla.suse.com/show_bug.cgi?id=1214845#c7

BTW, I reproduced virtlxcd terminating after 2 minutes even with a running container. I'll need to investigate further. However, in my case the container did continue to run, although it's quite simple:

<domain type='lxc'>
  <name>vm1</name>
  <memory>500000</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <vcpu>1</vcpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <interface type='network'>
      <source network='default'/>
    </interface>
    <console type='pty' />
  </devices>
</domain>
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c5

--- Comment #5 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
Even with virtlxcd terminating every 2 minutes the containers stay alive and only die about once per day.

Also, if I run systemctl restart virtlxcd it kills all containers immediately. I don't know if it should, but I'm just reporting it.

I will try the cmdline change.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c6

--- Comment #6 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
Created attachment 869801
  --> https://bugzilla.suse.com/attachment.cgi?id=869801&action=edit
log from when containers died

In case this helps, here's the log from journalctl that I caught right when the containers died.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c7

--- Comment #7 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
Another interesting thing is that they seem to die when I connect to libvirt via virt-manager. I'm connecting from a different host via ssh. But of course it doesn't happen every time.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c8

--- Comment #8 from James Fehlig <jfehlig@suse.com> ---
Have you tried removing the cgroup_enable, systemd.unified_cgroup_hierarchy, and isolcpus kernel parameters, and overriding DefaultMemoryAccounting=no as I suggested in comment #4? Let me know if you have any questions about that.

(In reply to Michał Szczepaniak from comment #6)
> Created attachment 869801 [details]
> log from when containers died
From this log it appears virtlxcd has crashed. Do you see any coredumps via 'coredumpctl list virtlxcd'? If so, provide the crashing stack trace with 'coredumpctl info virtlxcd'.

(In reply to Michał Szczepaniak from comment #5)
> Also, if I run systemctl restart virtlxcd it kills all containers immediately. I don't know if it should, but I'm just reporting it.
Hmm, I don't see this behavior. My test containers continue running fine across virtlxcd restarts. I'll leave the containers running over the weekend and see if they mysteriously disappear as you've seen.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c9

--- Comment #9 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
I will be trying it today. Sorry I couldn't try it earlier: I broke my backups and had to resend everything, which is about a 3 day process.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c10

--- Comment #10 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
So far with DefaultMemoryAccounting=yes it hasn't crashed: not overnight, not when I'm connecting, and not when I'm restarting. So it's very promising.
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c11

--- Comment #11 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
Yeah, I think it's solved, thanks for the help! Is there anything I should do with DefaultMemoryAccounting?
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c12

James Fehlig <jfehlig@suse.com> changed:

What       |Removed                                |Added
----------------------------------------------------------------------------
Flags      |needinfo?(m.szczepaniak.000@gmail.com) |
Status     |NEW                                    |RESOLVED
Resolution |---                                    |WORKSFORME

--- Comment #12 from James Fehlig <jfehlig@suse.com> ---
(In reply to Michał Szczepaniak from comment #11)
> Yeah, I think it's solved, thanks for the help! Is there anything I should do with DefaultMemoryAccounting?
I'm not sure what you mean. The libvirt lxc driver expects the memory controller to be available under /sys/fs/cgroup/machine.slice/, which requires overriding the DefaultMemoryAccounting=no setting in /usr/lib/systemd/system.conf.d/__20-defaults-SUSE.conf. See https://bugzilla.suse.com/show_bug.cgi?id=1214845#c7 on how to do that.

I'm going to close this bug for now with status 'resolved -> worksforme'. We didn't really fix anything, only adjusted configuration. Thanks for reporting the issue and the timely responses!
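One way to check the condition described above is to look at the controller list the kernel exposes for that slice. This is a minimal sketch that assumes a cgroup v2 (unified) host, where each cgroup directory has a `cgroup.controllers` file listing the controllers available to it; `has_memory_controller` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if "memory" is listed in a cgroup v2
# cgroup.controllers file. Defaults to the machine.slice file.
# Returns 0 if listed, 1 if not listed, 2 if the file cannot be read
# (for example when machine.slice does not exist yet).
has_memory_controller() {
    file=${1:-/sys/fs/cgroup/machine.slice/cgroup.controllers}
    [ -r "$file" ] || return 2
    # The file holds a space-separated list; put one controller per line
    # and look for an exact match.
    tr ' ' '\n' < "$file" | grep -qx memory
}

# Example:
#   has_memory_controller && echo "memory controller available"
```

A non-zero result on a host with running containers would suggest that the memory accounting override has not taken effect.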
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c13

--- Comment #13 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
I was asking more about how to avoid modifying files in /usr, and about a more permanent config location :D
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c14

--- Comment #14 from James Fehlig <jfehlig@suse.com> ---
(In reply to Michał Szczepaniak from comment #13)
> I was asking more about how to avoid modifying files in /usr, and about a more permanent config location :D
Described here https://bugzilla.suse.com/show_bug.cgi?id=1214845#c7 :-)
https://bugzilla.suse.com/show_bug.cgi?id=1215636#c15

--- Comment #15 from Michał Szczepaniak <m.szczepaniak.000@gmail.com> ---
OK, thanks a ton for the help! Though I might be back with another issue :P