[opensuse-kubic] is After=network-online.target the best option for container systemd units?
Hi team,

I find myself asking the question in the subject line. All of our container-systemd reference units currently start "After=network-online.target", which is also what I've been using for my own services.

I have a simple znc IRC bouncer container based on the busybox container (https://build.opensuse.org/package/show/home:RBrownSUSE:containers/znc-image), using a systemd unit file similar to those in containers-systemd. It has been misbehaving lately after the VM it is hosted on restarts: the container's /etc/resolv.conf somehow gets populated with Google DNS servers (e.g. 8.8.8.8) instead of the host's DNS servers. If I restart the container after booting, the /etc/resolv.conf in the container is correct. If I change the znc unit to start "After=multi-user.target", the container always has the correct /etc/resolv.conf.

This implies that there is some kind of race condition, with the container starting before the host's /etc/resolv.conf has been set properly. I suspect this is probably a side effect of wicked being a little... inconsistent in how it informs systemd of its readiness, but whatever the root cause, I find myself wondering whether "After=network-online.target" is good practice for containers anyway. I imagine multi-user.target is probably the right value if anyone wants to run rootless containers, for example.

What do you all think?

--
Richard Brown
Linux Distribution Engineer - Future Technology Team
Phone +4991174053-361
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D-90409 Nuernberg
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
--
To unsubscribe, e-mail: opensuse-kubic+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-kubic+owner@opensuse.org
On Tue, Aug 11, Richard Brown wrote:
All of our container-systemd reference units currently start "After=network-online.target" which is also what I've been using for my own services.
Yes, we changed that in January, since otherwise no container ever got updated: the pull always failed because the network was not yet up, so we always started the old container image. But maybe there is also a bug: the documentation states that you additionally need "Wants=network-online.target". Not sure whether that really helps.
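To make the Wants= point concrete: in systemd, After=network-online.target only orders a unit relative to the target; it does not pull the target into the boot transaction, so unless some other unit wants it, the ordering can be a no-op. The systemd documentation therefore recommends pairing it with Wants=network-online.target. Below is a minimal Python sketch of a lint check for that pairing. The helper names (parse_unit_section, check_network_online) and the example unit are hypothetical, not part of containers-systemd, and the parser assumes simplified unit syntax (no line continuations, drop-ins or specifiers).

```python
from __future__ import annotations

from collections import defaultdict

TARGET = "network-online.target"


def parse_unit_section(text: str, section: str = "Unit") -> dict[str, list[str]]:
    """Collect the space-separated values of every key in one [section].

    Simplified sketch: repeated keys (e.g. several After= lines) are
    accumulated; line continuations and drop-ins are not handled, and lines
    starting with '#' or ';' are treated as comments.
    """
    values: dict[str, list[str]] = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # entering a new section
            continue
        if current != section or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()].extend(value.split())
    return dict(values)


def check_network_online(text: str) -> list[str]:
    """Report how a unit's [Unit] section deviates from Wants= + After=."""
    unit = parse_unit_section(text)
    ordered = TARGET in unit.get("After", [])
    # Requires= also pulls the target in; treat it like Wants= here.
    pulled_in = TARGET in unit.get("Wants", []) + unit.get("Requires", [])
    problems = []
    if ordered and not pulled_in:
        problems.append("After= without Wants=: target is only waited for "
                        "if another unit pulls it in")
    if pulled_in and not ordered:
        problems.append("Wants= without After=: no ordering, unit may start "
                        "before the network is online")
    return problems


# Hypothetical unit, loosely resembling the ones discussed in this thread.
EXAMPLE = """\
[Unit]
Description=Example container unit
After=network-online.target

[Service]
ExecStart=/usr/bin/true
"""

print(check_network_online(EXAMPLE))  # flags the missing Wants=
```

Adding Wants=network-online.target to the example makes the check return an empty list. Whether that alone cures the wicked readiness problem discussed here is, as noted above, not certain.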
If I change the znc unit to start "After=multi-user.target" the container always has the correct /etc/resolv.conf.
This implies that there is some kind of race condition, with the container starting before the host's /etc/resolv.conf has been set properly.
Maybe we need both? But I think in your case this only works by accident: multi-user.target is reached late enough after network-online.target that everything has settled. If network-online.target is reached later than multi-user.target
I suspect this is probably a side effect of wicked being a little... inconsistent in how it informs systemd of its readiness, but whatever the root cause, I find myself wondering whether "After=network-online.target" is good practice for containers anyway.
I imagine multi-user.target is probably the right value if anyone wants to run rootless containers, for example.
What do you all think?
Let's fix wicked :(

Thorsten
--
Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & MicroOS
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
Managing Director: Felix Imendoerffer (HRB 36809, AG Nürnberg)
On Tue, 2020-08-11 at 16:18 +0200, Thorsten Kukuk wrote:
Let's fix wicked :(
Yeah, looks like we really have to. Lightning last night messed up my MicroOS host at home, and now it's perfectly exhibiting the same problem: it can't pull containers on start because it can't resolve registry.opensuse.org via DNS until after it has booted. At least it's reproducible on different hardware :(
Richard Brown wrote:
[...] This implies that there is some kind of race condition, with the container starting before the host's /etc/resolv.conf has been set properly.
I suspect this is probably a side effect of wicked being a little... inconsistent in how it informs systemd of its readiness, but whatever the root cause, I find myself wondering whether "After=network-online.target" is good practice for containers anyway.
Wicked is a bit fubar¹. However, even if it were fixed, relying on network-online.target would still be a hack. Also, copying resolv.conf as a one-shot operation seems old school. A better way would be to use e.g. systemd-resolved or dnsmasq on 127.0.0.1 and make the containers talk to that. That way containers could start up ASAP during boot (just like host services do) and adjust to network changes dynamically.

cu
Ludwig

[1] https://bugzilla.suse.com/show_bug.cgi?id=1172684, https://jira.suse.com/browse/PM-1982
--
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.com/
SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer
HRB 36809 (AG Nürnberg)
On Fri, 2020-08-14 at 11:02 +0200, Ludwig Nussel wrote:
Wicked is a bit fubar¹. However, even if it were fixed, relying on network-online.target would still be a hack. Also, copying resolv.conf as a one-shot operation seems old school. A better way would be to use e.g. systemd-resolved or dnsmasq on 127.0.0.1 and make the containers talk to that. That way containers could start up ASAP during boot (just like host services do) and adjust to network changes dynamically.
I think you're right about the one-shot resolv.conf copy, but even if we fixed it, we'd still have systems like mine that can't even download the container in the first place because they can't resolve the registry domains yet... I don't think putting systemd in every container for resolved is a sensible option. So fixing wicked or working around it are probably the two viable ways out of these ugly warts.
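As one possible way of "working around it", a pre-start step could retry name resolution of the registry until it succeeds or a deadline passes, instead of failing the pull on the first attempt. The Python sketch below is hypothetical: wait_for_dns and its timeout and interval defaults are illustrative assumptions, not anything containers-systemd ships. It only checks the host's resolver and does nothing about a stale resolv.conf copied into the container. The resolver, clock and sleep functions are injectable so the retry logic can be exercised without a network.

```python
import socket
import time
from typing import Callable


def wait_for_dns(host: str, timeout: float = 60.0, interval: float = 2.0,
                 resolve: Callable = socket.getaddrinfo,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry resolving `host` until it succeeds or `timeout` seconds pass.

    Returns True once the name resolves, False if the deadline is reached.
    """
    deadline = clock() + timeout
    while True:
        try:
            resolve(host, None)  # any successful lookup is good enough
            return True
        except socket.gaierror:
            # Name resolution failed (e.g. no usable DNS server configured yet).
            if clock() >= deadline:
                return False
            sleep(interval)


# Usage sketch (hypothetical):
# if wait_for_dns("registry.opensuse.org"):
#     pull the new image, otherwise fall back to the existing local image
```

A retry loop like this is still polling rather than the event-driven awareness of network changes suggested in this thread; it only narrows the window in which the pull fails at boot.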
Richard Brown wrote:
I think you're right about the one-shot resolv.conf copy, but even if we fixed it, we'd still have systems like mine that can't even download the container in the first place because they can't resolve the registry domains yet...
Then the service doing the download has to be aware of network events, too.
I don't think putting systemd in every container for resolved is a sensible option.
resolved would only have to run on the host, with resolv.conf inside the container pointing to the host.

cu
Ludwig
participants (3)
- Ludwig Nussel
- Richard Brown
- Thorsten Kukuk