Puzzling rootless docker issue on TW
I've been attempting to help a user in the forums[1] with an issue running Docker rootless - he's running into an issue with being able to pull images resulting in the following error:

Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.100:48971->10.0.2.3:53: i/o timeout

He also opened a discussion in the Docker forums (who referred him to our forums), and I've been chatting with the guy (Ákos) who was helping him over there in that thread[2].

The puzzling thing for me is that I can't duplicate the issue. I've tried on GNOME and KDE both, and I just cannot reproduce the problem.

In troubleshooting the issue together with Ákos, we seem to have isolated the issue to something in the slirp4netns network that's used in that configuration - but nobody involved has customized that setup at all - yet it fails for both of them and doesn't fail for me.

The issue *seems* to be that the DNS server set up in that network (10.0.2.3) is not responsive for either of them - but it is for me.

Wondering if anyone here has any ideas about what else to look at beyond what we already have in these two threads. I'm completely out of ideas.

Jim

[1] https://forums.opensuse.org/t/rootless-docker-i-o-timeout-with-docker-pull/169468/12
[2] https://forums.docker.com/t/rootless-docker-i-o-timeout-with-docker-pull/137848

--
Jim Henderson
Please keep on-topic replies on the list so everyone benefits
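For anyone following along, the error string itself already says which resolver was queried and from which address. Below is a minimal Python sketch that pulls those fields out of the message. The function name and the regular expression are illustrative assumptions of mine, not part of Docker, and the pattern is only written against the one error line quoted above.

```python
import re

# Illustrative pattern, written against the single error line quoted in this
# thread; other Docker/Go error strings may be shaped differently.
LOOKUP_ERROR = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>[\d.]+):(?P<resolver_port>\d+): "
    r"read udp (?P<source>[\d.]+):(?P<source_port>\d+)->"
)


def parse_lookup_error(message):
    """Return the hostname, resolver and source address named in the error.

    Returns None when the message does not look like a DNS lookup timeout.
    """
    match = LOOKUP_ERROR.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    error = (
        'Error response from daemon: Get "https://registry-1.docker.io/v2/": '
        "dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: "
        "read udp 10.0.2.100:48971->10.0.2.3:53: i/o timeout"
    )
    print(parse_lookup_error(error))
```

For the quoted error this reports the resolver as 10.0.2.3 and the source as 10.0.2.100, both in the 10.0.2.x range, which fits the lookup timing out inside the slirp4netns network rather than on the host's LAN.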
On 9/29/23 14:19, Jim Henderson wrote:
I've been attempting to help a user in the forums[1] with an issue running Docker rootless - he's running into an issue with being able to pull images resulting in the following error:
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.100:48971->10.0.2.3:53: i/o timeout
He also opened a discussion in the Docker forums (who referred him to our forums), and I've been chatting with the guy (Ákos) who was helping him over there in that thread[2].
The puzzling thing for me is that I can't duplicate the issue. I've tried on GNOME and KDE both, and I just cannot reproduce the problem.
In troubleshooting the issue together with Ákos, we seem to have isolated the issue to something in the slirp4netns network that's used in that configuration - but nobody involved has customized that setup at all - yet it fails for both of them and doesn't fail for me.
The issue *seems* to be that the DNS server set up in that network (10.0.2.3) is not responsive for either of them - but it is for me.
Wondering if anyone here has any ideas about what else to look at beyond what we already have in these two threads. I'm completely out of ideas.
Jim
[1] https://forums.opensuse.org/t/rootless-docker-i-o-timeout-with-docker-pull/169468/12
[2] https://forums.docker.com/t/rootless-docker-i-o-timeout-with-docker-pull/137848
Hi Jim,

Could there be a firewall blocking it?

--
Regards,
Joe
On Fri, 29 Sep 2023 14:45:05 -0400, Joe Salmeri wrote:
Could there be a firewall blocking it ?
Actually, this got me to thinking - on the host there isn't any firewall configuration blocking it, but in the network namespace provided by slirp4netns, that might be a different issue. I've no idea where that iptables firewall configuration comes from, but it definitely is different in my setup than the host (which makes sense to me).

What the failures are showing for the two who are seeing them is an ICMP response failure from the gateway in the rootless network namespace to the tap0 interface that's defined in the namespace (i.e., 10.0.2.2 is the gateway, 10.0.2.100 is the tap0 interface within the namespace).

It seems almost certain to be something within this virtual userspace-defined network.

--
Jim Henderson
Jim,

Do the following:

docker ps

Get the hash for the container (or look in the logs for the hash), then run:

docker logs --follow <container hash, or the container name if you gave it one>

This will give you all the output of the Docker container since you started running it.

On Fri, Sep 29, 2023 at 3:42 PM Jim Henderson <hendersj@gmail.com> wrote:
Actually, this got me to thinking - in the host there isn't any firewall configuration blocking it, but in the network namespace provided by slirp4netns, that might be a different issue. I've no idea where that iptables firewall configuration comes from, but it definitely is different in my setup than the host (which makes sense to me).
What the failures are showing for the two who are seeing them is an ICMP response failure from the gateway in the rootless network namespace to the tap0 interface that's defined in the namespace (i.e., 10.0.2.2 is the gateway, 10.0.2.100 is the tap0 interface within the namespace).
It seems almost certain to be something within this virtual userspace-defined network.
--
Terror PUP a.k.a Chuck "PUP" Payne
-----------------------------------------
Discover it! Enjoy it! Share it! openSUSE Linux.
-----------------------------------------
openSUSE -- Terrorpup
openSUSE Ambassador/openSUSE Member
skype,twitter,identica,friendfeed -- terrorpup
freenode(irc) --terrorpup/lupinstein
Register Linux Userid: 155363
openSUSE Community Member since 2008.
On Fri, 29 Sep 2023 16:17:47 -0400, Chuck Payne wrote:
Do the following
docker ps
Get the hash for the container (or look in the logs for the hash), then run:
docker logs --follow <container hash, or the container name if you gave it one>
This will give you all the output of the Docker container since you started running it.
Thanks, Chuck - unfortunately, the issue happens before the container image is even pulled down. Rootless docker uses a userspace network layer that seems to interfere in some (but not all) instances with even pulling the image to start with.

We're looking at network configuration inside that userspace stuff using the 'nsenter' command to look at a tap interface that sits between the rootless docker daemon and the outside world.

--
Jim Henderson
On Fri, Sep 29, 2023 at 5:45 PM Jim Henderson <hendersj@gmail.com> wrote:
Thanks, Chuck - unfortunately, the issue happens before the container image is even pulled down. Rootless docker uses a userspace network layer that seems to interfere in some (but not all) instances with even pulling the image to start with.
We're looking at network configuration inside that userspace stuff using the 'nsenter' command to look at a tap interface that sits between the rootless docker daemon and the outside world.
Then maybe run Wireshark while you are trying to pull the image, to see what errors might be there. Good luck.

--
Terror PUP a.k.a Chuck "PUP" Payne
On Fri, 29 Sep 2023 19:56:30 -0400, Chuck Payne wrote:
Then maybe run Wireshark while you are trying to pull the image, to see what errors might be there. Good luck.
Yep, we did that as well - you can see the traces in the thread on the Docker forums. The issue is that inside the userspace network space, for some reason, ICMP responses are not able to be returned. We can't figure out why that is.

--
Jim Henderson
On 2023-09-29 22:08, Jim Henderson wrote:
Yep, we did that as well - you can see the traces in the thread on the Docker forums. The issue is that inside the userspace network space, for some reason, ICMP responses are not able to be returned. We can't figure out why that is.
Hi Jim,

Interestingly enough, I have a similar issue with my laptop. It may not be the same, but it sounds strikingly similar.

::: TL;DR :::

In your case, can your user try to replicate the problem in a VM, using bridged networking?

::: Details :::

At times, I cannot access/ping one box on my LAN from my laptop. When the issue crops up, the packets seem to make it to my box (confirmed via tcpdump) but ping doesn't see them. It's as if they're blocked, even with firewalld disabled.

I can definitely access this box if my wireless NIC is the only one enabled (not the wired one too), which to me sounded like a routing issue.

As I've been busy, and it's not critical to access this one box, I haven't spent too many cycles on it.

Earlier in the week, I spun up a new TW VM on my laptop. I set up bridge mode networking with my wired NIC. The VM can access that other box!

I thought, aha, perhaps the firewall rules are borked:

# fgrep FirewallBackend firewalld.conf
# FirewallBackend
FirewallBackend=nftables

From the VM, I dumped the rules and slurped them into my host OS, but it didn't make a difference. Here's what I did in case it helps your situation:

VM # nft list ruleset > good.rules
# nft -f good.rules

Next on my list was to try to figure out which firewall packages are installed during the TW install and re-install them, to see if I can replicate my VM's environment.

Thx!
-pablo
On Sat, 30 Sep 2023 06:18:49 -0400, Pablo Sanchez wrote:
Hi Jim,
Interestingly enough, I have a similar issue with my laptop. It may not be the same but it sounds strikingly similar.
::: TL;DR :::
In your case, can your user try to replicate the problem in a VM, using bridged networking?
I can ask the user who's having the issue; I've not been able to reproduce the issue at all, but the user in the thread on the openSUSE forums (tilfischer) as well as the individual helping in the Docker forums (rimelek) can. The former is running on bare metal, the latter is running TW inside an lxd VM. I've been testing in VMware Workstation 17.0.2.

All are on the 20230926 release of TW. Rimelek's installation uses a pre-built VM image from the lxd repositories, but both mine and tilfischer's were installed from media using default options (his was KDE, mine was GNOME, but I also tried KDE) and then updated with zypper dup.

I feel I need to emphasize that it's not a physical network issue, but a virtual network issue in how rootless docker works. I'm going to explain this in more detail, partly to help clarify the issue for those reading this thread, and partly to help me make sure I understand what I'm seeing.

Rootless docker is a way to run the docker daemon as a user, without root privileges. In order to connect to the network, there's a userspace network tool used that creates what appears to be a virtual routed network. This is configured automatically by the /usr/bin/dockerd-rootless-setuptool.sh script.

What you end up with is a network configuration that looks like this:

host <----> userspace network <----> docker networks

That "userspace network" is only present inside the host - it's a tap interface that has its own subnet (10.0.2.x), and is configured with its own routes and iptables firewall rules:

--- snip ---
localhost:/home/jhenderson # iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere
localhost:/home/jhenderson #
--- snip ---

From the host's perspective, it doesn't exist as a network interface at all:

--- snip ---
jhenderson@localhost:~> ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.251.134  netmask 255.255.255.0  broadcast 172.16.251.255
        inet6 fe80::20c:29ff:fee2:f877  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:e2:f8:77  txqueuelen 1000  (Ethernet)
        RX packets 13433  bytes 19185557 (18.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1359  bytes 136840 (133.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 274  bytes 25324 (24.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 274  bytes 25324 (24.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
--- snip ---

It sits solely between the Docker daemon (owned by the user rather than root) and the outside world, in order to allow the user to run docker as a non-root user.

The issue here isn't with networking within docker containers in this configuration (though I suspect we'd find that it probably does interfere with that as well), but with the docker daemon not even being able to pull images from the repo.

The userspace network can only be directly accessed by using the 'nsenter' command, run as root (which is how I got the routing table and iptables output for that 10.0.2.x network). It also provides its own DNS forwarder, configured on 10.0.2.3. The address 10.0.2.100 is assigned to the tap0 interface inside the space.

What is happening for those for whom it is failing is that they run 'docker pull' within their environment. The request goes to the docker daemon, which then attempts to do a lookup of registry-1.docker.io. That request is sent from the daemon to the DNS forwarder using the userspace network (i.e., from 10.0.2.100 to 10.0.2.3), and a response never comes back.

In those that fail, they are seeing the default gateway (10.0.2.2) reporting that there is no route to the "host" address of 10.0.2.100 (it's an ICMP response packet that's reporting that, as seen in the wireshark traces run inside the userspace network).

The problem with this is that 10.0.2.2 and 10.0.2.100 are on the same virtual network; a route isn't needed for one to connect to the other, so the error "no route to host" doesn't make any sense at all.

If, on my real network (which uses 172.16.0.x), I ping from 172.16.0.170 to 172.16.0.42, and .42 isn't responsive, I get an ICMP timeout, not an ICMP "no route to host" message. If .42 filters ICMP requests, I get a timeout.

But if .42 responds and .170 were somehow configured to ignore/reject/drop those response packets, I'm not sure what you'd see - in fact, thinking about it, I'm not sure how you'd configure a firewall to reject ICMP *responses*.

In researching this, I've found that the troubleshooting documentation for slirp4netns (the userspace network tool) appears to be virtually nonexistent - but the fact that at least tilfischer and I are using default installations and getting different results means that we're missing some difference in our configurations. I'm just at a complete loss as to what that is.

Jim

--
Jim Henderson
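The "same virtual network" reasoning above can be checked mechanically. This is a small Python sketch using the standard ipaddress module. The /24 prefix for 10.0.2.x is an assumption here (the thread only says "10.0.2.x"), and the helper name is made up for illustration.

```python
import ipaddress


def same_subnet(addr_a, addr_b, cidr):
    """Return True when both addresses fall inside the given network."""
    network = ipaddress.ip_network(cidr, strict=True)
    return (
        ipaddress.ip_address(addr_a) in network
        and ipaddress.ip_address(addr_b) in network
    )


if __name__ == "__main__":
    # ASSUMPTION: the userspace network is 10.0.2.0/24; the prefix length is
    # not stated in the thread.
    userspace_net = "10.0.2.0/24"
    gateway, tap0, dns = "10.0.2.2", "10.0.2.100", "10.0.2.3"

    print("gateway <-> tap0 on-link:", same_subnet(gateway, tap0, userspace_net))
    print("dns     <-> tap0 on-link:", same_subnet(dns, tap0, userspace_net))
```

If both checks report on-link, no routing-table entry beyond the directly connected network should be needed between the gateway, the DNS forwarder and tap0, which is why a "no route to host" reply between them looks wrong.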
OK, so I've found that in the lxd image, it fails as with rimelek's setup. So I've been able to reliably reproduce it.

I've also got it configured on the host that runs the lxd image, and it works there.

From there I've been able to determine that the traffic is never leaving the userspace network. Running wireshark both inside the userspace network and outside it, I see the requests inside the userspace network, and no traffic on the host's network at all.

What I was hoping to see was a DNS lookup request and response, followed by nothing - but the DNS request isn't even getting out.

When I do the trace on the host (where it works for me), I see traffic on the host's external network.

So it seems that the issue is that traffic isn't passing from the userspace network to the real-world network.

--
Jim Henderson
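As a lighter-weight alternative to a packet capture, the same question (does the forwarder at 10.0.2.3 answer at all?) can be asked with one hand-built DNS query. This is only a sketch: the function names are made up, it uses nothing beyond the Python standard library, and to be meaningful it would have to be run inside the rootless network namespace (for example from a shell entered with 'nsenter'), not on the host.

```python
import socket
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server, hostname, timeout=3.0):
    """Return True if `server` sends back any reply matching our query ID."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as "network unreachable" style errors.
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]


if __name__ == "__main__":
    # 10.0.2.3 is the forwarder address reported in this thread.
    print(resolver_answers("10.0.2.3", "registry-1.docker.io"))
```

A False result only says that no reply came back within the timeout; it does not by itself distinguish a dead forwarder from traffic that never leaves the userspace network.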
On Sat, 30 Sep 2023 21:23:26 -0000 (UTC), Jim Henderson wrote:
OK, so I've found that in the lxd image, it fails as with rimelek's setup. So I've been able to reliably reproduce it.
I've also got it configured on the host that runs the lxd image, and it works there.
From there I've been able to determine that the traffic is never leaving the userspace network. Running wireshark both inside the userspace network and outside it, I see the requests inside the userspace network, and no traffic on the host's network at all.
What I was hoping to see was a DNS lookup request and response, followed by nothing - but the DNS request isn't even getting out.
When I do the trace on the host (where it works for me), I see traffic on the host's external network.
So it seems that the issue is that traffic isn't passing from the userspace network to the real-world network.
The user who reported the issue has figured it out.

Because we generate a symlink for /etc/resolv.conf rather than a real file, slirp4netns doesn't work. The documentation for it specifically states that the file has to be a real file, not a symlink.

Changing it to a real file resolved the issue for him. He's going to report this through bugzilla so a permanent fix can be implemented.

His post on his resolution can be found at [1].

[1] https://forums.docker.com/t/rootless-docker-i-o-timeout-with-docker-pull/137848/29

--
Jim Henderson
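For anyone who wants to check their own system for the condition described above, here is a minimal Python sketch that reports whether /etc/resolv.conf is a symlink and, if so, where it ultimately points. The function name is illustrative, and the script only inspects the file; it does not change anything.

```python
import os


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Report whether `path` is a symlink and where it resolves to."""
    is_link = os.path.islink(path)
    return {
        "path": path,
        "exists": os.path.exists(path),
        "is_symlink": is_link,
        # Fully resolved target, only meaningful when the path is a symlink.
        "target": os.path.realpath(path) if is_link else None,
    }


if __name__ == "__main__":
    info = describe_resolv_conf()
    if info["is_symlink"]:
        print(f"{info['path']} is a symlink to {info['target']}")
    elif info["exists"]:
        print(f"{info['path']} is a regular file")
    else:
        print(f"{info['path']} does not exist")
```

Comparing this output between a working and a failing installation would be one quick way to confirm whether the symlink is the difference between them.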
participants (4)
- Chuck Payne
- Jim Henderson
- Joe Salmeri
- Pablo Sanchez