What | Removed | Added |
---|---|---|
CC | | asarai@suse.com |
For further context, the reason you have to set ip_forward to 0 is that setting ip_forward to its current value is a no-op. It appears that something is seriously broken inside the forwarding code, causing it to not forward packets properly (thus requiring a reset).

I've been looking at the kernel side of this issue for quite a few days now. I still haven't found the cause, but I have discovered that packet forwarding is not completely disabled: what is broken is packet forwarding *from the host to the container*. This is fairly easy to check by running "dig @1.1.1.1 asdf.com" on a broken cluster. If you capture packets on the host, you'll see the DNS packets leave the network and a reply come back to the host. However, the reply packets never get forwarded to the container.

There is a coredns bug report with comments that reference a similar issue[1], but I'm not convinced that it is actually related (not to mention that the solution was "don't run coredns on your master node", which makes absolutely no sense).

(As an aside, the reason my investigation has taken so long is that bpftrace has been fighting me every step of the way. The inability to do function_graph-style tracing, combined with endless silly restrictions on type conversions and the lack of BTF support in Tumbleweed kernels, has been driving me up the wall.)

[1]: https://github.com/coredns/coredns/issues/2284#issuecomment-605596767
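
For anyone who wants to script the reset described above, here is a minimal sketch in Python. It assumes the workaround is to write 0 to ip_forward and then write 1 again (a single write of the current value being a no-op, as noted). The function names `read_ip_forward` and `reset_ip_forward` are hypothetical, and the path is a parameter only so the logic can be tried against an ordinary file. It needs root to touch the real sysctl file, and it does not fix the underlying forwarding bug.

```python
from pathlib import Path

# Standard procfs location of the IPv4 forwarding sysctl.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def read_ip_forward(path=IP_FORWARD):
    """Return the current ip_forward value as a string, e.g. "0" or "1"."""
    return Path(path).read_text().strip()


def reset_ip_forward(path=IP_FORWARD):
    """Write 0 and then 1 to ip_forward (hypothetical helper).

    Assumption: the value has to actually change for the write to have
    any effect, so writing 1 over an existing 1 would not be enough.
    Returns the value seen before the reset and the values written.
    """
    path = Path(path)
    previous = read_ip_forward(path)
    written = []
    for value in ("0", "1"):
        path.write_text(value + "\n")
        written.append(value)
    return previous, written


if __name__ == "__main__":
    before, _ = reset_ip_forward()
    print("ip_forward was", before, "and is now", read_ip_forward())
```

After running this on a broken node, the dig check above can be repeated from a container to see whether replies reach it again.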