[opensuse-kubic] DNS resolution not working
Hi,

I have set up a minimal Kubic cluster using KVM VMs. I have been using the
openSUSE-MicroOS.x86_64-16.0.0-Kubic-kubeadm-kvm-and-xen-Build3.60.qcow2
base image.

$ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master     Ready    master   16m   v1.15.0
worker-1   Ready    <none>   13m   v1.15.0

I've used the 'standard' steps to set up the cluster:

- kubeadm init --pod-network-cidr=10.244.0.0/16 on master
- set up ~/.kube/config on master
- kubectl apply -f /usr/share/k8s-yaml/flannel/kube-flannel.yaml on master
- kubeadm join on worker-1

I noticed that DNS resolution does not work for cluster-local services,
which is quite problematic. The basic test I am running is:

$ kubectl apply -f https://k8s.io/examples/admin/dns/busybox.yaml
$ time kubectl exec -ti busybox -- nslookup kubernetes.default

This always fails within a minute:

Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

real    1m0.203s
user    0m0.100s
sys     0m0.025s

I have started looking at the usual suspects:

1. The DNS pods are up:

$ kubectl -n kube-system get pods -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-5c98db65d4-r8ck8   1/1     Running   0          17m
coredns-5c98db65d4-wpjc4   1/1     Running   0          17m

2. The DNS service is available:

$ kubectl -n kube-system describe svc
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.244.0.2:53,10.244.0.3:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         10.244.0.2:53,10.244.0.3:53
Port:              metrics  9153/TCP
TargetPort:        9153/TCP
Endpoints:         10.244.0.2:9153,10.244.0.3:9153
Session Affinity:  None
Events:            <none>

3. I noticed that running queries against the endpoints themselves fails
as well, e.g.

$ kubectl -n default exec -ti busybox -- nslookup kubernetes.default 10.244.0.2

4. As the coredns pods are always scheduled on master, I tried to ping
their address from the worker-1 node, but that does not work.

master:~ # ip r
default via 10.24.0.1 dev eth0 proto dhcp
10.24.0.0/16 dev eth0 proto kernel scope link src 10.24.0.70
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink

worker-1:~ # ip r
default via 10.24.0.1 dev eth0 proto dhcp
10.24.0.0/16 dev eth0 proto kernel scope link src 10.24.0.71
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1

5. I have tried to use Cilium instead, but then the pod is stuck in
'ContainerCreating'.

I'm a bit at a loss here, to be honest; any ideas about debugging this
are more than welcome. Things usually work: I can access the internet
from containers and the VMs can ping each other - it's just the
'virtual' IP access that does not work.

Thanks,
Robert
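A note on step 4, since that is where the failure shows: with flannel's
default VXLAN backend, cross-node pod traffic travels as UDP on port
8472, and a host firewall dropping that port matches these symptoms
exactly (nodes Ready, internet access fine, pod IPs unreachable across
nodes). A minimal sketch of that check, assuming eth0 carries the
inter-VM traffic and tcpdump is installed on the nodes:

# On master, watch for flannel's VXLAN traffic (UDP 8472 is the
# default port of the VXLAN backend):
master:~ # tcpdump -ni eth0 udp port 8472

# In parallel on worker-1, ping a pod IP hosted on master, e.g. one
# of the coredns endpoints:
worker-1:~ # ping -c 3 10.244.0.2

# If tcpdump stays silent while the ping runs, check the host
# firewall on both nodes; if firewalld is active, UDP 8472 must be
# allowed:
master:~ # firewall-cmd --state
master:~ # firewall-cmd --list-all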
Hi,

On Wed, Jul 17, Robert Munteanu wrote:
> - kubeadm init --pod-network-cidr=10.244.0.0/16 on master
> - set up ~/.kube/config on master
> - kubectl apply -f /usr/share/k8s-yaml/flannel/kube-flannel.yaml on master
> - kubeadm join on worker-1
> I noticed that DNS resolution does not work for cluster-local services,
> which is quite problematic.
I have the feeling that the CNI initialization of our flannel image or
manifest doesn't work correctly. You could try:

kubectl delete -f /usr/share/k8s-yaml/flannel/kube-flannel.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-f...

Thorsten

--
Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & MicroOS
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
GF: Felix Imendoerffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nuernberg)
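As a sanity check after swapping manifests, the flannel pods should come
back Running on both nodes and the CNI config should be regenerated; the
app=flannel label below is an assumption based on the upstream manifest
of that time:

# flannel runs as a DaemonSet, so one Running pod per node is expected:
$ kubectl -n kube-system get pods -l app=flannel -o wide

# on each node, the CNI config the kubelet reads should be regenerated:
master:~ # ls -l /etc/cni/net.d/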
Hmm, that manifest doesn't seem to be correct; it has to be updated in
order to work with 1.16. It has:
apiVersion: extensions/v1beta1
kind: DaemonSet
and that apiVersion is no longer served as of 1.16:
> DaemonSet, Deployment, and ReplicaSet resources will no longer be
> served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 by
> default in v1.16. Migrate to the apps/v1 API, available since v1.9.
> Existing persisted data can be retrieved/updated via the apps/v1 API.

https://groups.google.com/d/msg/kubernetes-dev/je0rjyfTVyc/gEUw1YcyAQAJ
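For illustration, a minimal DaemonSet in the shape the migration
requires: apiVersion moves to apps/v1 and spec.selector becomes
mandatory. The name, labels, and image below are placeholders, not the
actual flannel manifest:

apiVersion: apps/v1              # was: extensions/v1beta1
kind: DaemonSet
metadata:
  name: example-ds               # placeholder name
  namespace: kube-system
spec:
  selector:                      # mandatory (and immutable) in apps/v1
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example             # must match spec.selector
    spec:
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1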
On Wed, 17 Jul 2019 at 20:06, Thorsten Kukuk <kukuk@suse.de> wrote:
> I have the feeling that the CNI initialization of our flannel image or
> manifest doesn't work correctly. You could try:
>
> kubectl delete -f /usr/share/k8s-yaml/flannel/kube-flannel.yaml
> kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-f...
>
> [...]
Oops, I see that you have 1.15 - sorry, that's not the problem then. It
also seems that flannel is creating the interfaces. Anything unusual in
the logs of the coredns pods?
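For reference, a quick way to pull those logs and the pod events; the
k8s-app=kube-dns label is taken from the service selector shown earlier:

# logs from both coredns replicas at once:
$ kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50

# events can reveal probe failures that a 1/1 Running status hides:
$ kubectl -n kube-system describe pods -l k8s-app=kube-dns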
On Wed, 17 Jul 2019 at 22:53, Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:

> Hmm, that manifest doesn't seem to be correct; it has to be updated in
> order to work with 1.16. It has:
>
> apiVersion: extensions/v1beta1
> kind: DaemonSet
>
> [...]