[Bug 1171770] New: Worker nodes can't access kube-api
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770

Bug ID: 1171770
Summary: Worker nodes can't access kube-api
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: 64bit
OS: Linux
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kubic
Assignee: kubic-bugs@opensuse.org
Reporter: contact@ffreitas.io
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Release: 20200514

Deployment method:
'''
kubicctl init
kubicctl node add worker01.local
'''

After this deployment, none of my pods deployed on worker nodes can access the kube-api. For example, with kured I get:
'''
time="2020-05-14T22:18:03Z" level=info msg="Kubernetes Reboot Daemon: 1.3.0"
time="2020-05-14T22:18:03Z" level=info msg="Node ID: worker01"
time="2020-05-14T22:18:03Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2020-05-14T22:18:03Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2020-05-14T22:18:03Z" level=info msg="Blocking Pod Selectors: []"
time="2020-05-14T22:18:03Z" level=info msg="Reboot on: SunMonTueWedThuFriSat between 00:00 and 23:59 UTC"
time="2020-05-14T22:18:33Z" level=fatal msg="Error testing lock: Get https://10.96.0.1:443/apis/apps/v1/namespaces/kube-system/daemonsets/kured: dial tcp 10.96.0.1:443: i/o timeout"
'''

I found a similar issue on reddit: https://www.reddit.com/r/kubernetes/comments/gjhxcj/fresh_kubeadm_install_po...
-- You are receiving this mail because: You are on the CC list for the bug.
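For reference, a minimal sketch of reproducing the symptom from a worker node without kured (the curlimages/curl image and the node name are examples, not part of the report; any pod pinned to a worker will do):
```
# Run a one-shot pod pinned to a worker node and try the apiserver ClusterIP.
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"worker01"}}' \
  --command -- curl -k -m 10 https://10.96.0.1:443/version
# On an affected worker the connection times out like the kured log above;
# on the control-plane node it answers (even a 401/403 proves reachability).
```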
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c1 Francisco Freitas <contact@ffreitas.io> changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|Major |Critical --- Comment #1 from Francisco Freitas <contact@ffreitas.io> --- After a day of testing I am seeing no way of getting a working Kubic installation from the latest ISO. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c2 Quentin Onno <contact@qonno.fr> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |contact@qonno.fr --- Comment #2 from Quentin Onno <contact@qonno.fr> --- Hi, Same issue here Regards, -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c4 --- Comment #4 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Thorsten Kukuk from comment #3)
kured 1.3.0 is not the latest build; the latest build contains kured 1.4.0.
I'm not sure where the problem is; you can try using flannel instead of weave for the pod network. But flannel is not really maintained anymore and reports a lot of iptables errors. DNS isn't fully working either.
Destroyed my cluster. Did a transactional-update. Same issue with the latest kured version:
'''
time="2020-05-17T09:53:27Z" level=info msg="Kubernetes Reboot Daemon: 1.4.0"
time="2020-05-17T09:53:27Z" level=info msg="Node ID: worker01"
time="2020-05-17T09:53:27Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2020-05-17T09:53:27Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2020-05-17T09:53:27Z" level=info msg="Blocking Pod Selectors: []"
time="2020-05-17T09:53:27Z" level=info msg="Reboot on: SunMonTueWedThuFriSat between 00:00 and 23:59 UTC"
'''
It does not come from kured. I got the same issue with multiple services (for example haproxy-ingress). For the CNI I tested:
- cilium (built a yaml from the github repository and put it in /usr/shared/k8s-yaml/cilium)
- weavenet (default init)
- flannel (kubicctl init --pod-network flannel)
Same issue with all of them.
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c6 --- Comment #6 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Thorsten Kukuk from comment #5)
With flannel the error is much rarer for me than with weave. But it looks like the best way to find out whether the cluster is affected is: run a busybox container and use nslookup to resolve a host. On an affected cluster you will run into a timeout (temporary failure in name resolution); otherwise you should get a response immediately.
A second kubernetes cluster is running fine for me without the issues.
By the way, kured is also broken, since the last systemd update is incompatible ...
Again, not a kured issue for me as it affects other services. What is the configuration on your unaffected cluster? Is it a fresh install? -- You are receiving this mail because: You are on the CC list for the bug.
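A sketch of the busybox DNS check Thorsten describes above (the pod name and the name being resolved are just examples):
```
# Run a throwaway busybox pod and resolve a name through the cluster DNS.
kubectl run dns-check --rm -it --restart=Never --image=busybox -- \
  nslookup kubernetes.default.svc.cluster.local
# Affected cluster: nslookup times out ("temporary failure in name resolution").
# Healthy cluster: the ClusterIP (e.g. 10.96.0.1) comes back immediately.
```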
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c8 --- Comment #8 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Thorsten Kukuk from comment #7)
(In reply to Francisco Freitas from comment #6)
Again, not a kured issue for me as it affects other services.
kured cannot reboot the system anymore since systemd moved binaries, so you are affected by this.
The issue I want to point to here is the timeout to 10.96.0.1, which is the kube-api service. Kured is just an example I took. I've seen the issue you're talking about, but it's still not the one I'm hoping to solve.
What is the configuration on your unaffected cluster ?
Is it a fresh install ?
It's a multi-master setup, but not a fresh install, only ever updated. So not really comparable.
I can't rollback here. I must start a new environment. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c9 Richard Brown <rbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |IN_PROGRESS Assignee|kubic-bugs@opensuse.org |rbrown@suse.com --- Comment #9 from Richard Brown <rbrown@suse.com> ---
Hi all - I've been looking at this all day, here is the current status:

I can confirm it happens with both kubicctl and kubeadm clusters made from the current snapshots. We know this doesn't occur on kubicctl clusters with multi-masters, which suggests haproxy somehow works around the issue.

Recent snapshots have had the following changes which I suspect could be related (listed by invasiveness, in my opinion):
- busybox package reworking
- kernel update from 5.6.11 to 5.6.12
- minor runc patch
- kured

There is also the possibility that the cause is something else; I'm at a loss, to be honest, and am trying to debug this just by going on what few clues we have here - any more data points and examples from people would be greatly appreciated.

Given the bug report shows the problem occurs with earlier kured versions and with services other than kured, I think it's safe to rule out that update.

Given the busybox package fundamentally changed every image we use for kubernetes, that was my first suspicion, so today I've built all of the images based on the busybox-free Tumbleweed base image (which is much larger, but obviously more likely to have everything each k8s component requires). You can get these images from registry.opensuse.org/home/rbrownsuse/branches/devel/kubic/containers/container/kubic

However, as anyone who wishes to help can see, a cluster created with "kubeadm init --image-repository registry.opensuse.org/home/rbrownsuse/branches/devel/kubic/containers/container/kubic" still demonstrates this bug with a vengeance. I've even tried using the heavyweight base containers with the weave image, to no difference. So I'm pretty convinced our images/busybox are not at fault.

This now leads me to wonder if the kernel or runc updates are to blame, which I will look at tomorrow, unless someone beats me to it first.

Sorry that this doesn't look like it will be a quick fix. Anyone got any other info that might help?
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c10 --- Comment #10 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Richard Brown from comment #9)
Hi all - I've been looking at this all day, here is the current status:
I can confirm it happens with both kubicctl and kubeadm clusters made from the current snapshots.
We know this doesn't occur on kubicctl clusters with multi-masters, which suggests haproxy somehow works around the issue.
Might want to verify this. Tried a multi-master deployment two releases back. I had no issue with the master nodes but I still got the issue on the worker nodes. (Will test it on the latest release again tonight).
This now leads me to wonder if the kernel or runc updates are to blame, which I will look at tomorrow, unless someone beats me to it first.
Sorry that this doesn't look like it will be a quick fix. Anyone got any other info that might help?
Couldn't it be tested by downgrading the kernel? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c11 --- Comment #11 from Richard Brown <rbrown@suse.com> --- (In reply to Francisco Freitas from comment #10)
(In reply to Richard Brown from comment #9)
Hi all - I've been looking at this all day, here is the current status:
I can confirm it happens with both kubicctl and kubeadm clusters made from the current snapshots.
We know this doesn't occur on kubicctl clusters with multi-masters, which suggests haproxy somehow works around the issue.
Might want to verify this. Tried a multi-master deployment two releases back. I had no issue with the master nodes but I still got the issue on the worker nodes. (Will test it on the latest release again tonight).
This now leads me to wonder if the kernel or runc updates are to blame, which I will look at tomorrow, unless someone beats me to it first.
Sorry that this doesn't look like it will be a quick fix. Anyone got any other info that might help?
Couldn't it be tested by downgrading the kernel ?
Sure but a) from where? and b) I've worked enough today, I think I'd like a bit of a break before picking this up tomorrow ;) -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c12 --- Comment #12 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Richard Brown from comment #11)
(In reply to Francisco Freitas from comment #10)
(In reply to Richard Brown from comment #9)
Hi all - I've been looking at this all day, here is the current status:
I can confirm it happens with both kubicctl and kubeadm clusters made from the current snapshots.
We know this doesn't occur on kubicctl clusters with multi-masters, which suggests haproxy somehow works around the issue.
Might want to verify this. Tried a multi-master deployment two releases back. I had no issue with the master nodes but I still got the issue on the worker nodes. (Will test it on the latest release again tonight).
This now leads me to wonder if the kernel or runc updates are to blame, which I will look at tomorrow, unless someone beats me to it first.
Sorry that this doesn't look like it will be a quick fix. Anyone got any other info that might help?
Couldn't it be tested by downgrading the kernel ?
Sure but a) from where?
It was just a genuine question. I remember using the tumbleweed-cli to access the history repositories. I do not know if it can be done with kubic.
and b) I've worked enough today, I think I'd like a bit of a break before picking this up tomorrow ;)
I was not hoping for you to work on this single issue again today :p -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c13 --- Comment #13 from Francisco Freitas <contact@ffreitas.io> ---
The error is also present on a multi-master cluster. I deployed a cluster from the release 20200516 using the following commands:
```
kubicctl init --haproxy loadbalancer --multi-master loadbalancer.cluster.local
kubicctl node add --type master master02
kubicctl node add --type master master03
kubicctl node add worker01
```
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c14 --- Comment #14 from Richard Brown <rbrown@suse.com> --- (In reply to Francisco Freitas from comment #13)
The error is also present on multi-master cluster. I deployed a cluster from the release 20200516 using the following commands :
``` kubicctl init --haproxy loadbalancer --multi-master loadbalancer.cluster.local kubicctl node add --type master master02 kubicctl node add --type master master03 kubicctl node add worker01 ```
So..
I've used kubeadm init --image-repository to use only upstream containers - problem still occurs
I've used rebuilt kubernetes 1.18.2 containers - problem still occurs
I've deployed it on kubernetes 1.17.5 - problem still occurs
I've used only upstream weave, cilium and other CNI providers - problem still occurs
I've used https://download.opensuse.org/history/ to move my nodes to every version of kubic we've had in May - problem still occurs

I'm officially flummoxed - does anyone have any idea when this last worked for sure? Because I'm running out of things to rule out.
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c15 --- Comment #15 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Richard Brown from comment #14)
(In reply to Francisco Freitas from comment #13)
The error is also present on multi-master cluster. I deployed a cluster from the release 20200516 using the following commands :
``` kubicctl init --haproxy loadbalancer --multi-master loadbalancer.cluster.local kubicctl node add --type master master02 kubicctl node add --type master master03 kubicctl node add worker01 ```
So.. I've used kubeadm init --image-repository to use only upstream containers - problem still occurs I've used rebuilt kubernetes 1.18.2 containers - problem still occurs i've deployed it on kubernetes 1.17.5 - problem still occurs I've used only upstream weave, cilium and other CNI providers - problem still occurs I've used https://download.opensuse.org/history/ to move my nodes to every version of kubic we've had in May - problem still occurs
I'm officially flummoxed - does anyone have any idea when this last worked for sure? because I'm running out of things to rule out
The last time I successfully installed a Kubic cluster was on April 7th, with the following configuration:
- upstream cilium for the CNI
- single master
- release 20200405 updated from a 20200108 iso
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c16 --- Comment #16 from Richard Brown <rbrown@suse.com> --- (In reply to Francisco Freitas from comment #15)
(In reply to Richard Brown from comment #14)
(In reply to Francisco Freitas from comment #13)
The error is also present on multi-master cluster. I deployed a cluster from the release 20200516 using the following commands :
``` kubicctl init --haproxy loadbalancer --multi-master loadbalancer.cluster.local kubicctl node add --type master master02 kubicctl node add --type master master03 kubicctl node add worker01 ```
So.. I've used kubeadm init --image-repository to use only upstream containers - problem still occurs I've used rebuilt kubernetes 1.18.2 containers - problem still occurs i've deployed it on kubernetes 1.17.5 - problem still occurs I've used only upstream weave, cilium and other CNI providers - problem still occurs I've used https://download.opensuse.org/history/ to move my nodes to every version of kubic we've had in May - problem still occurs
I'm officially flummoxed - does anyone have any idea when this last worked for sure? because I'm running out of things to rule out
Last time I successfully installed a Kubic cluster was on april 7th with the following configuration : - upstream cilium for the CNI - single master - release 20200405 updated from a 20200108 iso
Do you (or anyone else) have an ISO that old somewhere I can download, to see if I can narrow this down further? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c17 --- Comment #17 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Richard Brown from comment #16)
(In reply to Francisco Freitas from comment #15)
(In reply to Richard Brown from comment #14)
(In reply to Francisco Freitas from comment #13)
The error is also present on multi-master cluster. I deployed a cluster from the release 20200516 using the following commands :
``` kubicctl init --haproxy loadbalancer --multi-master loadbalancer.cluster.local kubicctl node add --type master master02 kubicctl node add --type master master03 kubicctl node add worker01 ```
So.. I've used kubeadm init --image-repository to use only upstream containers - problem still occurs I've used rebuilt kubernetes 1.18.2 containers - problem still occurs i've deployed it on kubernetes 1.17.5 - problem still occurs I've used only upstream weave, cilium and other CNI providers - problem still occurs I've used https://download.opensuse.org/history/ to move my nodes to every version of kubic we've had in May - problem still occurs
I'm officially flummoxed - does anyone have any idea when this last worked for sure? because I'm running out of things to rule out
Last time I successfully installed a Kubic cluster was on april 7th with the following configuration : - upstream cilium for the CNI - single master - release 20200405 updated from a 20200108 iso
Do you (or anyone else) have aN iso that old somewhere I can download it to see if I can narrow this down further?
I only have the 20200108 ISO. Will it do the trick for you? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c18 --- Comment #18 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Richard Brown from comment #16)
(In reply to Francisco Freitas from comment #15)
(In reply to Richard Brown from comment #14)
(In reply to Francisco Freitas from comment #13)
The error is also present on multi-master cluster. I deployed a cluster from the release 20200516 using the following commands :
``` kubicctl init --haproxy loadbalancer --multi-master loadbalancer.cluster.local kubicctl node add --type master master02 kubicctl node add --type master master03 kubicctl node add worker01 ```
So.. I've used kubeadm init --image-repository to use only upstream containers - problem still occurs I've used rebuilt kubernetes 1.18.2 containers - problem still occurs i've deployed it on kubernetes 1.17.5 - problem still occurs I've used only upstream weave, cilium and other CNI providers - problem still occurs I've used https://download.opensuse.org/history/ to move my nodes to every version of kubic we've had in May - problem still occurs
I'm officially flummoxed - does anyone have any idea when this last worked for sure? because I'm running out of things to rule out
Last time I successfully installed a Kubic cluster was on april 7th with the following configuration : - upstream cilium for the CNI - single master - release 20200405 updated from a 20200108 iso
Do you (or anyone else) have aN iso that old somewhere I can download it to see if I can narrow this down further?
In case you need it, I've managed to upload it here: https://send.firefox.com/download/a4f0c1b25d2d81a9/#72p-GCmfwurTQhAPLPwEsw -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c19 --- Comment #19 from Richard Brown <rbrown@suse.com> ---
No need for an ISO, found what I believe to be the trigger for the problem.

WORKAROUND:
Delete /etc/sysctl.d/70-yast.conf
If the cluster is already bootstrapped, reboot all nodes. Cluster communications work properly afterwards.

NEXT STEP:
Figure out why the heck /etc/sysctl.d/70-yast.conf's blocking of IP forwarding is taking effect when /usr/lib/sysctl.d/90-yast.conf should be overriding it :)
-- You are receiving this mail because: You are on the CC list for the bug.
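A sketch of applying the workaround above (run as root on every node; paths exactly as named in the comment):
```
rm /etc/sysctl.d/70-yast.conf
reboot   # only needed if the cluster was already bootstrapped
```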
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c20 --- Comment #20 from Richard Brown <rbrown@suse.com> --- (In reply to Richard Brown from comment #19)
Figure out why the heck /etc/sysctl.d/70-yast.conf's blocking of IP forwarding is taking an effect when /usr/lib/sysctl.d/90-yast.conf should be overriding it :)
Correction.. /usr/lib/sysctl.d/90-kubeadm.conf is what should be overriding 70-yast.conf -- You are receiving this mail because: You are on the CC list for the bug.
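For context, systemd-sysctl merges /etc/sysctl.d and /usr/lib/sysctl.d and applies the files in lexical order of their names, so 90-kubeadm.conf should indeed win over 70-yast.conf. A quick way to inspect both files and the values actually in effect (a generic sketch, not from the report):
```
ls /etc/sysctl.d/ /usr/lib/sysctl.d/
grep -H forward /etc/sysctl.d/*.conf /usr/lib/sysctl.d/*.conf
sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding
```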
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c21 --- Comment #21 from Francisco Freitas <contact@ffreitas.io> --- (In reply to Richard Brown from comment #19)
No need for an ISO, found what I believe to be the trigger for the problem.
WORKAROUND:
Delete /etc/sysctl.d/70-yast.conf
If cluster is already bootstrapped, reboot all nodes. Cluster communications work properly afterwards.
NEXT STEP:
Figure out why the heck /etc/sysctl.d/70-yast.conf's blocking of IP forwarding is taking an effect when /usr/lib/sysctl.d/90-yast.conf should be overriding it :)
Nice! I will try it out tonight. Thanks for the workaround. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c22 Rafael Fernández López <rfernandezlopez@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |rfernandezlopez@suse.com --- Comment #22 from Rafael Fernández López <rfernandezlopez@suse.com> ---
I have the impression that `net.ipv6.conf.all.forwarding = 0` being set by `/etc/sysctl.d/70-yast.conf` has an impact here. I overrode this setting by creating a `/etc/sysctl.d/91-kubeadm.conf` file with these contents:
```
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
```
After rebooting the node, everything works fine. As Richard mentioned, removing `/etc/sysctl.d/70-yast.conf` altogether and rebooting also does the trick.
This makes me think that the override in `/usr/lib/sysctl.d/90-kubeadm.conf` is not enough; it currently has:
```
# The file is provided as part of the kubernetes-kubeadm package
net.ipv4.ip_forward = 1
```
From what I see, it should include `net.ipv6.conf.all.forwarding = 1` as well. I cannot explain any better why this is happening right now, though.
-- You are receiving this mail because: You are on the CC list for the bug.
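A sketch of applying the override Rafael describes in comment 22 (the filename follows his example; reboot afterwards, as he recommends):
```
cat > /etc/sysctl.d/91-kubeadm.conf <<'EOF'
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
EOF
reboot
```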
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c23 --- Comment #23 from Rafael Fernández López <rfernandezlopez@suse.com> --- As a note, rebooting is not strictly necessary; `sysctl -w -a --system` should work as well. But I have run into odd behaviors with `sysctl` in the past (especially when mixed with values in `/etc/sysctl.conf`). This is why I recommend rebooting directly, to ensure that everything is still fine after a reboot. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c24 --- Comment #24 from Francisco Freitas <contact@ffreitas.io> --- Tested the workaround on a multi-master cluster with cilium. Did not have the issue again on any of my services. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c25 --- Comment #25 from Richard Brown <rbrown@suse.com> --- (In reply to Rafael Fernández López from comment #22)
I have the impression that `net.ipv6.conf.all.forwarding = 0` being set by `/etc/sysctl.d/70-yast.conf` has an impact here.
I did override this setting by creating a `/etc/sysctl.d/91-kubeadm.conf` file with contents:
``` net.ipv4.ip_forward = 1 net.ipv6.conf.all.forwarding = 1 ```
After rebooting the node, everything works fine. As Richard mentioned, removing `/etc/sysctl.d/70-yast.conf` altogether and rebooting also makes the trick.
This makes me think that the override in `/usr/lib/sysctl.d/90-kubeadm.conf` is not enough, it currently has:
``` # The file is provided as part of the kubernetes-kubeadm package net.ipv4.ip_forward = 1 ```
From what I see, it should include `net.ipv6.conf.all.forwarding = 1` as well. I cannot explain why this is happening in a better way right now though.
I tried this before making my post, and it didn't work for me... but I trust your observation, so I'm putting it in a patch for kubernetes1.18 and kubernetes1.17 and testing those packages :) thanks! -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c26 --- Comment #26 from Richard Brown <rbrown@suse.com> --- (In reply to Richard Brown from comment #25)
(In reply to Rafael Fernández López from comment #22)
I have the impression that `net.ipv6.conf.all.forwarding = 0` being set by `/etc/sysctl.d/70-yast.conf` has an impact here.
I did override this setting by creating a `/etc/sysctl.d/91-kubeadm.conf` file with contents:
``` net.ipv4.ip_forward = 1 net.ipv6.conf.all.forwarding = 1 ```
After rebooting the node, everything works fine. As Richard mentioned, removing `/etc/sysctl.d/70-yast.conf` altogether and rebooting also makes the trick.
This makes me think that the override in `/usr/lib/sysctl.d/90-kubeadm.conf` is not enough, it currently has:
``` # The file is provided as part of the kubernetes-kubeadm package net.ipv4.ip_forward = 1 ```
From what I see, it should include `net.ipv6.conf.all.forwarding = 1` as well. I cannot explain why this is happening in a better way right now though.
I tried this before making my post, and it didn't work for me..but I trust your observation also so I'm putting it in a patch for kubernetes1.18 and kubernetes1.17 and testing those packages :)
thanks!
Put the change in the package, and confirmed - it does not work to add `net.ipv6.conf.all.forwarding = 1`. However, I can confirm that if I copy 90-kubeadm.conf to /etc/sysctl.d, then it works. This means something is incorrectly parsing/not parsing /usr/lib/sysctl.d. Now we just need to figure out what. -- You are receiving this mail because: You are on the CC list for the bug.
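The interim fix Richard describes above, as a sketch (run as root on each node, then reboot so the copied settings are applied cleanly):
```
cp /usr/lib/sysctl.d/90-kubeadm.conf /etc/sysctl.d/
reboot
```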
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c27 --- Comment #27 from Richard Brown <rbrown@suse.com> ---
The problem, as we understand it right now, is that the sysctl.d settings from /usr/lib/sysctl.d/90-kubeadm.conf are not getting correctly applied/honoured by the kernel. /proc seems to suggest they are applied, but obviously they don't behave that way, and IP forwarding (which is really needed for us) is not working. Various methods of toggling/reapplying settings can get things to work (e.g. the WORKAROUND, `echo 0 > /proc/sys/net/ipv4/ip_forward && echo 1 > /proc/sys/net/ipv4/ip_forward`, `sysctl -a --system` and such).

Quite what is going on in the kernel is still a mystery, and other folk are looking at it as I'll be going on vacation. Meanwhile though, I have patches going to both kubernetes1.17 and 1.18 which will run `sysctl -a --system` before starting kubelet, just to be sure everything is running as configured :) Once this patch is out there, the workaround of deleting other influencing sysctl.d config files will be redundant.
-- You are receiving this mail because: You are on the CC list for the bug.
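The packaged patch itself is not shown in the thread; as a rough illustration of the approach (re-applying sysctl settings right before kubelet starts), a systemd drop-in along these lines would do it - the path and exact command here are assumptions, not the actual patch:
```
# /etc/systemd/system/kubelet.service.d/90-sysctl.conf  (hypothetical drop-in)
[Service]
ExecStartPre=/usr/sbin/sysctl --system
```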
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c28 Aleksa Sarai <asarai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |asarai@suse.com --- Comment #28 from Aleksa Sarai <asarai@suse.com> ---
For further context, the reason you have to toggle ip_forward to 0 (and back to 1) is that setting ip_forward to its current value is a no-op. It appears that something is seriously broken inside the forwarding code, causing it to not forward packets properly (thus requiring a reset).

I've been looking at the kernel side of this issue for quite a few days now, and though I still haven't found the cause, I have discovered that the issue is not that packet forwarding is completely disabled -- the issue is that packet forwarding *from the host to the container* is broken. This can be fairly easily checked by running "dig @1.1.1.1 asdf.com" on a broken cluster -- if you packet capture the host you'll see the DNS packets leave the network and a reply is sent back to the host. However, the packets never get forwarded to the container.

There actually is a coredns bug report which has comments that reference a similar issue[1], but I'm not convinced that it is actually related (not to mention the solution was "don't run coredns on your master node", which makes absolutely no sense).

(As an aside, the reason my investigation of this has taken so long is because bpftrace has been fighting me every step of the way. The inability to do function_graph-style tracing, combined with endless silly restrictions on type conversions and the lack of BTF support in Tumbleweed kernels, has been driving me up the wall.)

[1]: https://github.com/coredns/coredns/issues/2284#issuecomment-605596767
-- You are receiving this mail because: You are on the CC list for the bug.
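A sketch of the packet-capture check Aleksa describes (interface name, resolver and pod name are placeholders; the pod's image needs dig or nslookup):
```
# On the worker host: watch DNS traffic to the external resolver.
tcpdump -ni eth0 'host 1.1.1.1 and udp port 53'
# From a pod scheduled on that node:
kubectl exec -it <pod-with-dig> -- dig @1.1.1.1 asdf.com
# Broken node: query and reply both show up on eth0, but the reply never
# reaches the pod and dig times out - forwarding from host to container fails.
```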
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c29 Martin Weiss <martin.weiss@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |martin.weiss@suse.com --- Comment #29 from Martin Weiss <martin.weiss@suse.com> ---
FYI - just ran into the same issue on SLES 15 SP1 with kernel 4.12.14-197.40-default, and we realized that while the "all" forwarding setting is 1, JUST the eth0 and lo interfaces have 0!

While all other ipv4 forwarding flags were 1, we saw these two at 0:
/proc/sys/net/ipv4/conf/eth0/forwarding 0
/proc/sys/net/ipv4/conf/lo/forwarding 0

In this case `sysctl -w net.ipv4.ip_forward=0; sysctl -w net.ipv4.ip_forward=1` also changed the interfaces to 1. Also a network restart changes the interfaces from 0 to 1.

BUT - a `sysctl --system` (with only net.ipv4.ip_forward=1 in the conf) did NOT change the interfaces from 0 to 1!!
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 Martin Weiss <martin.weiss@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1172284 -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c30 --- Comment #30 from Rafael Fernández López <rfernandezlopez@suse.com> ---
Case 1
======
echo 1 | sudo tee /proc/sys/net/ipv4/conf/all/forwarding
1
echo 0 | sudo tee /proc/sys/net/ipv4/conf/wlp1s0/forwarding
0
cat /proc/sys/net/ipv4/conf/wlp1s0/forwarding
0
echo 1 | sudo tee /proc/sys/net/ipv4/conf/all/forwarding
1
cat /proc/sys/net/ipv4/conf/wlp1s0/forwarding
0

Case 2
======
echo 1 | sudo tee /proc/sys/net/ipv4/conf/all/forwarding
1
echo 0 | sudo tee /proc/sys/net/ipv4/conf/wlp1s0/forwarding
0
cat /proc/sys/net/ipv4/conf/wlp1s0/forwarding
0
echo 0 | sudo tee /proc/sys/net/ipv4/conf/all/forwarding   ## FORCE CYCLING
0                                                          ##
echo 1 | sudo tee /proc/sys/net/ipv4/conf/all/forwarding   ##
1                                                          ##
cat /proc/sys/net/ipv4/conf/wlp1s0/forwarding
1 ## IT'S 1
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c31 --- Comment #31 from Rafael Fernández López <rfernandezlopez@suse.com> --- What I wrote on https://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c30 could be the expected kernel behavior: instead of setting all interfaces to 1, it is a no-op if conf/all/forwarding was already 1 (not cycled). If we confirm that on Kubic the per-interface forwarding flags are set to 0 as well, then we have to find out what is setting them to 0 and when. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c32 --- Comment #32 from Aleksa Sarai <asarai@suse.com> --- (In reply to Martin Weiss from comment #29)
FYI - just had to realize the same issue on SLES 15 SP1 with kernel 4.12.14-197.40-default and we have realized that all forward = 1 JUST the eth0 and the lo interfaces have 0 !
While all other ipv4 forwarding were 1 we saw these two on 0:
/proc/sys/net/ipv4/conf/eth0/forwarding 0 /proc/sys/net/ipv4/conf/lo/forwarding 0
Dammit. Yeah I had noticed this last week (when I was figuring out how forwarding configuration worked), but I misunderstood what I was looking at -- my assumption was that forwarding meant forwarding in *both* directions. But I think it only refers to forwarding *incoming* packets (so forwarding being disabled on the host still allows forwarded packets from the container to go to the internet).
BUT - a sysctl --system (with only net.ipv4.ip_forward=1 in the conf) did NOT change the interfaces from 0 to 1!!
Yeah, this behaviour is expected (if misguided IMHO). The kernel treats setting this sysctl to its current value as a no-op. I guess we'll need to explicitly do % echo 1 | tee /proc/sys/net/ipv[46]/conf/*/forwarding somewhere... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c33 --- Comment #33 from Martin Weiss <martin.weiss@suse.com> ---
Quick update on the testing I have done.

1. installed server and enabled IP forwarding

2. verified that the following gives 1 for all:
for FILE in $(sudo find /proc -iname forwarding); do echo $FILE; sudo cat $FILE; done

3. run skuba to deploy first master:
cd
skuba cluster init caasp --control-plane caasp-api.suse
cd caasp
sed -i 's#podSubnet:.*#podSubnet: 10.100.0.0/16#g' kubeadm-init.conf
sed -i 's#serviceSubnet:.*#serviceSubnet: 10.200.0.0/16#g' kubeadm-init.conf
SERVER=caasp-master-01; skuba node bootstrap $SERVER --sudo --target $SERVER.suse --user caaspadm -v5 2>&1|tee $SERVER.log

4. verify right after skuba deployment is finished:
for FILE in $(sudo find /proc -iname forwarding); do echo $FILE; sudo cat $FILE; done
-> all 1 - but not yet any cilium or lxc0502ff45e5f9.. network

5. wait a bit and then test again:
for FILE in $(sudo find /proc -iname forwarding); do echo $FILE; sudo cat $FILE; done
/proc/sys/net/ipv4/conf/all/forwarding 1
/proc/sys/net/ipv4/conf/cilium_health/forwarding 1
/proc/sys/net/ipv4/conf/cilium_host/forwarding 1
/proc/sys/net/ipv4/conf/cilium_net/forwarding 1
/proc/sys/net/ipv4/conf/cilium_vxlan/forwarding 1
/proc/sys/net/ipv4/conf/default/forwarding 1
/proc/sys/net/ipv4/conf/eth0/forwarding 0
/proc/sys/net/ipv4/conf/lo/forwarding 0
/proc/sys/net/ipv4/conf/lxc0502ff45e5f9/forwarding 1
/proc/sys/net/ipv4/conf/lxc338fd9c772fe/forwarding 1
/proc/sys/net/ipv4/conf/lxc44ad1805e489/forwarding 1
/proc/sys/net/ipv4/conf/lxc45fc97ce17fc/forwarding 1
/proc/sys/net/ipv4/conf/lxc5c407b7c307a/forwarding 1
/proc/sys/net/ipv4/conf/lxc66ae87817a6a/forwarding 1
/proc/sys/net/ipv4/conf/lxc6f16187be039/forwarding 1
/proc/sys/net/ipv4/conf/lxc927bae83b605/forwarding 1
/proc/sys/net/ipv4/conf/lxc9a7c0128099e/forwarding 1
/proc/sys/net/ipv4/conf/lxcc5c6777e5605/forwarding 1
/proc/sys/net/ipv6/conf/all/forwarding 1
/proc/sys/net/ipv6/conf/cilium_health/forwarding 1
/proc/sys/net/ipv6/conf/cilium_host/forwarding 1
/proc/sys/net/ipv6/conf/cilium_net/forwarding 1
/proc/sys/net/ipv6/conf/cilium_vxlan/forwarding 1
/proc/sys/net/ipv6/conf/default/forwarding 1
/proc/sys/net/ipv6/conf/eth0/forwarding 1
/proc/sys/net/ipv6/conf/lo/forwarding 1
/proc/sys/net/ipv6/conf/lxc0502ff45e5f9/forwarding 1
/proc/sys/net/ipv6/conf/lxc338fd9c772fe/forwarding 1
/proc/sys/net/ipv6/conf/lxc44ad1805e489/forwarding 1
/proc/sys/net/ipv6/conf/lxc45fc97ce17fc/forwarding 1
/proc/sys/net/ipv6/conf/lxc5c407b7c307a/forwarding 1
/proc/sys/net/ipv6/conf/lxc66ae87817a6a/forwarding 1
/proc/sys/net/ipv6/conf/lxc6f16187be039/forwarding 1
/proc/sys/net/ipv6/conf/lxc927bae83b605/forwarding 1
/proc/sys/net/ipv6/conf/lxc9a7c0128099e/forwarding 1
/proc/sys/net/ipv6/conf/lxcc5c6777e5605/forwarding 1

--> so at least in my case the process of the cluster deployment / cilium / crio etc. startup seems to change eth0 and lo to "0"!
-- You are receiving this mail because: You are on the CC list for the bug.
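One way to catch the exact moment the flags flip during bootstrap (a sketch using the paths from Martin's listing; run on the node while skuba/cilium come up):
```
# Poll the relevant flags once per second and log their values over time.
while true; do
  date
  grep -H . /proc/sys/net/ipv4/conf/all/forwarding \
            /proc/sys/net/ipv4/conf/eth0/forwarding \
            /proc/sys/net/ipv4/conf/lo/forwarding
  sleep 1
done
```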
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c34 --- Comment #34 from Martin Weiss <martin.weiss@suse.com> --- FYI - one more update from my testing, today! Before running skuba: /proc/sys/net/ipv4/conf/eth0/forwarding 1 /proc/sys/net/ipv4/conf/lo/forwarding 1 After running skuba SERVER=caasp-master-01; skuba node bootstrap $SERVER --sudo --target $SERVER.suse --user caaspadm -v5 2>&1|tee $SERVER.log After waiting a while (until cilium pod is started): /proc/sys/net/ipv4/conf/eth0/forwarding 1 /proc/sys/net/ipv4/conf/lo/forwarding 0 --> I can reproduce this! In "dmesg" I can see this: [ 3399.078642] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this. [ 3399.082625] Bridge firewalling registered [ 3401.275015] ip_tables: (C) 2000-2006 Netfilter Core Team [ 3401.294485] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 3457.871676] systemd-logind[1038]: Session 4 logged out. Waiting for processes to exit. [ 3457.872918] systemd-logind[1038]: Removed session 4. [ 3471.942169] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP) [ 3471.942482] IPVS: Connection hash table configured (size=4096, memory=64Kbytes) [ 3471.946626] IPVS: ipvs loaded. [ 3471.957053] IPVS: [rr] scheduler registered. [ 3471.971445] IPVS: [wrr] scheduler registered. [ 3471.975453] IPVS: [sh] scheduler registered. [ 3506.795122] systemd-udevd[3355]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. [ 3506.795417] systemd-udevd[3355]: Could not generate persistent MAC address for cilium_net: No such file or directory [ 3506.795802] systemd-udevd[3354]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. [ 3506.795877] systemd-udevd[3354]: Could not generate persistent MAC address for cilium_host: No such file or directory [ 3506.959424] systemd-udevd[3408]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. [ 3506.959477] systemd-udevd[3408]: Could not generate persistent MAC address for cilium_vxlan: No such file or directory [ 3507.160426] NET: Registered protocol family 38 [ 3507.644490] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 3509.733778] systemd-udevd[4335]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. [ 3509.733853] systemd-udevd[4335]: Could not generate persistent MAC address for cilium_health: No such file or directory [ 3509.734029] systemd-udevd[4334]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. 
[ 3509.734092] systemd-udevd[4334]: Could not generate persistent MAC address for cilium: No such file or directory [ 3510.068095] eth0: renamed from tmpf7777 [ 3510.124475] eth0: renamed from tmpeda4d [ 3510.166106] eth0: renamed from tmpa871d [ 3510.234471] eth0: renamed from tmpb7d55 [ 3510.345385] eth0: renamed from tmp64312 [ 3510.395190] eth0: renamed from tmp08aba [ 3510.526060] eth0: renamed from tmp2a58d [ 3510.770431] eth0: renamed from tmp98ddd [ 3510.798858] eth0: renamed from tmpdb501 [ 3510.862749] eth0: renamed from tmpb2d90 [ 3510.931801] lxc306256f2dd85: Caught tx_queue_len zero misconfig [ 3510.932639] lxc0881c895a035: Caught tx_queue_len zero misconfig [ 3510.942956] cilium_health: Caught tx_queue_len zero misconfig [ 3511.001898] lxcb73fb2c63181: Caught tx_queue_len zero misconfig [ 3512.559132] lxcf6ec013fd3a7: Caught tx_queue_len zero misconfig [ 3512.673046] lxcf68a5fd8bb9c: Caught tx_queue_len zero misconfig [ 3512.765450] lxc715ae4d9099a: Caught tx_queue_len zero misconfig [ 3513.019281] lxcc06f8ca4995d: Caught tx_queue_len zero misconfig [ 3513.318023] lxc8afc51443d8f: Caught tx_queue_len zero misconfig [ 3513.373336] lxc13b8308cd8b8: Caught tx_queue_len zero misconfig [ 3513.639782] lxccce1476273fa: Caught tx_queue_len zero misconfig [ 3516.457765] audit: type=1305 audit(1591093421.441:1070): audit_pid=0 old=918 auid=4294967295 ses=4294967295 res=1 [ 3516.503810] audit: type=1305 audit(1591093421.489:1071): audit_enabled=1 old=1 auid=4294967295 ses=4294967295 res=1 [ 3531.766783] systemd-journald[3599]: Received SIGTERM from PID 1 (systemd). [ 3532.168956] systemd-udevd: 50 output lines suppressed due to ratelimiting [ 3534.288116] Netfilter messages via NETLINK v0.30. [ 3534.303469] ctnetlink v0.93: registering with nfnetlink. --> probably cilium does "something" with the interfaces causing the setting for "forwarding" to change? It seems that a "reboot" or a "systemctl restart network" BEFORE doing the skuba bootstrap "prevents this problem from happening"! Could it be that systemd or wicked or something in the network stack "still remembers that forwarding was 0 at the point in time the network was started" and that the cilium process of generating the network environment for the overlay network somehow causes this "old in memory setting" to come back? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1171770 http://bugzilla.opensuse.org/show_bug.cgi?id=1171770#c36 Richard Brown <rbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED --- Comment #36 from Richard Brown <rbrown@suse.com> --- Resolved in openSUSE -- You are receiving this mail because: You are on the CC list for the bug.