http://bugzilla.suse.com/show_bug.cgi?id=1010441

            Bug ID: 1010441
           Summary: Kubernetes - kubelet service down on minion nodes
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 42.1
          Hardware: x86-64
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: KDE Applications
          Assignee: opensuse-kde-bugs@opensuse.org
          Reporter: rgherlea@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Periodically, the kubelet services fail at random on the minion nodes and have
to be restarted manually.

I am running Locust for load testing, and over the weekend 2 out of 3 minions
stopped responding, with the kubelet services down:

host-44-11-1-23:~ # systemctl status kube-proxy.service
kube-proxy.service - Kubernetes Kube-Proxy Server
   Loaded: loaded (/usr/lib/systemd/system/kube-proxy.service; enabled)
   Active: inactive (dead) since Fri 2016-11-11 23:57:17 UTC; 2 days ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 1065 ExecStart=/usr/bin/kube-proxy $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL
           $KUBE_MASTER $KUBE_PROXY_ARGS (code=killed, signal=PIPE)
 Main PID: 1065 (code=killed, signal=PIPE)

kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE   IP              NODE
locust-1865418122-0ns56   0/1       Pending   0          2d    <none>
locust-1865418122-0nvq9   1/1       Running   0          2d    172.20.67.74    razvan-kube-minion0.openstack.local
locust-1865418122-11r7c   0/1       Pending   0          2d    <none>
locust-1865418122-19cnn   1/1       Running   0          2d    172.20.67.102   razvan-kube-minion0.openstack.local
locust-1865418122-1kwpv   0/1       Pending   0          2d    <none>
locust-1865418122-1pbon   0/1       Pending   0          2d    <none>
locust-1865418122-1zosp   0/1       Pending   0          2d    <none>
locust-1865418122-27oa7   1/1       Running   0          2d    172.20.67.7     razvan-kube-minion0.openstack.local
locust-1865418122-2aee7   1/1       Running   0          2d    172.20.67.24    razvan-kube-minion0.openstack.local
locust-1865418122-2ct4b   1/1       Running   0          2d    172.20.67.106   razvan-kube-minion0.openstack.local
locust-1865418122-2nl93   1/1       Running   0          2d    172.20.67.65    razvan-kube-minion0.openstack.local
locust-1865418122-2xxbb   1/1       Running   0          2d    172.20.67.72    razvan-kube-minion0.openstack.local
locust-1865418122-32uvh   1/1       Running   0          2d    172.20.67.20    razvan-kube-minion0.openstack.local
locust-1865418122-3gv2w   1/1       Running   0          2d    172.20.67.56    razvan-kube-minion0.openstack.local
locust-1865418122-3vz39   0/1       Pending   0          2d    <none>
locust-1865418122-41016   0/1       Pending   0          2d    <none>
locust-1865418122-49wkx   1/1       Running   0          2d    172.20.67.100   razvan-kube-minion0.openstack.local
locust-1865418122-4evoh   1/1       Running   0          2d    172.20.67.46    razvan-kube-minion0.openstack.local
locust-1865418122-4iu9k   1/1       Running   0          2d    172.20.67.68    razvan-kube-minion0.openstack.local
locust-1865418122-4lqnt   0/1       Pending   0          2d    <none>
locust-1865418122-4sitz   0/1       Pending   0          2d    <none>
locust-1865418122-4vbw4   0/1       Pending   0          2d    <none>
locust-1865418122-53oas   0/1       Pending   0          2d    <none>
locust-1865418122-547uw   1/1       Running   0          2d    172.20.67.99    razvan-kube-minion0.openstack.local

The same situation happens at the kube-master level; the services are down:

host-44-11-1-22:~ # systemctl status kube-apiserver.service
kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled)
   Active: inactive (dead) since Tue 2016-11-15 14:54:11 UTC; 18h ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 1691 ExecStart=/usr/bin/kube-apiserver $KUBE_LOGTOSTDERR
           $KUBE_LOG_LEVEL $KUBE_ETCD_SERVERS $KUBE_API_ADDRESS $KUBE_API_PORT
           $KUBELET_PORT $KUBE_ALLOW_PRIV $KUBE_SERVICE_ADDRESSES
           $KUBE_ADMISSION_CONTROL $KUBE_API_ARGS (code=killed, signal=PIPE)
 Main PID: 1691 (code=killed, signal=PIPE)

host-44-11-1-22:~ # systemctl status kube-scheduler.service
kube-scheduler.service - Kubernetes Scheduler Plugin
   Loaded: loaded (/usr/lib/systemd/system/kube-scheduler.service; enabled)
   Active: inactive (dead) since Tue 2016-11-15 14:54:11 UTC; 18h ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 651 ExecStart=/usr/bin/kube-scheduler $KUBE_LOGTOSTDERR
           $KUBE_LOG_LEVEL $KUBE_MASTER
           $KUBE_SCHEDULER_ARGS (code=killed, signal=PIPE)
 Main PID: 651 (code=killed, signal=PIPE)

Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.824651 651 priorities.go:39] Combined re...1-23
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.824667 651 priorities.go:39] Combined re...1-25
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.824677 651 priorities.go:39] Combined re...1-25
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.832254 651 priorities.go:39] Combined re...1-23
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.832288 651 priorities.go:39] Combined re...1-23
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.832302 651 priorities.go:39] Combined re...1-25
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.832311 651 priorities.go:39] Combined re...1-25
Nov 14 12:27:24 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:24.976279 651 event.go:216] Event(api.ObjectRef...
Nov 14 12:27:25 host-44-11-1-22 kube-scheduler[651]: I1114 12:27:25.103421 651 event.go:216] Event(api.ObjectRef...
Nov 15 14:47:25 host-44-11-1-22 kube-scheduler[651]: W1115 14:47:25.332630 651 reflector.go:334] k8s.io/kube...072)
Hint: Some lines were ellipsized, use -l to show in full.
host-44-11-1-22:~ #

It looks like when the Kubernetes services on the master node go down, the
minions are affected as well. I had to start the services on both the master
and the minions to get the cluster up and running again. If the kube-master
stays up, manually starting the service at the minion level is enough and
everything works as expected.

I opened an issue on GitLab for this:
https://gitlab.suse.de/docker/k8s-terraform/issues/14

The supportconfig files from the kube-admin and the minion nodes are attached
to the bug report.

Please let me know if you need more information from my side.

Thanks,
Razvan

--
You are receiving this mail because:
You are on the CC list for the bug.
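As a possible stop-gap until the source of the SIGPIPE kills is found, the
affected units could be told to restart themselves on failure via a systemd
drop-in. This is only a sketch from my side, not something I have verified on
the affected hosts, and the same drop-in would be needed for kube-proxy,
kube-apiserver and kube-scheduler as well:

host-44-11-1-23:~ # mkdir -p /etc/systemd/system/kubelet.service.d
host-44-11-1-23:~ # cat > /etc/systemd/system/kubelet.service.d/restart.conf <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
host-44-11-1-23:~ # systemctl daemon-reload

This would only mask the failure, of course, but it should keep the cluster
serving while the underlying issue is debugged.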