On Tue, Mar 26, Robert Munteanu wrote:
I have one recurring problem: the master becomes unresponsive after some time. Looking at the Grafana charts, I can see steadily increasing disk I/O on the master node. About 30 minutes after launching all 4 nodes, the load average on the master node is 6, with disk read I/O still increasing steadily. The 5-minute average as collected by Prometheus is about 350 MB/s.
This sounds very much like etcd, which continuously writes to disk for master election. The usual advice is to run etcd only on an SSD, so it could be that disk I/O is the problem in our case. In my opinion, the way etcd implements this algorithm is a misdesign; HA can do the same without it ... But I don't know enough about etcd to say whether there is anything you could do.

  Thorsten

-- 
Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & MicroOS
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
GF: Felix Imendoerffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nuernberg)