On Tue, 24 Jun 2014 09:51:41 +0200 Lukas Ocilka <lukas.ocilka@suse.com> wrote:
Hi,
Today we have an outage of the Yast Jenkins node because of "No space left on device". This is a virtual machine maintained by a team member (including scripts around). It's not the first time this node has died for some reason.
https://ci.opensuse.org/computer/yast-jenkins/executors/0/causeOfDeath
Thread has died
java.io.IOException: No space left on device
As this is a core infrastructure of the Yast team and everyone depends on it (as we got used to it and OBS/IBS also expects SRs to be done this way), I would like to change the current rules this way:
1. This is a core infrastructure --> it should run on a Yast-team-owned server (we have a HW ready in server room)
Nice idea, but I don't think it will work: the external Jenkins node needs to be in a separate location so that it can be reached from the external network. We can discuss it with Daniel, though.
2. Any (core) infrastructure, including scripts, should be owned by Yast team
I agree.
3. Everything needs to be at GitHub following the same (but stricter) rules for merging changes (continuous deployment from GitHub?) For instance, I'd recommend to require two different LGTMs instead of one as it is now
Well, right now it is on susestudio, including the scripts but excluding the credentials, as I don't want to make those public.
4. Everything needs to be well documented in a way that "everyone" in the team should be able to recover from such error or even start a new node if something goes really bad
My plan for the internal node is to run two nodes, so that it is not a problem if one of them goes down. For the external node it is a bit trickier. Currently access is granted to me, mvidner, Daniel and Bernhard.
5. Everyone in the team needs to have a root (e.g., via ssh keys) access but not to change anything they do not understand (documentation)
Not a problem, as long as I have everyone's public SSH keys.
6. We need monitoring, maybe internal IT guys can do that for us and they might have access to that system for urgent cases as well
Jenkins itself does monitoring; maybe only some reporting is missing?
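If reporting is the missing piece, a small periodic check could cover the "No space left on device" case. Below is a minimal sketch in Python, not what currently runs on the node: the function names, the checked path and the 10% threshold are all assumptions for illustration.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size; treat them as full.
        return 0.0
    return usage.free / usage.total


def disk_report(path="/", min_free=0.10):
    """Build a one-line status message; `min_free` is a hypothetical threshold."""
    frac = free_fraction(path)
    status = "LOW" if frac < min_free else "OK"
    return f"{status}: {path} has {frac:.1%} free (threshold {min_free:.0%})"


if __name__ == "__main__":
    # Print the report; a scheduler could mail this output to the team.
    print(disk_report("/"))
```

Something like this could be run periodically on the node, with the output sent to the list whenever the line starts with "LOW".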
All this doesn't need to be changed "right now". One little step at a time. Then another.
Agreed.
Thanks in advance
Lukas
Josef