[yast-devel] RFC: Yast Team Infrastructure: Yast Jenkins Node

24 Jun 2014

Hi,

Today we had an outage of the Yast Jenkins node because of "No space
left on device". The node is a virtual machine maintained by a team
member (including the scripts around it). It's not the first time this
node has died for some reason.

https://ci.opensuse.org/computer/yast-jenkins/executors/0/causeOfDeath
Thread has died
java.io.IOException: No space left on device

As this is core infrastructure of the Yast team and everyone depends
on it (we got used to it, and OBS/IBS also expects SRs to be done
this way), I would like to change the current rules as follows:

1. This is core infrastructure --> it should run on a Yast-team-owned
    server (we have hardware ready in the server room)
2. Any (core) infrastructure, including scripts, should be owned by
    the Yast team
3. Everything needs to be on GitHub, following the same (but stricter)
    rules for merging changes (continuous deployment from GitHub?).
    For instance, I'd recommend requiring two different LGTMs instead
    of one as it is now
4. Everything needs to be documented well enough that "everyone" in
    the team is able to recover from such an error or even start a
    new node if something goes really bad
5. Everyone in the team needs to have root access (e.g., via ssh keys),
    but nobody should change anything they do not understand
    (documentation)
6. We need monitoring; maybe the internal IT guys can do that for us,
    and they might also have access to that system for urgent cases
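
To make point 6 a bit more concrete, here is a minimal sketch of the
kind of check such monitoring could run on the node, given that today's
failure was a full disk. The function names, the 10% threshold and the
checked path are only illustrative assumptions, not an existing script:

```python
import shutil


def is_low_on_space(total, free, min_free_fraction=0.10):
    """Return True if the free share of the disk is below the threshold."""
    if total <= 0:
        # Nothing sensible to report for an empty or pseudo filesystem.
        return False
    return (free / total) < min_free_fraction


def disk_space_warning(path="/", min_free_fraction=0.10):
    """Return a warning string for `path`, or None if enough space is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if is_low_on_space(usage.total, usage.free, min_free_fraction):
        return "WARNING: only {:.1%} free on {}".format(
            usage.free / usage.total, path
        )
    return None


if __name__ == "__main__":
    # Hypothetical usage: run periodically (e.g. from cron) and mail the output.
    message = disk_space_warning("/")
    if message:
        print(message)
```

Whatever tool we end up with, the point is the same: we should get a
warning before the disk is full, not after Jenkins has already died.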

All this doesn't need to be changed "right now". One little step at a
time. Then another.

Thanks in advance
Lukas

-- 

Lukas Ocilka, Systems Management (Yast) Team Leader
Cloud & Systems Management Department, SUSE Linux