http://bugzilla.novell.com/show_bug.cgi?id=507551

           Summary: dead builder recovery
    Classification: openSUSE
           Product: openSUSE.org
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: BuildService
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: michael_e_brown@dell.com
         QAContact: adrian@novell.com
          Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.10) Gecko/2009042708 Fedora/3.0.10-1.fc10 Firefox/3.0.10

If a machine running a builder dies while the builder is running a job, the
job is 'stuck' and cannot be cleared using the web client. I've had to go and
delete the job and builder files from /srv/obs/workers/ and /srv/obs/jobs/
manually to clear the 'building' state.

Builders should send keepalive packets during a build, and the frontend should
detect when a builder drops off and re-assign its job to a builder that is
still alive.

I have not had a chance to test whether a graceful worker shutdown (e.g.
rcobsworker stop) does the same thing, but I suspect it may.

Reproducible: Always

Steps to Reproduce:
1. Bring up a new worker
2. Wait until the worker starts a job
3. Unplug the machine

Actual Results:
The frontend thinks the worker is still working, and the package gets stuck.

Expected Results:
The frontend detects that the worker died and reassigns the job.

I would like to create a pool of "temporary" worker machines, e.g. developer
machines that may otherwise be idle, but because jobs get stuck if a developer
reboots their box, this mode of operation is unreliable.
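
To make the requested behaviour concrete, here is a rough Python sketch of keepalive-based dead-worker detection on the frontend side. It is only an illustration of the idea in this report, not how the Build Service is actually implemented: the `Dispatcher` class, its method names, and the 30-second timeout are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical frontend-side tracker (not the real OBS code)."""
    timeout: float = 30.0  # assumed: seconds of silence before a worker counts as dead
    last_seen: dict = field(default_factory=dict)  # worker id -> time of last keepalive
    running: dict = field(default_factory=dict)    # worker id -> job it is building
    pending: list = field(default_factory=list)    # jobs waiting to be (re)assigned

    def assign(self, worker, job, now=None):
        """Record that a worker has started a job."""
        now = time.monotonic() if now is None else now
        self.running[worker] = job
        self.last_seen[worker] = now

    def keepalive(self, worker, now=None):
        """Called whenever a keepalive arrives from a worker."""
        now = time.monotonic() if now is None else now
        self.last_seen[worker] = now

    def reap(self, now=None):
        """Drop workers that have been silent too long and requeue their jobs."""
        now = time.monotonic() if now is None else now
        dead = [w for w, seen in self.last_seen.items()
                if now - seen > self.timeout]
        for worker in dead:
            del self.last_seen[worker]
            job = self.running.pop(worker, None)
            if job is not None:
                self.pending.append(job)  # job leaves the 'building' state
        return dead
```

In this sketch the frontend would call `reap()` periodically; any job that comes back in `pending` can be handed to a live worker. A graceful shutdown could be handled the same way, or by having the worker report that it is going away before it stops.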