[Bug 507551] New: dead builder recovery
http://bugzilla.novell.com/show_bug.cgi?id=507551

           Summary: dead builder recovery
    Classification: openSUSE
           Product: openSUSE.org
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: BuildService
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: michael_e_brown@dell.com
         QAContact: adrian@novell.com
          Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.10) Gecko/2009042708 Fedora/3.0.10-1.fc10 Firefox/3.0.10

If a machine running a builder dies while the builder is running a job, the job
is 'stuck' and cannot be cleared using the web client. I've had to go and
delete the job and builder files from /srv/obs/workers/ and /srv/obs/jobs/
manually to clear the 'building' state.

Builders should send keepalive packets during a build, and frontend should
detect when a builder drops off and re-assign its job to somebody who is still
alive.

I have not had a chance to test to see if graceful worker shutdown (eg.
rcobsworker stop) does the same thing, but I suspect it may.

Reproducible: Always

Steps to Reproduce:
1. Bring up new worker
2. wait until worker starts a job
3. unplug machine

Actual Results:
frontend thinks worker is still working and that package gets stuck

Expected Results:
frontend detects that worker died and reassigns job.

I would like to create a pool of "temporary" worker machines, eg. developer
machines that may otherwise be idle, but because jobs get stuck if a developer
reboots their box, this makes this mode of operation unreliable.

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
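For illustration only, the keepalive-and-reassign behaviour requested above could look roughly like the Python sketch below. It is not how the Build Service is implemented; the Dispatcher class, its method names, and the 30-second timeout are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical frontend state: who builds what, and when each worker last reported in."""
    timeout: float                                  # seconds of silence before a worker counts as dead
    queue: deque = field(default_factory=deque)     # jobs waiting for a worker
    building: dict = field(default_factory=dict)    # worker id -> job it is building
    last_seen: dict = field(default_factory=dict)   # worker id -> time of its last keepalive

    def assign(self, worker, now):
        """Hand the next queued job to an idle worker; return the job or None."""
        if not self.queue or worker in self.building:
            return None
        job = self.queue.popleft()
        self.building[worker] = job
        self.last_seen[worker] = now
        return job

    def keepalive(self, worker, now):
        """Record a keepalive from a worker that is currently building."""
        if worker in self.building:
            self.last_seen[worker] = now

    def finish(self, worker):
        """Forget a worker's job (build done or worker dropped); return the job."""
        self.last_seen.pop(worker, None)
        return self.building.pop(worker, None)

    def reap(self, now):
        """Put the jobs of workers that stayed silent longer than `timeout` back on the queue."""
        dead = [w for w, seen in self.last_seen.items() if now - seen > self.timeout]
        requeued = []
        for worker in dead:
            job = self.finish(worker)
            self.queue.appendleft(job)   # retry before jobs that were never started
            requeued.append(job)
        return requeued


if __name__ == "__main__":
    d = Dispatcher(timeout=30.0)
    d.queue.extend(["pkg-a", "pkg-b"])
    d.assign("w1", now=0.0)
    d.assign("w2", now=0.0)
    d.keepalive("w1", now=25.0)        # w1 is still alive; w2 was unplugged
    print(d.reap(now=40.0))            # jobs taken back from silent workers
    print(d.assign("w3", now=41.0))    # another worker picks the job up
```

In this sketch a job is requeued at the front of the queue, so a package interrupted by a rebooted developer machine is retried before packages that were never started. That ordering is a design choice of the example, not something stated in the report.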
http://bugzilla.novell.com/show_bug.cgi?id=507551

Andreas Jaeger <aj@novell.com> changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.provo.novell.com |adrian@novell.com
http://bugzilla.novell.com/show_bug.cgi?id=507551

User choeger@open-xchange.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=507551#c1

Carsten Hoeger <choeger@open-xchange.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |choeger@open-xchange.com

--- Comment #1 from Carsten Hoeger <choeger@open-xchange.com> 2009-07-16 02:20:52 MDT ---
*** Bug 490604 has been marked as a duplicate of this bug. ***
http://bugzilla.novell.com/show_bug.cgi?id=490604
https://bugzilla.novell.com/show_bug.cgi?id=507551

https://bugzilla.novell.com/show_bug.cgi?id=507551#c2

Adrian Schröter <adrian@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #2 from Adrian Schröter <adrian@suse.com> 2012-03-29 12:16:38 UTC ---
The bs_warden process now monitors the workers and restarts builds.
participants (1)
-
bugzilla_noreply@novell.com