[opensuse-buildservice] random scheduler crashes on OBS 2.1.6

Hi all,

I am about to debug a random scheduler crash with OBS 2.1.6 and wanted to present my findings so far to this audience, in case someone has a good idea what might cause it and whether a fix might already be available.

What I see at random points in time is something like this:

Use of uninitialized value $pkgid in concatenation (.) or string at ./bs_sched line 4664.
Use of uninitialized value $pkgid in concatenation (.) or string at ./bs_sched line 4664.
gen_meta: bad line gcc46-c++

This happens shortly after gcc46-c++ has been rebuilt. The only workaround I have found so far is to remove the offending binary. gcc46-c++ is just an example; it can happen with arbitrary packages.

What I have found so far is that this happens because solvable_lookup_checksum in perl-BSSolv-0.16.4 seems to return an empty string for the packages in question.

Any idea what might be causing that, or at least a pointer to what I should check? Otherwise I will just dig deeper into the case, but a pointer in which direction to dig would be useful.

Greetings,
Robert

--
To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-buildservice+owner@opensuse.org
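The failure mode described above, where a package without a usable pkgid ends up as a malformed meta line, can be illustrated with a small sketch. This is not the bs_sched or gen_meta code: the meta-line format (32 hex digits, two spaces, package name) and the function names build_meta_lines and parse_meta_line are hypothetical, chosen only to show how the offending binaries could be listed up front instead of tripping the parser later.

```python
import re
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL meta-line format for illustration only: an MD5 pkgid
# (32 lowercase hex digits), two spaces, then the package name.
META_LINE = re.compile(r"^([0-9a-f]{32})  (\S+)$")


def build_meta_lines(pkgids: Dict[str, Optional[str]]) -> Tuple[List[str], List[str]]:
    """Return (well-formed meta lines, names of packages lacking a pkgid)."""
    lines: List[str] = []
    offenders: List[str] = []
    for name, pkgid in sorted(pkgids.items()):
        if not pkgid:
            # An empty or missing checksum would otherwise yield a line such as
            # "  gcc46-c++", which a strict parser rejects as a bad line.
            offenders.append(name)
            continue
        lines.append(f"{pkgid}  {name}")
    return lines, offenders


def parse_meta_line(line: str) -> Tuple[str, str]:
    """Split a meta line into (pkgid, name); raise ValueError on a malformed line."""
    match = META_LINE.match(line)
    if match is None:
        raise ValueError(f"bad line {line.strip()}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    lines, offenders = build_meta_lines(
        {"gcc46": "0123456789abcdef0123456789abcdef", "gcc46-c++": ""}
    )
    print(lines)      # ['0123456789abcdef0123456789abcdef  gcc46']
    print(offenders)  # ['gcc46-c++']
```

Reporting the offenders by name gives the same information as the manual workaround (finding the binary to remove) without waiting for the scheduler to hit the bad line.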

On Tue, Nov 08, 2011 at 02:32:41PM +0100, Robert Schiele wrote:
I am about to debug a random scheduler crash with OBS 2.1.6 and wanted to present my findings so far to this audience, in case someone has a good idea what might cause it and whether a fix might already be available.
What I see at random points in time is something like this:
Use of uninitialized value $pkgid in concatenation (.) or string at ./bs_sched line 4664.
Use of uninitialized value $pkgid in concatenation (.) or string at ./bs_sched line 4664.
gen_meta: bad line gcc46-c++
This happens shortly after gcc46-c++ has been rebuilt. The only workaround I have found so far is to remove the offending binary. gcc46-c++ is just an example; it can happen with arbitrary packages.
What I have found so far is that this happens because solvable_lookup_checksum in perl-BSSolv-0.16.4 seems to return an empty string for the packages in question.
Any idea what might be causing that, or at least a pointer to what I should check? Otherwise I will just dig deeper into the case, but a pointer in which direction to dig would be useful.
Is that on ARM with qemu? mmap() doesn't seem to work there, which leads to rpm not creating any signatures. Newer build service versions will not even accept those rpms, so you don't get these errors (but the build will fail). Adrian knows about a workaround/fix for the qemu issue.
Cheers, Michael.
--
Michael Schroeder mls@suse.de
SUSE LINUX Products GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
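If this diagnosis applies, an affected package should lack the MD5 digest that rpm normally stores in its signature header. Below is a minimal sketch for checking a binary, assuming the standard RPM file layout (a 96-byte lead followed by the signature header, whose index entries are four big-endian 32-bit integers) and assuming that the pkgid BSSolv looks up is derived from that MD5 entry (RPMSIGTAG_MD5, tag 1004). The function names parse_signature_tags and rpm_has_md5_sigtag are hypothetical.

```python
import struct
from typing import Set

RPM_LEAD_SIZE = 96                    # fixed-size legacy lead at the start of every RPM
LEAD_MAGIC = b"\xed\xab\xee\xdb"
HEADER_MAGIC = b"\x8e\xad\xe8\x01"    # header structure magic plus version byte 1
INTRO_SIZE = 16                       # magic(4) + reserved(4) + index count(4) + data size(4)
INDEX_ENTRY_SIZE = 16                 # tag, type, offset, count: four big-endian int32
RPMSIGTAG_MD5 = 1004                  # assumed source of the pkgid (see lead-in)


def parse_signature_tags(buf: bytes) -> Set[int]:
    """Return the tag numbers in the signature header at the start of buf."""
    if buf[:4] != LEAD_MAGIC:
        raise ValueError("not an RPM package (bad lead magic)")
    intro = buf[RPM_LEAD_SIZE:RPM_LEAD_SIZE + INTRO_SIZE]
    if len(intro) != INTRO_SIZE or intro[:4] != HEADER_MAGIC:
        raise ValueError("missing or damaged signature header")
    nindex, _data_size = struct.unpack(">II", intro[8:16])
    start = RPM_LEAD_SIZE + INTRO_SIZE
    index = buf[start:start + nindex * INDEX_ENTRY_SIZE]
    if len(index) != nindex * INDEX_ENTRY_SIZE:
        raise ValueError("truncated signature header index")
    # Only the tag number (first int32 of each index entry) matters here.
    return {struct.unpack_from(">I", index, i * INDEX_ENTRY_SIZE)[0] for i in range(nindex)}


def rpm_has_md5_sigtag(path: str) -> bool:
    """True if the package's signature header carries an MD5 entry (tag 1004)."""
    with open(path, "rb") as f:
        head = f.read(RPM_LEAD_SIZE + INTRO_SIZE)
        if len(head) < RPM_LEAD_SIZE + INTRO_SIZE:
            raise ValueError("file too short to be an RPM package")
        # The index count sits at bytes 8..11 of the signature header intro.
        nindex = struct.unpack(">I", head[-8:-4])[0]
        head += f.read(nindex * INDEX_ENTRY_SIZE)
    return RPMSIGTAG_MD5 in parse_signature_tags(head)
```

Running rpm_has_md5_sigtag on the offending gcc46-c++ binary and on a known-good package would show whether the missing digest is the difference between them.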

On Tuesday, 8 November 2011, 15:46:30, Michael Schroeder wrote:
Is that on ARM with qemu? mmap() doesn't seem to work there, which leads to rpm not creating any signatures. Newer build service versions will not even accept those rpms, so you don't get these errors (but the build will fail).
Adrian knows about a workaround/fix for the qemu issue.
It is the mmap fix in the qemu package, which you can find in openSUSE:12.1.

In general, you may want to run the same setup as we do in openSUSE:Factory:ARM. We no longer use whatever qemu happens to be installed on the workers; instead we aggregate it from a defined project (openSUSE:Tools in our case). This way we can host the qemu package as part of our project, and we can keep it even after the project has been released. You could even aggregate from our instance if you like, and we could all work on the same qemu package to get it fixed :)

We will also run a parallel build with qemu 1.x later on to test it.

bye
adrian
Cheers, Michael.
--
Adrian Schroeter
SUSE Linux Products GmbH
email: adrian@suse.de
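The setup described above pulls qemu into the ARM project from a fixed project instead of relying on whatever the workers have installed. This sketch assumes the aggregation is done with an OBS `_aggregate` file built from `aggregatelist`, `aggregate` and `package` elements; the helper name make_aggregate and the optional repository mapping (`repository` elements with `target`/`source` attributes) are illustrative assumptions, not details given in the thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def make_aggregate(project: str, packages: List[str],
                   repositories: Optional[Dict[str, str]] = None) -> str:
    """Return _aggregate XML that pulls the binaries of `packages` from `project`.

    `repositories` optionally maps a local target repository name to a source
    repository name in `project` (assumed element and attribute names).
    """
    root = ET.Element("aggregatelist")
    aggregate = ET.SubElement(root, "aggregate", project=project)
    for name in packages:
        ET.SubElement(aggregate, "package").text = name
    for target, source in (repositories or {}).items():
        ET.SubElement(aggregate, "repository", target=target, source=source)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Pull qemu from openSUSE:Tools, as in the openSUSE:Factory:ARM setup.
    print(make_aggregate("openSUSE:Tools", ["qemu"]))
    # <aggregatelist><aggregate project="openSUSE:Tools"><package>qemu</package></aggregate></aggregatelist>
```

The resulting XML would be stored as the `_aggregate` file of a package in the ARM project; whether a project on another OBS instance can be referenced this way depends on how the instances are connected.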
participants (3)
- Adrian Schröter
- Michael Schroeder
- Robert Schiele