[opensuse-arm] openSUSE:Factory:ARM/armv6 outage, recovering
Hi, it seems recently qemu-linux-user 2.3.0 has been accepted to factory, which however immediately crashes on startup, so all the builds in the last two days (if I see correctly) have been hanging in an endless crash/rebuild loop. I've manually injected a qemu 2.2.0 build again and locked 2.3.0 out to get things moving. Greetings, Dirk -- To unsubscribe, e-mail: opensuse-arm+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-arm+owner@opensuse.org
Dirk Müller <dirk@dmllr.de> writes:
it seems recently qemu-linux-user 2.3.0 has been accepted to factory, which however immediately crashes on startup,
Why wasn't it tested? Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Am 05.05.2015 um 10:59 schrieb Andreas Schwab:
Dirk Müller <dirk@dmllr.de> writes:
it seems recently qemu-linux-user 2.3.0 has been accepted to factory, which however immediately crashes on startup,
Why wasn't it tested?
Good question. Why does this suddenly depend on Factory? I intentionally canceled two auto-submissions so that qemu would get ~4 weeks of testing in Virtualization. According to my information, openSUSE:Factory:ARM uses qemu-linux-user from Virtualization, which means that if there had been a breakage, we should have heard about it much earlier. If that was silently changed behind my back, blame yourselves. Andreas -- SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton; HRB 21284 (AG Nürnberg)
Hi Andreas,
Why wasn't it tested?
Good question. Why does this suddenly depend on Factory?
IIRC this was changed March 23rd based on your announcement that there are issues with 2.3.0-rc0.
openSUSE:Factory:ARM uses qemu-linux-user from Virtualization, which means that if there were a breakage we should've heard that much earlier. If that is silently changed behind my back, blame yourselves.
Unfortunately, it is not immediately noticeable that qemu-linux-user is broken: packages never end up in the "failed" state (this was a thinko in our discussion yesterday), because the build service assumes a bad worker and restarts the job on a different host, only for it to fail there the same way. It is only noticed when an admin sees that the cluster is flooded with jobs that never succeed while other things fall behind. So I recommend doing automated testing outside the build service instead, which gives you notifications and enough time to fix things, so that what enters Factory actually works. Greetings, Dirk
Am 06.05.2015 um 12:22 schrieb Dirk Müller:
Hi Andreas,
Why wasn't it tested?
Good question. Why does this suddenly depend on Factory?
IIRC this was changed March 23rd based on your announcement that there are issues with 2.3.0-rc0.
Then that was a clear miscommunication, as I committed the fix to Virtualization immediately once I got the patch. I only asked you to check whether retriggering any failed builds would fix them. Regards, Andreas
Dirk Müller wrote:
Why wasn't it tested?
Good question. Why does this suddenly depend on Factory?
IIRC this was changed March 23rd based on your announcement that there are issues with 2.3.0-rc0.
openSUSE:Factory:ARM uses qemu-linux-user from Virtualization, which means that if there were a breakage we should've heard that much earlier. If that is silently changed behind my back, blame yourselves.
Its unfortunately not immediately noticeable that qemu-linux-user is broken, since packages never end up in the "failed" state (this was a thinko from our discussion yesterday), since the build service thinks its a bad worker and restarts the job on a different host, only to fail there the same way. It is only noticed then by an admin seeing that the cluster is flooded with jobs that never succeed and other things get behind.
so I do recommend to do an automated testing outside the buildservice instead, which gives you notifications and enough time to fix so that the stuff that enters Factory is working.
Would it be possible to cross-compile a small ARM binary on x86 and run it through qemu-linux-user? Something like

$ zypper in gcc-whatever qemu-something
$ gcc somecode.c
$ qemu-something a.out; echo $?

That should be easy to run in openQA... cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.de/ SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg) Maxfeldstraße 5; 90409 Nürnberg; Germany
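Ludwig's "echo $?" only shows an exit status, and a POSIX shell reports a process killed by a signal as a value above 128, which is easy to misread. Below is a minimal C sketch (POSIX fork/execvp/waitpid) of a host-side runner that separates "the guest program exited non-zero" from "the emulator itself was killed by a signal", the crash case Dirk describes the build service retrying instead of reporting. run_smoke_test and enum smoke_result are hypothetical names, not part of any existing test framework.

```c
#define _POSIX_C_SOURCE 200809L /* fork, execvp, waitpid */

#include <assert.h>
#include <errno.h>
#include <signal.h> /* SIGSEGV etc., for interpreting *detail */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome of one smoke-test run (hypothetical classification). */
enum smoke_result {
    SMOKE_PASS,      /* exited with status 0 */
    SMOKE_FAIL_EXIT, /* exited with a non-zero status (127: exec failed) */
    SMOKE_CRASH,     /* terminated by a signal, e.g. SIGSEGV */
    SMOKE_ERROR      /* fork() or waitpid() itself failed */
};

/* Run argv (argv[0] is looked up in PATH) and classify how it ended.
 * On return, *detail holds the exit status or the terminating signal. */
enum smoke_result run_smoke_test(char *const argv[], int *detail)
{
    assert(argv != NULL && argv[0] != NULL && detail != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return SMOKE_ERROR;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed, like a shell's "not found" */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return SMOKE_ERROR;
    }

    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return SMOKE_CRASH;
    }
    *detail = WEXITSTATUS(status);
    return *detail == 0 ? SMOKE_PASS : SMOKE_FAIL_EXIT;
}
```

For example, a CI job could pass {"qemu-arm", "./a.out", NULL} (both names are assumptions) and fail on anything other than SMOKE_PASS; an emulator that segfaults on startup would then show up as SMOKE_CRASH with *detail equal to SIGSEGV rather than as an ambiguous number.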
On 06.05.15 14:30, Ludwig Nussel wrote:
Dirk Müller wrote:
Why wasn't it tested?
Good question. Why does this suddenly depend on Factory?
IIRC this was changed March 23rd based on your announcement that there are issues with 2.3.0-rc0.
openSUSE:Factory:ARM uses qemu-linux-user from Virtualization, which means that if there were a breakage we should've heard that much earlier. If that is silently changed behind my back, blame yourselves.
Its unfortunately not immediately noticeable that qemu-linux-user is broken, since packages never end up in the "failed" state (this was a thinko from our discussion yesterday), since the build service thinks its a bad worker and restarts the job on a different host, only to fail there the same way. It is only noticed then by an admin seeing that the cluster is flooded with jobs that never succeed and other things get behind.
so I do recommend to do an automated testing outside the buildservice instead, which gives you notifications and enough time to fix so that the stuff that enters Factory is working.
Would it be possible to cross compile a small arm binary on x86 and run it through qemu-linux-user? Something like
$ zypper in gcc-whatever qemu-something
$ gcc somecode.c
$ qemu-something a.out; echo $?
That should be easy to run in openQA...
Something that simple could even be part of %check in the package ;). Unfortunately cross compilers aren't really in openSUSE yet IIUC. Alex
Alexander Graf wrote:
so I do recommend to do an automated testing outside the buildservice instead, which gives you notifications and enough time to fix so that the stuff that enters Factory is working.
Would it be possible to cross compile a small arm binary on x86 and run it through qemu-linux-user? Something like
$ zypper in gcc-whatever qemu-something
$ gcc somecode.c
$ qemu-something a.out; echo $?
That should be easy to run in openQA...
Something that simple could even be part of %check in the package ;). Unfortunately cross compilers aren't really in openSUSE yet IIUC.
Looks like gcc5 brought them. cross-armv6hl-gcc5 at least sounds like it could do the job :-) cu Ludwig
Ludwig Nussel <ludwig.nussel@suse.de> writes:
Would it be possible to cross compile a small arm binary on x86 and run it through qemu-linux-user?
All that would be needed is to enable Virtualization for openSUSE:Factory:ARM/armv6l. Andreas.
Dirk Müller <dirk@dmllr.de> writes:
Its unfortunately not immediately noticeable that qemu-linux-user is broken, since packages never end up in the "failed" state (this was a thinko from our discussion yesterday), since the build service thinks its a bad worker and restarts the job on a different host, only to fail there the same way. It is only noticed then by an admin seeing
You don't need to be an admin to notice the problem. Andreas.
Am 06.05.2015 um 14:59 schrieb Andreas Schwab:
Dirk Müller <dirk@dmllr.de> writes:
Its unfortunately not immediately noticeable that qemu-linux-user is broken, since packages never end up in the "failed" state (this was a thinko from our discussion yesterday), since the build service thinks its a bad worker and restarts the job on a different host, only to fail there the same way. It is only noticed then by an admin seeing
You don't need to be admin to notice the problem.
I just verified: on my local system, using qemu-linux-user from Virtualization, I can run qemu-x86_64 /bin/ls just fine. So it does not seem to be a general qemu-linux-user problem. Andreas
Andreas Färber <afaerber@suse.de> writes:
Virtualization I am able to run qemu-x86_64 /bin/ls just fine. So it
This is about qemu-arm. Andreas.
Am 06.05.2015 um 15:05 schrieb Andreas Schwab:
Andreas Färber <afaerber@suse.de> writes:
Virtualization I am able to run qemu-x86_64 /bin/ls just fine.
This is about qemu-arm.
You claimed yesterday that it had not been tested at all; I am thus telling you that it does pass a smoke test on my system. Dirk did not bother to CC me originally, and none of you has so far pointed to any particular breakage. According to my OBS mails, there were two gnuradio failures over the weekend plus two pihwm failures. None of them shows qemu-arm misbehaving! One is about undefined references and the other about a libtool version mismatch. So I haven't the foggiest why you are so upset! The last time a qemu-linux-user breakage came up, I asked people to contribute tests to avoid things breaking silently in the future; no one bothered, so don't you dare cry about lack of testing now. I have tried adding a trivial test:

%check
%{qemu_arch}-linux-user/qemu-%{qemu_arch} %_bindir/ls > /dev/null

but this always fails for all architectures and repos in OBS (sometimes with segfaults, sometimes without), whereas it succeeds with exit code 0 on my local 13.2 x86_64 system. So I am clueless about what's going on here; apparently there is some difference between OBS and my system. https://build.opensuse.org/package/show/home:a_faerber:branches:Virtualizati... Andreas
Hi,
Dirk did not bother to CC me originally, and none of you have so far pointed to any particular breakage.
There is no "original" mail other than this thread, really, so you weren't left out. I didn't bother to debug it further since it was broken everywhere, on every occasion, including on /bin/false. Here's the backtrace:
Program received signal SIGSEGV, Segmentation fault.
thunk_convert (dst=dst@entry=0x7fffffffcbe0, src=0x4000a1f170,
type_ptr=0x6051b20c <ioctl_entries+268>, type_ptr@entry=0x6051b204 <ioctl_entries+260>,
to_host=to_host@entry=1) at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/thunk.c:273
273 (*se->convert[to_host])(dst, src);
(gdb) bt
#0 thunk_convert (dst=dst@entry=0x7fffffffcbe0, src=0x4000a1f170,
type_ptr=0x6051b20c <ioctl_entries+268>, type_ptr@entry=0x6051b204 <ioctl_entries+260>,
to_host=to_host@entry=1) at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/thunk.c:273
#1 0x0000000060038358 in do_ioctl (arg=274888520048, cmd=<optimized out>, fd=<optimized out>)
at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/syscall.c:3940
#2 do_syscall (cpu_env=cpu_env@entry=0x625a5bd0, num=16, arg1=<optimized out>,
arg2=<optimized out>, arg3=274888520048, arg4=<optimized out>, arg5=274901073728,
arg6=274888522607, arg7=0, arg8=0)
at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/syscall.c:6281
#3 0x00000000600298b6 in cpu_loop (env=env@entry=0x625a5bd0)
at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/main.c:305
#4 0x0000000060003676 in main (argc=<optimized out>, argv=<optimized out>,
envp=<optimized out>) at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/main.c:4419
(gdb) p se->convert[to_host]
$1 = (void (*)(void *, const void *)) 0xbabababababababa
which means the pointer has already been freed. Overall, this points to 0037-linux-user-Allocate-thunk-size-dyna.patch as the culprit.
were two gnuradio failures over the weekend plus two pihwm failures.
As I explained before, you don't get failure reports for qemu-linux-user crashes (which is why I think you should use e.g. Jenkins or the like for testing qemu).
but this always fails for all architectures and repos in OBS - sometimes with segfaults, sometimes without - whereas it succeeds with exit code 0 on my local 13.2 x86_64 system. So I am clueless what's going on here - some difference between OBS and my system apparently.
It's just a memory management issue. You can install aaa_base-malloccheck (which is always installed in build environments) or use more sophisticated tools such as valgrind. Greetings, Dirk
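Why a value like 0xbabababababababa? glibc documents (mallopt(3), parameter M_PERTURB, environment variable MALLOC_PERTURB_) that with a non-zero perturb value, bytes of allocations not made via calloc are initialized to the complement of its low byte, and freed bytes are set to the byte itself. The pattern is therefore consistent with never-initialized memory under perturb byte 0x45 (~0x45 == 0xba) as well as with freed memory under perturb byte 0xba. Assuming aaa_base-malloccheck enables this mechanism (the thread does not say which settings it uses), the C sketch below emulates the allocation half explicitly; perturbed_malloc and read_u64 are hypothetical helpers, not glibc API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator mimicking glibc's documented M_PERTURB
 * behaviour for allocations: fill fresh (non-calloc) memory with the
 * complement of the perturb byte. An explicit emulation, not glibc. */
void *perturbed_malloc(size_t size, unsigned char perturb)
{
    unsigned char *p = malloc(size);
    if (p != NULL)
        memset(p, (unsigned char)~perturb, size);
    return p;
}

/* Read the first 8 bytes as a 64-bit word, the way a stored function
 * pointer is loaded on a 64-bit host. memcpy sidesteps alignment and
 * strict-aliasing problems. */
uint64_t read_u64(const void *p)
{
    assert(p != NULL);
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}
```

Because the fill byte is never zero, code that silently relied on fresh heap memory reading as NULL gets a garbage pointer instead, which turns a latent missing-initialization bug into a reproducible crash.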
On 06.05.15 23:02, Dirk Müller wrote:
Hi,
Dirk did not bother to CC me originally, and none of you have so far pointed to any particular breakage.
There is no "original" mail other than this thread, really, so you're not left out. I didn't bother to debug it further since it was broken everywhere on every occassion including on /bin/false. Here's the backtrace:
Program received signal SIGSEGV, Segmentation fault.
thunk_convert (dst=dst@entry=0x7fffffffcbe0, src=0x4000a1f170,
type_ptr=0x6051b20c <ioctl_entries+268>, type_ptr@entry=0x6051b204 <ioctl_entries+260>,
to_host=to_host@entry=1) at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/thunk.c:273
273 (*se->convert[to_host])(dst, src);
(gdb) bt
#0 thunk_convert (dst=dst@entry=0x7fffffffcbe0, src=0x4000a1f170,
type_ptr=0x6051b20c <ioctl_entries+268>, type_ptr@entry=0x6051b204 <ioctl_entries+260>,
to_host=to_host@entry=1) at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/thunk.c:273
#1 0x0000000060038358 in do_ioctl (arg=274888520048, cmd=<optimized out>, fd=<optimized out>)
at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/syscall.c:3940
#2 do_syscall (cpu_env=cpu_env@entry=0x625a5bd0, num=16, arg1=<optimized out>,
arg2=<optimized out>, arg3=274888520048, arg4=<optimized out>, arg5=274901073728,
arg6=274888522607, arg7=0, arg8=0)
at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/syscall.c:6281
#3 0x00000000600298b6 in cpu_loop (env=env@entry=0x625a5bd0)
at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/main.c:305
#4 0x0000000060003676 in main (argc=<optimized out>, argv=<optimized out>,
envp=<optimized out>) at /home/abuild/rpmbuild/BUILD/qemu-2.3.0/linux-user/main.c:4419
(gdb) p se->convert[to_host]
$1 = (void (*)(void *, const void *)) 0xbabababababababa
which means the pointer has been free'ed already. Overall this points out that
0037-linux-user-Allocate-thunk-size-dyna.patch is the culprit.
Bleks. I'm still waiting for the day when I write a patch and it just works. The thunk framework implicitly assumed that the se->convert fields are initialized to 0. That was the case before my patch, when the thunk cache resided in the .bss section. Now we allocate it dynamically, so it may be filled with garbage (which the malloccheck setup correctly caught). The easy fix is s/g_new/g_new0/, which restores the same zero-initialized allocation semantics as before. I've changed the code accordingly and submitted a fixed qemu package to the Virtualization project. Alex
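Alex's diagnosis contrasts a table in .bss, which C guarantees to be zero-filled, with a heap allocation that is not. Below is a minimal C sketch of that difference, loosely modeled on the thread and not on QEMU's actual thunk.c: struct_entry, bss_cache, alloc_cache, convert_value, and bswap32_convert are hypothetical names, and standard calloc() stands in for glib's g_new0() so the sketch builds with plain libc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified model of a thunk struct entry: optional
 * special converters indexed by direction (0 = to target, 1 = to host).
 * A NULL slot means "no special converter, use the generic path". */
typedef void (*convert_fn)(void *dst, const void *src);

struct struct_entry {
    convert_fn convert[2];
    size_t size;
};

/* A file-scope table lives in .bss and is zero-initialized by the
 * language, so every convert[] slot starts out as NULL. */
struct struct_entry bss_cache[4];

/* A heap table must request that explicitly: calloc() zero-fills like
 * glib's g_new0(), while malloc(), like g_new(), leaves the contents
 * indeterminate. (All-bits-zero is a null pointer on mainstream
 * platforms, the same assumption g_new0() users make.) */
struct struct_entry *alloc_cache(size_t n)
{
    return calloc(n, sizeof(struct struct_entry));
}

/* Example special converter: byte-swap a 32-bit value. */
void bswap32_convert(void *dst, const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    v = (v >> 24) | ((v >> 8) & 0xff00u) | ((v << 8) & 0xff0000u) | (v << 24);
    memcpy(dst, &v, sizeof v);
}

/* Call the special converter if one is registered, otherwise fall back
 * to a plain byte copy (stand-in for the generic field-wise conversion).
 * With a malloc()ed table this check would read indeterminate memory and
 * could call through a garbage pointer. */
void convert_value(const struct struct_entry *se, void *dst,
                   const void *src, int to_host)
{
    assert(se != NULL && (to_host == 0 || to_host == 1));
    if (se->convert[to_host] != NULL)
        se->convert[to_host](dst, src);
    else
        memcpy(dst, src, se->size);
}
```

The design point is that the dispatcher treats NULL as meaningful, so whoever allocates the table owns the zero-initialization guarantee; switching alloc_cache() from calloc() to malloc() would still compile cleanly and would reintroduce exactly this class of bug.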
Hi Alex,
The easy fix is to s/g_new/g_new0/ to expose the same allocation semantics as before. I've changed the code accordingly and submitted a fixed qemu package to the Virtualization project.
Ah great, thanks for the quick fix. I've switched openSUSE:Factory:ARM to that new version now! Greetings, Dirk
Hi Andreas,
its a bad worker and restarts the job on a different host, only to fail there the same way. It is only noticed then by an admin seeing
You don't need to be admin to notice the problem.
Well, either that, or actively waiting for a job to start and reloading the webui often enough to catch the build log as it happens, before it gets reverted. I wouldn't say many people are able to do that :-) Greetings, Dirk
Dirk Müller <dirk@dmllr.de> writes:
well, either that or actively waiting for a job to start, reloading the webui often enough so that you can catch the build log as it
Don't use the webui. Andreas.
participants (5)
- Alexander Graf
- Andreas Färber
- Andreas Schwab
- Dirk Müller
- Ludwig Nussel