Hi.
I've got a weird issue with one of my OBS jobs which I don't know how to debug. From time to time the build job fails.
It fails with an "Illegal instruction" error; the code uses tricky SSE optimizations which depend heavily on processor features.
The problem is that I do not see any pattern:
- any of the Ubuntu or Debian builds might fail (several or just one)
- I'm unable to reproduce this locally
- the build failure is gone the next day without any code changes
Overall this looks like some sort of spooky race condition. If I "trigger rebuild", then the build is back to normal. I guess it depends on what kind of VM/CPU the build has been scheduled on. Is there a way to get additional info about the VM on which the build is running?
An example build log is attached, but I'm unable to make sense of it.
Hey,
On 20.07.2017 11:47, Max wrote:
It fails with "Illegal instruction" error - the code uses tricky SSE optimizations which are heavily dependent on processor features.
Find out which & create a build constraint? :-)
http://openbuildservice.org/help/manuals/obs-reference-guide/cha.obs.build_j...
Henne
On Thursday, 20 July 2017, 12:18:44 CEST, Henne Vogelsang wrote:
Hey,
On 20.07.2017 11:47, Max wrote:
It fails with "Illegal instruction" error - the code uses tricky SSE optimizations which are heavily dependent on processor features.
Find out which & create a build constraint? :-)
http://openbuildservice.org/help/manuals/obs-reference-guide/cha.obs.build_job_constraints.html#idm140109221460960
Henne
An additional hint:
You can use "osc workerinfo" to find out a bit more about the worker the job was running on:
## BUILD FAILED:
# osc workerinfo x86_64:build36:1 | grep sse
<flag>sse</flag>
<flag>sse2</flag>
<flag>sse4a</flag>
<flag>misalignsse</flag>
## BUILD WORKED
# osc workerinfo x86_64:cloud117:1 | grep sse
<flag>sse</flag>
<flag>sse2</flag>
<flag>ssse3</flag>
<flag>sse4_1</flag>
<flag>sse4_2</flag>
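The difference between the two workers is easier to see as a set difference. The following is a minimal sketch that only parses the `<flag>` lines quoted above (already filtered by `grep sse`, so other CPU flags are not covered); the helper name `parse_flags` is made up for illustration.

```python
import re

# Output of "osc workerinfo ... | grep sse" as quoted in this thread.
FAILED = """<flag>sse</flag>
<flag>sse2</flag>
<flag>sse4a</flag>
<flag>misalignsse</flag>"""

WORKED = """<flag>sse</flag>
<flag>sse2</flag>
<flag>ssse3</flag>
<flag>sse4_1</flag>
<flag>sse4_2</flag>"""


def parse_flags(text):
    """Return the set of CPU flag names found in <flag>...</flag> elements."""
    return set(re.findall(r"<flag>([^<]+)</flag>", text))


failed_flags = parse_flags(FAILED)
worked_flags = parse_flags(WORKED)

# Flags the working host has but the failing host lacks, and vice versa.
missing_on_failed = sorted(worked_flags - failed_flags)
only_on_failed = sorted(failed_flags - worked_flags)

print("missing on failing worker:", missing_on_failed)
print("only on failing worker:", only_on_failed)
```

The flags missing on the failing worker are the first candidates for the instruction that triggers "Illegal instruction", though this comparison alone does not prove which one the code actually uses.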
Looks very useful, thank you. Where do I get this "build36" or "cloud117" from? Also, what does the ":1" at the end refer to? Basically, how do I gather the parameters for "osc workerinfo" from the failed build log?
On Thursday, 20 July 2017, 12:31:32 CEST, Max wrote:
Looks very useful, thank you. Where do I get this "build36" or "cloud117" from? Also, what does the ":1" at the end refer to?
It is:
<arch>:<host>:<vm_num>
In the log you sent, you can find a line like
[ 0s] build36 started "build libosmocore_0.9.6.20170719.dsc" at Wed Jul 19 19:49:50 UTC 2017.
build36 is the host. Here you can find the arch (maybe there is a better solution with osc, I don't know):
https://build.opensuse.org/monitor
and vm_num=1 is always a good choice. Normally the other VMs on the same host should be identical.
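Put together, the steps above amount to pulling the host name out of the "started" line of the build log and joining it with the arch and VM number. A minimal sketch, assuming the log line format shown above; the function names `host_from_log` and `worker_id` are hypothetical helpers, not part of osc:

```python
import re

# The "started" line quoted from the failed build log in this thread.
LOG_LINE = (
    '[ 0s] build36 started "build libosmocore_0.9.6.20170719.dsc" '
    "at Wed Jul 19 19:49:50 UTC 2017."
)


def host_from_log(log_text):
    """Return the worker host named in the 'started' line, or None."""
    match = re.search(
        r'^\[\s*\d+s\]\s+(\S+) started "build ', log_text, re.MULTILINE
    )
    return match.group(1) if match else None


def worker_id(hostarch, host, vm_num=1):
    """Build the <arch>:<host>:<vm_num> argument for 'osc workerinfo'."""
    return f"{hostarch}:{host}:{vm_num}"


host = host_from_log(LOG_LINE)
# The arch still has to be looked up separately; x86_64 is the value
# used for build36 earlier in this thread.
print(worker_id("x86_64", host))
```

The resulting string can then be passed to "osc workerinfo" as shown earlier.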
Exactly what I was looking for, thank you.
Is <arch> the "hostarch=" parameter of the worker as reported by "osc api /worker/_status", or is it the arch of the failed build?
For example, my i586 build failed while the x86_64 build was fine (they ran on different workers).
Getting data on the worker of the failed i586 build:
osc api /worker/_status | grep build34
<idle workerid="build34:1" hostarch="x86_64"/>
<idle workerid="build34:2" hostarch="x86_64"/>
<idle workerid="build34:3" hostarch="x86_64"/>
<idle workerid="build34:5" hostarch="x86_64"/>
<idle workerid="build34:6" hostarch="x86_64"/>
<building workerid="build34:4" hostarch="x86_64" project="Kernel:linux-next" repository="standard" package="kernel-vanilla" arch="i586" starttime="1500631045"/>
Getting the sse flags:
osc workerinfo x86_64:build34:1 | grep sse
<flag>sse</flag>
<flag>sse2</flag>
<flag>sse4a</flag>
<flag>misalignsse</flag>
but
osc workerinfo x586:build34:1 | grep sse
Server returned an error: HTTP Error 404: remote error unknown worker
remote error: unknown worker
On Friday, 21 July 2017, 13:41:55 CEST, Max wrote:
Is <arch> the "hostarch=" parameter of the worker as reported by "osc api /worker/_status", or is it the arch of the failed build?
The hostarch is the architecture of the host ;)
It is independent of the build target.
i586, but also emulated architectures (e.g. armv6), can be built on an x86_64 host.
For example, my i586 build failed while the x86_64 build was fine (they ran on different workers).
Getting data on the worker of the failed i586 build:
osc api /worker/_status | grep build34
<idle workerid="build34:1" hostarch="x86_64"/>
<idle workerid="build34:2" hostarch="x86_64"/>
<idle workerid="build34:3" hostarch="x86_64"/>
<idle workerid="build34:5" hostarch="x86_64"/>
<idle workerid="build34:6" hostarch="x86_64"/>
<building workerid="build34:4" hostarch="x86_64" project="Kernel:linux-next" repository="standard" package="kernel-vanilla" arch="i586" starttime="1500631045"/>
Getting the sse flags:
osc workerinfo x86_64:build34:1 | grep sse
<flag>sse</flag>
<flag>sse2</flag>
<flag>sse4a</flag>
<flag>misalignsse</flag>
but
osc workerinfo x586:build34:1 | grep sse
Server returned an error: HTTP Error 404: remote error unknown worker
remote error: unknown worker
Right, there is no i586 worker host, only an x86_64 one, which can also run i586 builds either by booting a 32-bit kernel or via a personality switch.
However, it is then up to the kernel and CPU to offer the optimizations in 32-bit legacy mode. I suppose that at least sse4a won't be available there.
But from the POV of OBS this is all just content; we only provide the VM here.
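To make the hostarch/build-arch distinction concrete, here is a small sketch that parses the grep'ed "osc api /worker/_status" lines quoted in this thread and derives the "osc workerinfo" argument from hostarch, never from the build's arch. It works line by line on the grep output rather than on the full XML document; the function names are hypothetical.

```python
import re

# Lines copied from the "osc api /worker/_status | grep build34" output above.
STATUS_LINES = """<idle workerid="build34:1" hostarch="x86_64"/>
<idle workerid="build34:2" hostarch="x86_64"/>
<idle workerid="build34:3" hostarch="x86_64"/>
<idle workerid="build34:5" hostarch="x86_64"/>
<idle workerid="build34:6" hostarch="x86_64"/>
<building workerid="build34:4" hostarch="x86_64" project="Kernel:linux-next" repository="standard" package="kernel-vanilla" arch="i586" starttime="1500631045"/>"""


def parse_workers(text):
    """Return one dict of attributes per status line, plus its state."""
    workers = []
    for line in text.splitlines():
        tag = re.match(r"\s*<(\w+)\s", line)
        if not tag:
            continue
        attrs = dict(re.findall(r'(\w+)="([^"]*)"', line))
        attrs["state"] = tag.group(1)  # "idle" or "building"
        workers.append(attrs)
    return workers


def workerinfo_ids(text, host):
    """Build <hostarch>:<host>:<vm_num> ids for every VM on the given host."""
    ids = []
    for worker in parse_workers(text):
        name, _, _vm_num = worker["workerid"].partition(":")
        if name == host:
            ids.append(f'{worker["hostarch"]}:{worker["workerid"]}')
    return ids


for wid in workerinfo_ids(STATUS_LINES, "build34"):
    print(wid)
```

Note that the VM building the i586 package still yields `x86_64:build34:4`, which is why a query with any other arch prefix returns "unknown worker".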
Excellent advice, thank you.
Is there some sort of negation operator for those constraints? Setting a bunch of required flags would most likely fix our build, but it would be nicer to fix our code instead, that is, to find the particular combination of present/absent flags that causes the build failure. Is there a way to constrain a build job to workers _without_ a given CPU flag, sse2 for example?
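For reference, the positive form of such a constraint (requiring flags) can be sketched as follows. This generates the content of a package `_constraints` file using the `constraints/hardware/cpu/flag` layout from the OBS reference guide. The two flags are only an assumption for illustration, taken from the flags the failing worker lacked; the thread does not establish which flag the code really needs, and `build_constraints` is a made-up helper.

```python
import xml.etree.ElementTree as ET


def build_constraints(cpu_flags):
    """Return _constraints XML requiring the given CPU flags on the worker."""
    root = ET.Element("constraints")
    hardware = ET.SubElement(root, "hardware")
    cpu = ET.SubElement(hardware, "cpu")
    for name in cpu_flags:
        ET.SubElement(cpu, "flag").text = name
    return ET.tostring(root, encoding="unicode")


# Assumption: flags chosen for illustration only.
REQUIRED_FLAGS = ["ssse3", "sse4_1"]

constraints_xml = build_constraints(REQUIRED_FLAGS)
print(constraints_xml)
```

The printed XML would go into a file named `_constraints` in the package sources; check the linked reference guide for the exact rules before relying on it.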
buildservice@lists.opensuse.org