[Bug 1178453] New: [Build 20201103] gnuhealth_client crashes on start
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453

Bug ID: 1178453
Summary: [Build 20201103] gnuhealth_client crashes on start
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
URL: https://openqa.opensuse.org/tests/1460183/modules/gnuhealth_client_first_time/steps/2
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
Assignee: screening-team-bugs@suse.de
Reporter: dimstar@opensuse.org
QA Contact: qa-bugs@suse.de
Found By: openQA
Blocker: Yes

## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-gnuhealth@64bit fails in [gnuhealth_client_first_time](https://openqa.opensuse.org/tests/1460183/modules/gnuhealth_client_first_tim...)

## Test suite description

Maintainer: okurz@suse.de

Test scenario for gnuhealth software stack

## Reproducible

Fails since (at least) Build [20201103](https://openqa.opensuse.org/tests/1458960)

## Expected result

Last good: [20201030](https://openqa.opensuse.org/tests/1456811) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=gnuhealth&version=Tumbleweed)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c4

Axel Braun <axel.braun@gmx.de> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
Status   |NEW                       |IN_PROGRESS

--- Comment #4 from Axel Braun <axel.braun@gmx.de> ---
(In reply to Stefan Brüns from comment #3)
The gnuhealth-client is non-distributable, as it contains a binary blob, the camera plugin.
It comes with a __pycache__ directory, which obviously caused the problem. I have removed it and rebuilt, and the client comes up normally. Thanks for the hint!
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c6

--- Comment #6 from Axel Braun <axel.braun@gmx.de> ---
Strange situation: the __pycache__ directories are removed (and in the meantime they have been removed upstream as well):

[ 26s] /home/abuild/rpmbuild/BUILD/gnuhealth-client-3.6.9
[ 26s] + cd tryton/plugins
[ 26s] + tar -xzvf /home/abuild/rpmbuild/SOURCES/gnuhealth_plugin_camera-latest.tar.gz
[ 26s] gnuhealth_plugin_camera-3.6.0/
[ 26s] gnuhealth_plugin_camera-3.6.0/__pycache__/
[ 26s] gnuhealth_plugin_camera-3.6.0/__pycache__/__init__.cpython-37.pyc
[ 26s] gnuhealth_plugin_camera-3.6.0/README
[ 26s] gnuhealth_plugin_camera-3.6.0/version
[ 26s] gnuhealth_plugin_camera-3.6.0/__init__.py
[ 26s] gnuhealth_plugin_camera-3.6.0/COPYING
[ 26s] + tar -xzvf /home/abuild/rpmbuild/SOURCES/gnuhealth_plugin_crypto-latest.tar.gz
[ 26s] gnuhealth_plugin_crypto-3.6.0/
[ 26s] gnuhealth_plugin_crypto-3.6.0/__pycache__/
[ 26s] gnuhealth_plugin_crypto-3.6.0/__pycache__/__init__.cpython-37.pyc
[ 26s] gnuhealth_plugin_crypto-3.6.0/version
[ 26s] gnuhealth_plugin_crypto-3.6.0/__init__.py
[ 26s] gnuhealth_plugin_crypto-3.6.0/doc/
[ 26s] gnuhealth_plugin_crypto-3.6.0/doc/index.rst
[ 26s] + tar -xzvf /home/abuild/rpmbuild/SOURCES/gnuhealth_plugin_frl-latest.tar.gz
[ 26s] gnuhealth_plugin_frl-3.6.1/
[ 26s] gnuhealth_plugin_frl-3.6.1/icons/
[ 26s] gnuhealth_plugin_frl-3.6.1/icons/gnuhealth_icon.svg
[ 26s] gnuhealth_plugin_frl-3.6.1/icons/federation.svg
[ 26s] gnuhealth_plugin_frl-3.6.1/version
[ 26s] gnuhealth_plugin_frl-3.6.1/__init__.py
[ 26s] gnuhealth_plugin_frl-3.6.1/doc/
[ 26s] gnuhealth_plugin_frl-3.6.1/doc/index.rst
[ 26s] + mv gnuhealth_plugin_camera-3.6.0 camera
[ 26s] + mv gnuhealth_plugin_crypto-3.6.0 crypto
[ 26s] + mv gnuhealth_plugin_frl-3.6.1 frl
[ 26s] + rm -rf camera/__pycache__ crypto/__pycache__
[ 26s] + RPM_EC=0

Nevertheless, the installation (and start) works on local builds and on my local TW machine, but fails in openQA with the same error. The gapi tests in opencv are disabled; could this be related?
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c9

--- Comment #9 from Axel Braun <axel.braun@gmx.de> ---
(In reply to Stefan Brüns from comment #8)
This is more likely an issue of the openQA worker, openCV is doing runtime dispatching for CPU feature dependent code.
Can you explain this in a way a 10-year-old blonde would understand? ;-) Who needs to look into this?
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c11

--- Comment #11 from Oliver Kurz <okurz@suse.com> ---
https://openqa.opensuse.org/tests/1476138#step/gnuhealth_client_first_time/1 is strange though. It was another retriggered test which ran with `QEMUCPU=qemu64` and it showed the client just fine *confused*
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c12

Axel Braun <axel.braun@gmx.de> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
CC       |                          |axel.braun@gmx.de

--- Comment #12 from Axel Braun <axel.braun@gmx.de> ---
(In reply to Oliver Kurz from comment #11)
https://openqa.opensuse.org/tests/1476138#step/gnuhealth_client_first_time/1 is strange though. It was another retriggered test which ran with `QEMUCPU=qemu64` and it showed the client just fine *confused*
Yes, and in the current build 20201115 it runs fine as well.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c13

Oliver Kurz <okurz@suse.com> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
Assignee |okurz@suse.com            |axel.braun@gmx.de

--- Comment #13 from Oliver Kurz <okurz@suse.com> ---
But that is with the workaround applied. I removed that again as Stephan Kulow suggested in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11414#issuec... to keep it failing, so as not to hide the product issue that we cannot easily work around.

axel.braun@gmx.de, assigning back to you as https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11414#issuec... convinced me: if it doesn't run on certain CPUs, it should be recompiled to do so. So with the help of openQA tests we could verify that the software crashes on certain CPUs, but I hope that, maybe with the help of stefan.bruens@rwth-aachen.de, you will be able to make it work on all CPU variants.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c14

Axel Braun <axel.braun@gmx.de> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
Flags    |                          |needinfo?(stefan.bruens@rwth-aachen.de)

--- Comment #14 from Axel Braun <axel.braun@gmx.de> ---
(In reply to Oliver Kurz from comment #13)
But that is with the workaround applied. I removed that again as Stephan Kulow suggested in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11414#issuecomment-728756213 to keep it failing, so as not to hide the product issue that we cannot easily work around.
axel.braun@gmx.de, assigning back to you as https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11414#issuecomment-728717005 convinced me: if it doesn't run on certain CPUs, it should be recompiled to do so.
Hm, there is basically nothing I can do on the gnuhealth side. And I can't judge whether the openQA worker is the problem (maybe submitting invalid data?), whether opencv processes this data incorrectly, or whether it is a different problem altogether. As we will probably release a new gnuhealth version soon (planned for this month), I would really like to get this fixed, or the error removed by setting the host CPU as originally suggested. What can we do?
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c15

Axel Braun <axel.braun@gmx.de> changed:

What     |Removed                                             |Added
----------------------------------------------------------------------------
Summary  |[Build 20201103] gnuhealth_client crashes on start  |[Build 20201103] opencv crashes for certain CPU type (openQA-worker)

--- Comment #15 from Axel Braun <axel.braun@gmx.de> ---
I have reported the issue upstream: https://github.com/opencv/opencv/issues/19020
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453

Axel Braun <axel.braun@gmx.de> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
Assignee |axel.braun@gmx.de         |stefan.bruens@rwth-aachen.de
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c17

Stefan Brüns <stefan.bruens@rwth-aachen.de> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
Flags    |                          |needinfo?(okurz@suse.com)

--- Comment #17 from Stefan Brüns <stefan.bruens@rwth-aachen.de> ---
Sorry I haven't followed up on this.

I tried enabling the test suite in OpenCV, and the gapi test is also segfaulting on OBS. Unfortunately the test suite has too many external dependencies to be useful without much more work, so I have kept it disabled for now.

The gapi code seems to trigger some error probably related to qemu/kvm in its dispatch code (so either CPU dispatching chooses the wrong path, or the dispatched code is failing).

I will try to create a trivial reproducer in the coming days.

@dimstar, @fokurz - for runtime dispatched code it would be useful to have different (emulated) hardware, both on OBS (internal testsuite) and on openQA. Do we have something in place for this, i.e. a way to request a host which has only SSE2?
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c19

Oliver Kurz <okurz@suse.com> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
Flags    |needinfo?(okurz@suse.com) |

--- Comment #19 from Oliver Kurz <okurz@suse.com> ---
(In reply to Stefan Brüns from comment #17)
Sorry I haven't followed up on this.
I tried enabling the test suite in OpenCV, and the gapi test is also segfaulting on OBS. Unfortunately the test suite has too many external dependencies to be useful without much more work, so I have kept it disabled for now.
The gapi code seems to trigger some error probably related to qemu/kvm in its dispatch code (so either CPU dispatching chooses the wrong path, or the dispatched code is failing).
I will try to create a trivial reproducer in the coming days.
@dimstar, @fokurz - for runtime dispatched code it would be useful to have different (emulated) hardware, both on OBS (internal testsuite) and on openQA. Do we have something in place for this, i.e. a way to request a host which has only SSE2?
Hm, I think what's possible is to run `osc build` locally or within suitably configured VMs. For openQA we can configure the VMs that are used for tests on the fly, e.g. with

```
openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 1583395 TEST=gnuhealth_only_sse2 QEMUCPU=base,sse2 _GROUP=0 BUILD=debug_bsc1178453
```

I could create https://openqa.opensuse.org/t1583917 which (hopefully) will spawn a test job on a machine that has only sse2 enabled. We can tweak that with the "QEMUCPU" parameter which is directly passed to `qemu -cpu …`

Everyone with operator permissions on an openQA instance can run the same. You already have the corresponding permissions.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c22

--- Comment #22 from Axel Braun <axel.braun@gmx.de> ---
Hello Oliver,

(In reply to Oliver Kurz from comment #19)
(In reply to Stefan Brüns from comment #17)
....
I could create https://openqa.opensuse.org/t1583917 which (hopefully) will spawn a test job on a machine that has only sse2 enabled. We can tweak that with the "QEMUCPU" parameter which is directly passed to `qemu -cpu …`
The above test fails as well. https://github.com/opencv/opencv/issues/19020#issuecomment-758142627 mentions two other options:

- disable LTO
- disable dispatching in OpenCV (cmake ... -DCPU_DISPATCH=), but with an SSE2 baseline the performance is not really good

The third option, a code change, is not something we can consider here, I assume. So is 'disable LTO' the only option we have left?
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c23

--- Comment #23 from Stefan Brüns <stefan.bruens@rwth-aachen.de> ---
Even without LTO the code is malformed.

Consider a C++ method that is marked inline. Contrary to common belief, this does not force the compiler to actually inline the code (although it often does); it allows multiple definitions, all but one of which are discarded at link time. The C++ standard requires all these definitions to consist of identical sequences of tokens (which is trivially true for e.g. headers included in different source files). Given identical compiler options, identical token sequences result in identical machine code, so it does not matter which of these definitions is chosen.

Now, with different architecture flags you end up with some definitions that are heavily architecture dependent. The linker may or may not choose the one definition that crashes on an AVX-incapable machine.

C++ inline methods are quite common, as every method defined in-class is implicitly inline, and template instantiations are handled the same way (emitted in every translation unit that uses them, with duplicates discarded at link time).

There are two mechanisms that deal with this properly: function multiversioning (FMV) and the HWCAPS approach used by the very latest glibc.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453

Guillaume GARDET <guillaume.gardet@arm.com> changed:

What     |Removed                   |Added
----------------------------------------------------------------------------
CC       |                          |guillaume.gardet@arm.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1178453#c29

--- Comment #29 from OBSbugzilla Bot <bwiedemann+obsbugzillabot@suse.com> ---
This is an autogenerated message for OBS integration:
This bug (1178453) was mentioned in
https://build.opensuse.org/request/show/1064690 Factory / gnuhealth-client