[Bug 1185952] New: [Build 20210510] PostgreSQL is not startable on s390x
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952

            Bug ID: 1185952
           Summary: [Build 20210510] PostgreSQL is not startable on s390x
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: S/390-64
               URL: https://openqa.opensuse.org/tests/1734657/modules/postgresql_server/steps/6
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Other
          Assignee: screening-team-bugs@suse.de
          Reporter: ada.lovelace@gmx.de
        QA Contact: qa-bugs@suse.de
          Found By: openQA
           Blocker: Yes

## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-s390x-textmode@s390x-zVM-vswitch-l2 fails in [postgresql_server](https://openqa.opensuse.org/tests/1734657/modules/postgresql_server/steps/6)

All packages can be installed. The problem is that the PostgreSQL server cannot be started. The journal log says:

  -- Logs begin at Mon 2021-05-10 22:41:06 EDT, end at Mon 2021-05-10 23:23:02 EDT. --
  May 10 23:15:15 susetest systemd[1]: Starting PostgreSQL database server...
  May 10 23:15:15 susetest systemd[1]: postgresql.service: Control process exited, code=exited, status=1/FAILURE
  May 10 23:15:15 susetest systemd[1]: postgresql.service: Failed with result 'exit-code'.
  May 10 23:15:15 susetest systemd[1]: Failed to start PostgreSQL database server.
  May 10 23:15:15 susetest postgresql-script[12832]: Cannot find an active PostgreSQL server binary. Please install one of the PostgreSQL
  May 10 23:15:15 susetest postgresql-script[12832]: server packages or activate an already installed version using update-alternatives.

## Test suite description

Maintainer: QE Yast

Installation in textmode and selecting the textmode "desktop" during installation.
## Reproducible

Fails since (at least) Build [20210510](https://openqa.opensuse.org/tests/1734657) (current job)

## Expected result

Last good: [20210507](https://openqa.opensuse.org/tests/1731697) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=s390x&distri=opensuse&flavor=DVD&machine=s390x-zVM-vswitch-l2&test=textmode&version=Tumbleweed)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952

Sarah Kriesch <ada.lovelace@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|screening-team-bugs@suse.de |max@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c1

--- Comment #1 from Reinhard Max <max@suse.com> ---
This is because a mix of postgresql11 and postgresql13 packages is installed. postgresql13 is active in update-alternatives, but postgresql13-server is not installed, so the service file cannot start the server of the active version. The server binary from postgresql11 would only be started if postgresql11 were set as the active alternative or if the data directory already contained a set of data files that was initialized with postgresql11.

BTW, it doesn't seem to make sense to me that the test requests the installation of postgresql11-llvmjit together with postgresql13. Is there a reason for that?

For the test to work, depending on the intention, one of the following conditions would have to be met:

- Call update-alternatives after package installation to make postgresql11 the default.
- Don't install any postgresql13 package, so that postgresql11 becomes the default in update-alternatives. But note that postgresql13-* (containing the binaries) is not to be confused with version 13 of postgresql-*, which are noarch packages that contain only infrastructure and dependencies.
- Install postgresql13-server, but then the tests will be run on that version.
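The selection rule described in comment #1 can be modeled as a small decision function. This is a simplified, hypothetical sketch, not the actual logic of postgresql-script: the `PgState` type and `server_to_start` helper are invented for illustration, and the only real detail assumed is that an initialized PostgreSQL data directory records its major version in a `PG_VERSION` file.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PgState:
    """Hypothetical snapshot of a host's PostgreSQL installation."""

    # Major version currently active in update-alternatives, e.g. "13".
    active_alternative: str
    # Major versions whose postgresqlNN-server package is installed.
    installed_servers: set[str] = field(default_factory=set)
    # Contents of PG_VERSION in the data directory, or None if not initialized.
    datadir_version: str | None = None


def server_to_start(state: PgState) -> str | None:
    """Return the major version whose server would be started, or None."""
    # An initialized data directory pins the version it was created with;
    # otherwise the active alternative decides.
    wanted = state.datadir_version or state.active_alternative
    # The server can only start if that version's server binary is installed.
    return wanted if wanted in state.installed_servers else None


if __name__ == "__main__":
    # The failing openQA setup: 13 is active, but only the 11 server is installed.
    print(server_to_start(PgState("13", {"11"})))        # None
    # First remedy from the list: make postgresql11 the active alternative.
    print(server_to_start(PgState("11", {"11"})))        # 11
    # Third remedy: install postgresql13-server.
    print(server_to_start(PgState("13", {"11", "13"})))  # 13
```

In this model, returning `None` corresponds to the "Cannot find an active PostgreSQL server binary" message in the journal log.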
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c2

--- Comment #2 from Sarah Kriesch <ada.lovelace@gmx.de> ---
It seems that the tests for postgresql-server have not been changed in the last few days: https://openqa.opensuse.org/tests/1734657#investigation

These packages are installed on Tumbleweed with:

  sudo zypper install postgresql-server

In our openQA tests:

  zypper_call "in postgresql-server sudo";
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c3

--- Comment #3 from Reinhard Max <max@suse.com> ---
Ah, sorry, I misread the screenshots.

I did recently work on the logic that decides whether or not to build the llvmjit subpackage, but the SR for that was accepted 20 days ago, so I would have expected problems to surface earlier than this.

Can I see the package list of the medium that was used for that test somewhere? An excerpt showing all packages that match "postgresql" would be sufficient.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c4

--- Comment #4 from Sarah Kriesch <ada.lovelace@gmx.de> ---
These repositories should be used after publishing:
http://download.opensuse.org/ports/zsystems/factory/repo/oss/s390x/
together with
http://download.opensuse.org/ports/zsystems/factory/repo/oss/noarch/

And as I can see in the screenshot https://openqa.opensuse.org/tests/1734657#step/postgresql_server/2, it is installing:

- postgresql11-11.11-3.2.s390x.rpm
- postgresql13-13.2-2.3.s390x.rpm
- postgresql-server-13-1.32.noarch.rpm
- postgresql-13-1.32.noarch.rpm

It seems the main focus is on postgresql13, but postgresql11 has found its way into the installation together with postgresql-server. Five days ago, this issue did not exist.

Our packages for s390x come from our s390x port: https://build.opensuse.org/project/show/openSUSE:Factory:zSystems
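One way to spot the mix described in comment #1 from a package list like the one above is to split each RPM file name into name, version, release and architecture and collect the major versions of the versioned `postgresqlNN*` packages. The noarch `postgresql` and `postgresql-server` meta-packages carry version 13 but no binaries, so they are deliberately excluded. This is an illustrative sketch; the `binary_majors` helper is hypothetical, and the regular expressions assume the usual `name-version-release.arch.rpm` file naming.

```python
from __future__ import annotations

import re

# name-version-release.arch.rpm; the name itself may contain dashes.
RPM_FILE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$"
)
# Versioned binary packages such as postgresql11 or postgresql13-server.
VERSIONED = re.compile(r"^postgresql(?P<major>\d+)(?:-|$)")


def binary_majors(filenames: list[str]) -> set[str]:
    """Return the PostgreSQL major versions of the versioned packages."""
    majors: set[str] = set()
    for filename in filenames:
        match = RPM_FILE.match(filename)
        if match is None:
            continue  # not an RPM file name
        versioned = VERSIONED.match(match["name"])
        if versioned:
            majors.add(versioned["major"])
    return majors


if __name__ == "__main__":
    packages = [
        "postgresql11-11.11-3.2.s390x.rpm",
        "postgresql13-13.2-2.3.s390x.rpm",
        "postgresql-server-13-1.32.noarch.rpm",
        "postgresql-13-1.32.noarch.rpm",
    ]
    majors = binary_majors(packages)
    print(sorted(majors))  # ['11', '13']
    if len(majors) > 1:
        print("mixed PostgreSQL major versions installed")
```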
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c5

--- Comment #5 from Sarah Kriesch <ada.lovelace@gmx.de> ---
The non-public openQA mirror is: http://openqa.opensuse.org/assets/repo/Tumbleweed-oss-s390x-Snapshot20210514
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952

Sarah Kriesch <ada.lovelace@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |azouhr@opensuse.org,
                   |                            |ihno@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952

Aaron Puchert <aaronpuchert@alice-dsl.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.llvm.org/show_bug.cgi?id=50386
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c16

--- Comment #16 from Sarah Kriesch <ada.lovelace@gmx.de> ---
@Aaron One hint: We can forward bug reports for IBM via the openSUSE Bugzilla, too. I believe that upstream bug reports require more time than the internal routing process.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c17

--- Comment #17 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to Sarah Kriesch from comment #16)
> @Aaron One hint: We can forward bug reports for IBM via the openSUSE Bugzilla,
> too. I believe that upstream bug reports require more time than the internal
> routing process.

I got a quick reply from the SystemZ backend maintainer, who seems to be working for IBM. We just haven't reached an agreement so far. The question is whether the ABI should depend on the feature level, which I think it should not.

Do you happen to know about plans to switch SLE/openSUSE to a default architecture level of z13? That would cut the Gordian knot.

Thanks anyway for offering help; if this doesn't work out, we might come back to your offer.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c18

--- Comment #18 from OBSbugzilla Bot <bwiedemann+obsbugzillabot@suse.com> ---
This is an autogenerated message for OBS integration:
This bug (1185952) was mentioned in
https://build.opensuse.org/request/show/894524 Factory / postgresql13
https://build.opensuse.org/request/show/894525 Factory / postgresql12
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c19

Reinhard Max <max@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |max@suse.com
           Assignee|max@suse.com                |aaronpuchert@alice-dsl.net
            Summary|[Build 20210510] PostgreSQL |[Build 20210510] PostgreSQL
                   |is not startable on s390x   |12 and 13 fail to build
                   |                            |with LLVM12 on s390x

--- Comment #19 from Reinhard Max <max@suse.com> ---
Workaround submitted; adjusting the Subject and reassigning to the maintainer of llvm. Feel free to assign it back to me once llvm has been fixed, so that I can revert the workaround.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c21

--- Comment #21 from Sarah Kriesch <ada.lovelace@gmx.de> ---
The bug with LLVM JIT has happened on Fedora, too. There was a discussion about this bug on the PostgreSQL mailing list:
https://www.postgresql.org/message-id/fc131116-baef-66a5-362c-b9e1a2b1ebec%4...

That should be fixed in the next version (response by IBM).
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c22

--- Comment #22 from Reinhard Max <max@suse.com> ---
Thanks for the update, Sarah. Please clarify whether IBM was talking about the next PostgreSQL version or the next LLVM version that should fix this, as I don't see any IBM statement in the thread you linked to.

BTW, I expect the next round of PostgreSQL minor releases in about a month, but AFAICS so far no fix for this issue has been committed.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c23

--- Comment #23 from Sarah Kriesch <ada.lovelace@gmx.de> ---
The next PostgreSQL version was meant. Here is the statement by Andreas Krebbel (Compiler Product Owner), originally given in German:

"This will be a temporary problem for some time, but it will resolve itself as soon as all distros build with a z13 default. Then the static parts of Postgres will also use the Z hardware vector ABI. I don't think the problem can really be 'fixed' right now. Presumably, all one can do is try to mask out the VX feature in the JIT part. Of course, this means losing performance in those parts of the code, but hopefully not for long."
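The workaround Krebbel suggests, masking out the vector (VX) facility for the JIT only, amounts to forcing one entry of the target feature list off. The following is a minimal sketch under two assumptions: that the JIT's features use LLVM's comma-separated `+feature`/`-feature` syntax, and that `vector` is the SystemZ feature name for VX. The `mask_feature` helper is hypothetical and is not taken from the PostgreSQL patch linked in comment #24.

```python
from __future__ import annotations


def mask_feature(features: str, feature: str = "vector") -> str:
    """Return an LLVM-style feature string with `feature` explicitly disabled.

    Any existing "+feature" or "-feature" entry is dropped and a single
    "-feature" entry is appended, so the result is the same whether or not
    the host CPU reported the feature. Only the exactly named entry is
    touched; related features are left as they are.
    """
    kept = [
        entry
        for entry in features.split(",")
        if entry and entry.lstrip("+-") != feature
    ]
    kept.append("-" + feature)
    return ",".join(kept)


if __name__ == "__main__":
    # Hypothetical host feature string from a z13 or newer machine.
    print(mask_feature("+vector,+some-other-feature"))
    # -> +some-other-feature,-vector
```

As the statement notes, the price of this approach is that JIT-compiled code cannot use the vector unit at all.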
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c24

--- Comment #24 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to Sarah Kriesch from comment #21)
> The bug with LLVM JIT has happened on Fedora, too. There was a discussion
> about this bug on the PostgreSQL mailing list:
> https://www.postgresql.org/message-id/fc131116-baef-66a5-362c-b9e1a2b1ebec%40redhat.com

Thanks for the link. And apparently there is even a patch that might work for now:
https://www.postgresql.org/message-id/attachment/122331/0001-jit-Workaround-...

Reinhard, could you try this out?
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c25

--- Comment #25 from Reinhard Max <max@suse.com> ---
It looks like upstream wasn't completely satisfied with it yet and hence hasn't committed it so far. Therefore I'd prefer to stick with our current workaround to keep using llvm/clang 11 for s390x.

Or do you see an urgent need to go for llvm12 within the next weeks?
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c26

Aaron Puchert <aaronpuchert@alice-dsl.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|aaronpuchert@alice-dsl.net  |max@suse.com

--- Comment #26 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to Reinhard Max from comment #25)
> It looks like upstream wasn't completely satisfied with it yet and hence
> hasn't committed it so far.

I read these as mostly stylistic concerns, but maybe I've overlooked something.

> Or do you see an urgent need to go for llvm12 within the next weeks?

It's not urgent; we'll have to leave llvm11 in the distribution for some time. Some packages are even on llvm{7,9,10}. But all the older LLVMs are out of maintenance; I'm basically just fixing the occasional build failure.

Hope you don't mind that I'm assigning this back to you. In my view it is a bug in LLVM, but without the zSystems backend maintainer agreeing with my concerns, there is nothing I can do. So we'll have to patch this out in Postgres.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c27

--- Comment #27 from Reinhard Max <max@suse.com> ---
OK. I'll wait for the next PostgreSQL minor version update and add the patch then, if upstream hasn't already integrated it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c35

--- Comment #35 from OBSbugzilla Bot <bwiedemann+obsbugzillabot@suse.com> ---
This is an autogenerated message for OBS integration:
This bug (1185952) was mentioned in
https://build.opensuse.org/request/show/917538 Factory / postgresql10
https://build.opensuse.org/request/show/917540 Factory / postgresql11
https://build.opensuse.org/request/show/917541 Factory / postgresql12
https://build.opensuse.org/request/show/917542 Factory / postgresql13
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c38

--- Comment #38 from Sarah Kriesch <ada.lovelace@gmx.de> ---
I am happy that openQA is working again for s390x. But we have a new failure: PostgreSQL 11 cannot be built successfully with LLVM.

  /usr/lib64/gcc/s390x-suse-linux/11/../../../../s390x-suse-linux/bin/ld: @GLIBCXX_3.4.11: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
  /usr/lib64/gcc/s390x-suse-linux/11/../../../../s390x-suse-linux/bin/ld: /usr/lib64/libLLVM.so: error adding symbols: bad value
  collect2: error: ld returned 1 exit status
  make[2]: *** [../../../../src/Makefile.shlib:309: llvmjit.so] Error 1

Should I create a new bug report for that? It is a result of the changes referenced in this bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c39

Reinhard Max <max@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
         Resolution|---                         |FIXED

--- Comment #39 from Reinhard Max <max@suse.com> ---
I've seen this happen on all three versions of PostgreSQL that support LLVM (11, 12 and 13), but only on s390x and only in about 4 out of 5 build runs (within the short period over which I monitored it).

I think this should be handled in a separate bug report, because the bug at hand got fixed and the reason for these new failures likely lies outside of the PostgreSQL packages.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c40

--- Comment #40 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to Reinhard Max from comment #39)
> I've seen this happen on all three versions of PostgreSQL that support LLVM
> (11, 12 and 13), but only on s390x and only in about 4 out of 5 build runs
> (within the short period over which I monitored it).

Since it comes from ld (or collect2?) and that's an LTO link, I suppose it might be a race condition. I agree; it should be a separate bug. Maybe add Michael Matz for binutils and Martin Liska for GCC LTO.

IBM hardware tends to have weaker memory models, and that race might just not materialize with the stronger model of x86.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c62

--- Comment #62 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to Reinhard Max from comment #12)
> The regression test failures with llvm12 are full of messages like this:
>
> [ 1322s] +ERROR: failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)

According to the change log [1], it seems that LLVM 16 should fix the underlying issue here:

> The datalayout string now only depends on the target triple as expected.

So the workaround could probably be reverted for LLVM 16 and newer. I haven't submitted LLVM 16 to Factory yet, and it's probably not supported by PostgreSQL yet, but you could already make the application of 0001-jit-Workaround-potential-datalayout-mismatch-on-s390.patch conditional on %{_llvm_sonum} < 16.

[1] https://releases.llvm.org/16.0.0/docs/ReleaseNotes.html#changes-to-the-syste...
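The two data layout strings in the regression log quoted in comment #62 can be compared component by component. LLVM data layout strings are `-`-separated specifications, and here the only difference is `v128:64` (128-bit vectors with 64-bit ABI alignment), which appears only on the JIT side. That is consistent with the discussion above: the layout changes depending on whether the vector facility is in use. A small illustrative sketch; the `layout_diff` helper is hypothetical and simply diffs the components as strings.

```python
from __future__ import annotations


def parse_layout(layout: str) -> list[str]:
    """Split an LLVM data layout string into its '-'-separated components."""
    return layout.split("-")


def layout_diff(module: str, jit: str) -> tuple[list[str], list[str]]:
    """Return (only_in_module, only_in_jit) components, preserving order."""
    m, j = parse_layout(module), parse_layout(jit)
    only_module = [c for c in m if c not in j]
    only_jit = [c for c in j if c not in m]
    return only_module, only_jit


# The two layouts from the "incompatible data layouts" error message.
MODULE = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
JIT = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"

if __name__ == "__main__":
    only_module, only_jit = layout_diff(MODULE, JIT)
    print("only in module:", only_module)  # only in module: []
    print("only in jit:", only_jit)        # only in jit: ['v128:64']
```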