[Bug 1185952] New: [Build 20210510] PostgreSQL is not startable on s390x
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952

Bug ID: 1185952
Summary: [Build 20210510] PostgreSQL is not startable on s390x
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: S/390-64
URL: https://openqa.opensuse.org/tests/1734657/modules/postgresql_server/steps/6
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
Assignee: screening-team-bugs@suse.de
Reporter: ada.lovelace@gmx.de
QA Contact: qa-bugs@suse.de
Found By: openQA
Blocker: Yes

## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-s390x-textmode@s390x-zVM-vswitch-l2 fails in
[postgresql_server](https://openqa.opensuse.org/tests/1734657/modules/postgresql_server/steps/6)

All packages can be installed. The problem is that the PostgreSQL server can not be started. The journal log is saying:

-- Logs begin at Mon 2021-05-10 22:41:06 EDT, end at Mon 2021-05-10 23:23:02 EDT. --
May 10 23:15:15 susetest systemd[1]: Starting PostgreSQL database server...
May 10 23:15:15 susetest systemd[1]: postgresql.service: Control process exited, code=exited, status=1/FAILURE
May 10 23:15:15 susetest systemd[1]: postgresql.service: Failed with result 'exit-code'.
May 10 23:15:15 susetest systemd[1]: Failed to start PostgreSQL database server.
May 10 23:15:15 susetest postgresql-script[12832]: Cannot find an active PostgreSQL server binary. Please install one of the PostgreSQL
May 10 23:15:15 susetest postgresql-script[12832]: server packages or activate an already installed version using update-alternatives.

## Test suite description

Maintainer: QE Yast

Installation in textmode and selecting the textmode "desktop" during installation.

## Reproducible

Fails since (at least) Build [20210510](https://openqa.opensuse.org/tests/1734657) (current job)

## Expected result

Last good: [20210507](https://openqa.opensuse.org/tests/1731697) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=s390x&distri=opensuse&flavor=DVD&machine=s390x-zVM-vswitch-l2&test=textmode&version=Tumbleweed)

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952
Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c1
--- Comment #1 from Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c2
--- Comment #2 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c3
--- Comment #3 from Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c4
--- Comment #4 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c5
--- Comment #5 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952
Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952
Aaron Puchert
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c16
--- Comment #16 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c17
--- Comment #17 from Aaron Puchert
> @Aaron One hint: We can forward bug reports for IBM via the openSUSE Bugzilla, too. I believe that upstream bug reports require more time than the internal routing process.

I've got a quick reply from the SystemZ backend maintainer, who seems to be working for IBM. We just haven't reached an agreement so far. The question is whether the ABI should depend on the feature level, which I think it should not.

Do you happen to know about plans to switch SLE/openSUSE to a default architecture level of z13? That would be cutting the Gordian knot. Thanks anyway for offering help; if this doesn't work out, we might come back to your offer.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c18
--- Comment #18 from OBSbugzilla Bot
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c19
Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c21
--- Comment #21 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c22
--- Comment #22 from Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c23
--- Comment #23 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c24
--- Comment #24 from Aaron Puchert
> The Bug with LLVM-jit has happened at Fedora, too. There was a discussion about this bug on the PostgreSQL mailinglist:
> https://www.postgresql.org/message-id/fc131116-baef-66a5-362c-b9e1a2b1ebec%40redhat.com

Thanks for the link. And apparently there is even a patch that might work for now: https://www.postgresql.org/message-id/attachment/122331/0001-jit-Workaround-....

Reinhard, could you try this out?
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c25
--- Comment #25 from Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c26
Aaron Puchert
> It looks like upstream wasn't completely satisfied with it yet and hence hasn't committed it so far.

I was reading this as mostly stylistic concerns, but maybe I've overlooked something.

> Or do you see an urgent need to go for llvm12 within the next weeks?

It's not urgent, we'll have to leave llvm11 in the distribution for some time. Some packages are even on llvm{7,9,10}. But all the older LLVMs are out of maintenance; I'm basically just fixing the occasional build failure.

Hope you don't mind that I'm assigning this back to you. It is a bug in LLVM in my view, but unless the zSystems backend maintainer agrees with my concerns, there is nothing I can do. So we'll have to patch this out in Postgres.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c27
--- Comment #27 from Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c35
--- Comment #35 from OBSbugzilla Bot
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c38
--- Comment #38 from Sarah Kriesch
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c39
Reinhard Max
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c40
--- Comment #40 from Aaron Puchert
> I've seen this happen on all three versions of PostgreSQL that support LLVM (11, 12 and 13), but only on s390x and only in about 4 out of 5 build runs (within the short period over which I monitored it).

Since it comes from ld (or collect2?) and that's an LTO link, I suppose it might be a race condition. I agree, it should be a separate bug. Maybe add Michael Matz for binutils and Martin Liska for GCC LTO. IBM tends to have weaker memory models, and that race might just not materialize with the stronger model of x86.
http://bugzilla.opensuse.org/show_bug.cgi?id=1185952#c62
--- Comment #62 from Aaron Puchert
The regression test failures with llvm12 are full of messages like this:

[ 1322s] +ERROR: failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)
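The two layout strings in that error are long enough that the mismatch is easy to miss by eye. As a rough illustration only (not part of PostgreSQL or LLVM; the helper names `split_layout` and `layout_diff` are made up for this sketch), the strings can be split on `-`, which is the separator between datalayout specifications, and compared component by component:

```python
from __future__ import annotations


def split_layout(layout: str) -> list[str]:
    """Split an LLVM datalayout string into its '-'-separated components."""
    return layout.split("-")


def layout_diff(left: str, right: str) -> tuple[list[str], list[str]]:
    """Return (components only in left, components only in right), in order."""
    left_parts = split_layout(left)
    right_parts = split_layout(right)
    only_left = [p for p in left_parts if p not in right_parts]
    only_right = [p for p in right_parts if p not in left_parts]
    return only_left, only_right


# Both strings are copied verbatim from the error message above.
MODULE_LAYOUT = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
JIT_LAYOUT = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"

only_module, only_jit = layout_diff(MODULE_LAYOUT, JIT_LAYOUT)
print("only in module:", only_module)  # []
print("only in jit:   ", only_jit)     # ['v128:64']
```

The only difference is the `v128:64` component on the JIT side, i.e. the alignment specification for 128-bit vectors.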
According to the change log [1], it seems that LLVM 16 should fix the underlying issue here:

> The datalayout string now only depends on the target triple as expected.

So the workaround could probably be reverted for LLVM 16 and newer. I haven't submitted this into Factory yet, and it's probably not supported by PostgreSQL yet, but you could already make application of 0001-jit-Workaround-potential-datalayout-mismatch-on-s390.patch dependent on %{_llvm_sonum} < 16.

[1] https://releases.llvm.org/16.0.0/docs/ReleaseNotes.html#changes-to-the-syste...