[Bug 1204267] New: Plasmashell crashes on armv7
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267

Bug ID: 1204267
Summary: Plasmashell crashes on armv7
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: armv7
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: KDE Applications
Assignee: opensuse-kde-bugs@opensuse.org
Reporter: guillaume.gardet@arm.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 862142
  --> http://bugzilla.opensuse.org/attachment.cgi?id=862142&action=edit
journal.log

Plasmashell crashes on armv7 since snapshot 20221006.

Oct 13 04:54:05 localhost.localdomain plasmashell[2046]: QFont::setPointSizeF: Point size <= 0 (0.000000), must be greater than 0
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: LLVM ERROR: Cannot select: 0x3003d38: v4i32 = ARMISD::VCMPZ 0x2f64940, Constant:i32<2>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2f64940: v4i32,ch = ARMISD::VLD1DUP<(load (s32) from %ir.326)> 0x2fcdea8, 0x2fb0848, Constant:i32<4>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fb0848: i32 = add 0x3006648, Constant:i32<64>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x3006648: i32,ch = CopyFromReg 0x2e4bf5c, Register:i32 %35
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fed080: i32 = Register %35
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2e65f08: i32 = Constant<64>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fd93c0: i32 = Constant<4>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fd9690: i32 = Constant<2>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: In function: fs_variant_partial
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: KCrash: Application 'plasmashell' crashing...
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: KCrash: Attempting to start /usr/libexec/drkonqi
Oct 13 04:54:06 localhost.localdomain plasmashell[2077]: libEGL warning: DRI2: failed to authenticate
Oct 13 04:54:06 localhost.localdomain kded5[1600]: Service "org.kde.StatusNotifierHost-2046" unregistered
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: Unable to start Dr. Konqi
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: Re-raising signal for core dump handling.
Oct 13 04:54:06 localhost.localdomain systemd[1284]: plasma-plasmashell.service: Main process exited, code=dumped, status=6/ABRT

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c3

Christophe Giboudeaux <christophe@krop.fr> changed:

What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |aaronpuchert@alice-dsl.net,
        |        |christophe@krop.fr

--- Comment #3 from Christophe Giboudeaux <christophe@krop.fr> ---
The issue can be reproduced with one pyside test on armv7l:

[ 3286s] 460/478 Test #461: QtDataVisualization_datavisualization_test ......Subprocess aborted***Exception: 0.82 sec
[ 3286s] .LLVM ERROR: Cannot select: 0x1e6a2c0: v4i32 = ARMISD::VCMPZ 0x1d68a08, Constant:i32<2>
[ 3286s] 0x1d68a08: v4i32,ch = ARMISD::VLD1DUP<(load (s32) from %ir.235)> 0x11ebd5c, 0x1d68660:1, Constant:i32<4>
[ 3286s] 0x1d68660: i32,i32,ch = load<(load (s32) from %ir.232, align 8), <post-inc>> 0x11ebd5c, 0x1d52518, Constant:i32<64>
[ 3286s] 0x1d52518: i32,ch = CopyFromReg 0x11ebd5c, Register:i32 %30
[ 3286s] 0x1d68858: i32 = Register %30
[ 3286s] 0x1d67100: i32 = Constant<64>
[ 3286s] 0x1afd1a8: i32 = Constant<4>
[ 3286s] 0x1d66ec0: i32 = Constant<2>
[ 3286s] In function: fs_variant_partial
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c4

--- Comment #4 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
In my understanding, "Cannot select" is always an LLVM bug, specifically in the backend. Early stages of the backend should "legalize" data types and instructions, sending to instruction selection only what the target supports.

So I can have a look, but it would be appreciated if someone could extract the IR that Mesa sends to LLVM. Otherwise I'll have to reverse-engineer a reproducer.

Nevertheless, some initial remarks:

ARMISD::VCMPZ is a "Vector compare to zero." [1] It should correspond to "vcmpe" in assembly [2]. The first argument being a v4i32 is slightly suspicious. I would have expected a v4f32, but since they live in the same registers maybe the backend doesn't care. The second is a Constant:i32<2> = ARMCC::CondCodes::HS, corresponding to conditional execution only if the carry flag is set, if I understand this correctly. [3,4]

Inside we have ARMISD::VLD1DUP, which is a "Vector load N-element structure to all lanes" (same file as [1], different line), and seems to correspond to "vld1.N" in assembly. [5] The Constant:i32<4> could be an alignment, but I'm not sure.

[1] https://github.com/llvm/llvm-project/blob/llvmorg-15.0.2/llvm/lib/Target/ARM...
[2] https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architec...
[3] https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architec...
[4] https://github.com/llvm/llvm-project/blob/llvmorg-15.0.2/llvm/lib/Target/ARM...
[5] https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architec...
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c6

--- Comment #6 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
Thanks, that should help. This isn't my area of expertise, but at least we can use this to file a bug upstream.

(In reply to Aaron Puchert from comment #4)
> Nevertheless, some initial remarks:
>
> ARMISD::VCMPZ is a "Vector compare to zero." [1] It should correspond to "vcmpe" in assembly [2]. The first argument being a v4i32 is slightly suspicious. I would have expected a v4f32, but since they live in the same registers maybe the backend doesn't care. The second is a Constant:i32<2> = ARMCC::CondCodes::HS, corresponding to conditional execution only if the carry flag is set, if I understand this correctly. [3,4]

Seems I was misreading that: the condition code is for the comparison itself. For floating-point, ARMCC::CondCodes::HS means ">, ==, or unordered", so we're doing a !(... < 0.0f) comparison. It likely corresponds to one of the fcmp ..., zeroinitializer in the IR.
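The two readings of condition code 2 can be sketched in Python. The table follows the ARM architecture's condition-code numbering, which the ARMCC::CondCodes enum mirrors. The function names are made up for illustration, and the 32-bit masking is an assumption about how an integer lane would be interpreted. HS also has an integer reading, "unsigned higher or same", which is the one the reduced reproducer later in the thread turns out to exercise.

```python
# Condition-code numbering from the ARM architecture; index == encoded value.
ARM_COND_CODES = ["EQ", "NE", "HS", "LO", "MI", "PL", "VS", "VC",
                  "HI", "LS", "GE", "LT", "GT", "LE", "AL"]


def decode_cond(value):
    """Map the Constant:i32<N> operand of the DAG node to a condition name."""
    return ARM_COND_CODES[value]


def fp_hs_zero(x):
    """Floating-point reading of HS against zero: '>, ==, or unordered'.

    Written as not (x < 0.0) so that NaN (unordered) also yields True.
    """
    return not (x < 0.0)


def uint_hs(a, b):
    """Integer reading of HS: unsigned 'higher or same' on 32-bit values."""
    return (a & 0xFFFFFFFF) >= (b & 0xFFFFFFFF)


if __name__ == "__main__":
    print(decode_cond(2))
    print(fp_hs_zero(float("nan")), fp_hs_zero(-1.0))
    print(uint_hs(0, 0), uint_hs(0, 1))
```

The NaN case is what separates !(x < 0.0f) from a plain x >= 0.0f: the latter is false for unordered operands, the former is true.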
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c8

Aaron Puchert <aaronpuchert@alice-dsl.net> changed:

What    |Removed          |Added
----------------------------------------------------------------------------
Assignee|gfx-bugs@suse.de |aaronpuchert@alice-dsl.net

--- Comment #8 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
Since I'm not sure what the precise target machine is, I've used flags similar to how we build LLVM itself (see the specfile):

  llc -march=arm --float-abi=hard -mattr=+armv7-a,+vfp3d16

This reproduces the crash, just with a slightly different message:

LLVM ERROR: Cannot select: t933: v4i32 = ARMISD::VCMPZ t1307, Constant:i32<2>
t1307: v4i32,ch = ARMISD::VLD1DUP<(load (s32) from %ir.584)> t0, t1429:1, Constant:i32<4>
t1429: i32,i32,ch = load<(load (s32) from %ir."&context.constants_ptr[]5618", align 8), <post-inc>> t0, t2, Constant:i32<64>
t2: i32,ch = CopyFromReg t0, Register:i32 %45
t1: i32 = Register %45
t212: i32 = Constant<64>
t49: i32 = Constant<4>
t28: i32 = Constant<2>

What's different is the added IR names, but they're not immediately helpful: there is a "&context.constants_ptr[]56" in the source, maybe there was disambiguation.

The crash is reproducible on the current main branch, so it's still not fixed.

With

  bugpoint --run-llc <input-file> --tool-args <options as above>

we can reduce it to this:

define void @fs_variant_partial() {
entry:
  %output = alloca <4 x float>, align 16
  br label %loop_begin

loop_begin:                                       ; preds = %skip, %entry
  br i1 undef, label %skip, label %0

0:                                                ; preds = %loop_begin
  %1 = icmp uge <4 x i32> zeroinitializer, undef
  %2 = sext <4 x i1> %1 to <4 x i32>
  %3 = load i32, i32* undef, align 4
  %4 = insertelement <4 x i32> undef, i32 %3, i32 3
  %5 = trunc <4 x i32> %2 to <4 x i1>
  %6 = select <4 x i1> %5, <4 x i32> zeroinitializer, <4 x i32> %4
  %7 = insertvalue [4 x <4 x i32>] undef, <4 x i32> %6, 0
  %8 = insertvalue [4 x <4 x i32>] %7, <4 x i32> undef, 1
  %9 = insertvalue [4 x <4 x i32>] %8, <4 x i32> undef, 2
  %10 = insertvalue [4 x <4 x i32>] %9, <4 x i32> undef, 3
  %11 = extractvalue [4 x <4 x i32>] %10, 0
  %12 = bitcast <4 x i32> %11 to <4 x float>
  %13 = fmul <4 x float> zeroinitializer, %12
  %14 = fadd <4 x float> %13, zeroinitializer
  %15 = fadd <4 x float> %14, zeroinitializer
  %16 = bitcast <4 x float> %15 to <4 x i32>
  %17 = insertvalue [4 x <4 x i32>] undef, <4 x i32> %16, 0
  %18 = insertvalue [4 x <4 x i32>] %17, <4 x i32> undef, 1
  %19 = insertvalue [4 x <4 x i32>] %18, <4 x i32> undef, 2
  %20 = insertvalue [4 x <4 x i32>] %19, <4 x i32> undef, 3
  %21 = extractvalue [4 x <4 x i32>] %20, 0
  %22 = bitcast <4 x i32> %21 to <4 x float>
  store <4 x float> %22, <4 x float>* %output, align 16
  br label %skip

skip:                                             ; preds = %0, %loop_begin
  br label %loop_begin
}

Crash is slightly different now:

LLVM ERROR: Cannot select: t48: v4i32 = ARMISD::VCMPZ undef:v4i32, Constant:i32<2>
t3: v4i32 = undef
t47: i32 = Constant<2>

This obviously corresponds to the

  %1 = icmp uge <4 x i32> zeroinitializer, undef

With that knowledge we can reduce further:

define <4 x i32> @fs_variant_partial() {
  %1 = icmp uge <4 x i32> zeroinitializer, undef
  %2 = sext <4 x i1> %1 to <4 x i32>
  ret <4 x i32> %2
}

or

define <4 x i32> @fs_variant_partial(<4 x i32> %0) {
  %2 = icmp uge <4 x i32> zeroinitializer, %0
  %3 = sext <4 x i1> %2 to <4 x i32>
  ret <4 x i32> %3
}

I'll see if I can spot where we're missing something, but likely I'll just file a bug and let the ARM people figure out where this should be fixed. From the looks of it we're simply not able to lower "icmp uge <4 x i32> zeroinitializer, ...", and the nested instructions have nothing to do with it.
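What makes this comparison unusual can be seen by modelling it in Python: for unsigned values, 0 >= x holds only when x == 0, so the icmp uge against zeroinitializer is a compare-equal-to-zero in disguise. The sketch below models each <4 x i32> as a list of four integers. The function names are hypothetical, and the sketch says nothing about how the ARM backend should actually lower the node.

```python
MASK32 = 0xFFFFFFFF  # treat every lane as an unsigned 32-bit value


def icmp_uge_zero_sext(x):
    """Model of: %c = icmp uge <4 x i32> zeroinitializer, %x
                 %r = sext <4 x i1> %c to <4 x i32>
    A true lane sign-extends to all ones, a false lane to zero."""
    return [MASK32 if 0 >= (lane & MASK32) else 0 for lane in x]


def icmp_eq_zero_sext(x):
    """Model of the equivalent form: icmp eq %x, zeroinitializer, then sext."""
    return [MASK32 if (lane & MASK32) == 0 else 0 for lane in x]


if __name__ == "__main__":
    sample = [0, 1, 0x80000000, 0xFFFFFFFF]
    print(icmp_uge_zero_sext(sample))
    print(icmp_eq_zero_sext(sample))
```

A pass such as instcombine would presumably normalise the first form into the second, which would fit the observation in comment #9 that the error only appears when that pass does not run.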
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267

Aaron Puchert <aaronpuchert@alice-dsl.net> changed:

What    |Removed |Added
----------------------------------------------------------------------------
See Also|        |https://github.com/llvm/llvm-project/issues/58514
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c10

--- Comment #10 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to Fabian Vogt from comment #9)
> So the missing "instcombine" pass causes the "Cannot select" error and the pass is missing because Mesa passes an invalid list of passes to LLVMRunPasses and ignores the error.

Would it be possible to improve error handling here? At least some tracing would be nice. From your analysis it looks like this might affect more platforms and not just armv7, and we wouldn't have noticed anything were it not for the backend bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c12

--- Comment #12 from OBSbugzilla Bot <bwiedemann+obsbugzillabot@suse.com> ---
This is an autogenerated message for OBS integration:
This bug (1204267) was mentioned in
https://build.opensuse.org/request/show/1031948 Factory / llvm15
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c13

--- Comment #13 from Aaron Puchert <aaronpuchert@alice-dsl.net> ---
(In reply to OBSbugzilla Bot from comment #12)
> This is an autogenerated message for OBS integration:
> This bug (1204267) was mentioned in
> https://build.opensuse.org/request/show/1031948 Factory / llvm15

This made it into snapshot 20221029 released today, and it fixes the compilation failure for the reproducer from attachment 862276. Factory:ARM hasn't released a new snapshot yet.

Feel free to close once you're able to confirm that this fixes the original issue.
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267
http://bugzilla.opensuse.org/show_bug.cgi?id=1204267#c15

Guillaume GARDET <guillaume.gardet@arm.com> changed:

What      |Removed     |Added
----------------------------------------------------------------------------
Status    |IN_PROGRESS |RESOLVED
Resolution|---         |FIXED

--- Comment #15 from Guillaume GARDET <guillaume.gardet@arm.com> ---
openQA test passes.