[Bug 1179538] New: FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: check new value of MXCSR is still in place
https://bugzilla.suse.com/show_bug.cgi?id=1179538

            Bug ID: 1179538
           Summary: FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: check new value of MXCSR is still in place
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Development
          Assignee: screening-team-bugs@suse.de
          Reporter: tdevries@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

In more detail:
...
(gdb) PASS: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: step forward one instruction for mxcsr test
p/x $mxcsr^M
$1 = 0x1f80^M
(gdb) FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: check new value of MXCSR is still in place
...

This is with kernel-obs-build-5.9.11-1.1.

So the scenario is as follows:
- we set mxcsr to a non-default value
- then we stepi past a nop
- we check that mxcsr still has the same value
- it turns out, it doesn't: the register is back to its default value

It sounds a lot like the fix for https://bugzilla.kernel.org/show_bug.cgi?id=207979 is active:
...
commit 7ad816762f9bf89e940e618ea40c43138b479e10
Author: Petteri Aimonen <jpa@git.mail.kapsi.fi>
Date:   Tue Jun 16 11:12:57 2020 +0200

    x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
...

However, the commit has been in place since v5.8, and I don't see this problem on a laptop with tumbleweed running kernel v5.9.10.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1179538

Tom de Vries <tdevries@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bpetkov@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c1

--- Comment #1 from Borislav Petkov <bpetkov@suse.com> ---
Hmmm, strange.

Looking at that test in gdb, it does only this, right?

    gdb_test_no_output "set \$mxcsr=0x9f80" "set a new value for MXCSR"
    gdb_test "stepi" "fwait" "step forward one instruction for mxcsr test"

so basically I should be able to reproduce this in a VM, right?

Do I need some special version of gdb or it doesn't matter?

Also, can you dump the new MXCSR value in the test so that we can see what it sets it to?

Also, you're running this on baremetal, right, not in a VM?

Thx.
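For reference, the value the test writes (0x9f80) and the value seen in the failing log (0x1f80) can be compared bit by bit. The following is only a minimal sketch: the helper name `mxcsr_diff` is hypothetical, and the bit names assume the standard x86 MXCSR layout (exception flags in bits 0-5, DAZ in bit 6, exception masks in bits 7-12, rounding control in bits 13-14, flush-to-zero in bit 15).

```shell
# Hypothetical helper: print the MXCSR bits that differ between two values.
# Bit names assume the standard x86 MXCSR layout; RC0/RC1 are the two
# rounding-control bits.
mxcsr_diff() {
    x=$(( $1 ^ $2 ))
    bit=0
    for name in IE DE ZE OE UE PE DAZ IM DM ZM OM UM PM RC0 RC1 FZ; do
        if [ $(( (x >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit ($name) differs"
        fi
        bit=$(( bit + 1 ))
    done
}

mxcsr_diff 0x9f80 0x1f80   # prints: bit 15 (FZ) differs
```

Under that layout the two values differ only in the flush-to-zero bit, so a reset to 0x1f80 is what one would expect to see if something reinitialized MXCSR to its default.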
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c2

--- Comment #2 from Tom de Vries <tdevries@suse.com> ---
(In reply to Borislav Petkov from comment #1)
> Hmmm, strange.
>
> Looking at that test in gdb, it does only this, right?
>
>     gdb_test_no_output "set \$mxcsr=0x9f80" "set a new value for MXCSR"
>     gdb_test "stepi" "fwait" "step forward one instruction for mxcsr test"
>
> so basically I should be able to reproduce this in a VM, right?

I don't understand yet the precise conditions under which this triggers.

> Do I need some special version of gdb or it doesn't matter?

This is with the gdb currently in devel:gcc, it might also reproduce with others.

> Also, can you dump the new MXCSR value in the test so that we can see what it sets it to?

Added a print after the set in command below.

To reproduce on command line (this is a session on openSUSE Leap 15.2 with kernel version 5.3.18, so $3 is as expected):
...
$ gcc -fno-stack-protector -static -nostartfiles -g \
    src/gdb/testsuite/gdb.arch/amd64-init-x87-values.S
$ gdb -batch a.out \
    -ex start \
    -ex "p /x \$mxcsr = 0x9f80" \
    -ex "p /x \$mxcsr" \
    -ex stepi \
    -ex "p /x \$mxcsr"
Temporary breakpoint 1 at 0x4000d5: file src/gdb/testsuite/gdb.arch/amd64-init-x87-values.S, line 27.

Temporary breakpoint 1, main () at src/gdb/testsuite/gdb.arch/amd64-init-x87-values.S:27
27              nop
$1 = 0x9f80
$2 = 0x9f80
28              fwait
$3 = 0x9f80
...

> Also, you're running this on baremetal, right, not in a VM?

It's from an obs log, so kvm. I haven't been able to reproduce this outside yet.
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c3

--- Comment #3 from Borislav Petkov <bpetkov@suse.com> ---
(In reply to Tom de Vries from comment #2)
> It's from an obs log, so kvm. I haven't been able to reproduce this outside yet.

Ok, that is an important point. See the upstream bugzilla you quoted, from comment #21 onwards. The reporter there triggered it in kvm *only* and not on baremetal. So it must be something virt-related.

I'll try to run the test case in a guest here to see what I can trigger.

Thx.
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c4

--- Comment #4 from Borislav Petkov <bpetkov@suse.com> ---
Kernel 5.10 on the host and guest works fine:

+ gdb -batch a.out -ex start -ex 'p /x $mxcsr = 0x9f80' -ex 'p /x $mxcsr' -ex stepi -ex 'p /x $mxcsr'
Temporary breakpoint 1 at 0x401001: file amd64-init-x87-values.S, line 27.

Temporary breakpoint 1, main () at amd64-init-x87-values.S:27
27              nop
$1 = 0x9f80
$2 = 0x9f80
28              fwait
$3 = 0x9f80

I *think* this will trigger with the OBS host kernel. Considering how the upstream reporter stopped seeing this after a while, this looks like it "got fixed" in kvm/qemu recently. Just how it looks from here though - I have no hard evidence yet.

Thx.
https://bugzilla.suse.com/show_bug.cgi?id=1179538

Chenzi Cao <chcao@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Development                 |Kernel
           Assignee|screening-team-bugs@suse.de |kernel-bugs@opensuse.org
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c5

--- Comment #5 from Borislav Petkov <bpetkov@suse.com> ---
Ok, 15-SP2 host, leap 15.2 guest:

$ ./run-gdb.sh
Temporary breakpoint 1 at 0x401001: file amd64-init-x87-values.S, line 27.

Temporary breakpoint 1, main () at amd64-init-x87-values.S:27
27              nop
$1 = 0x9f80
$2 = 0x9f80
28              fwait
$3 = 0x9f80

guest kernel is leap152 5.3.18-lp152.57-default, host kernel is 5.3.18-24.43-default and qemu is QEMU emulator version 4.2.0 (SUSE Linux Enterprise 15).

I guess I need to try kernel-obs-build-5.9.11-1.1 now.
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c6

--- Comment #6 from Borislav Petkov <bpetkov@suse.com> ---
Ok I was able to find 5.9.12-1-default for the guest, 10K iterations of the test all give:

  10000 $3 = 0x9f80

So if you have a reproducer I'm all ears...

Thx.
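The contents of run-gdb.sh and the exact loop used for the 10K iterations are not shown in this thread. A rough sketch of how such a run could be reproduced, assuming the gdb command line from comment #2 and an a.out built as described there, might look like this (the function names `run_once`, `tally_mxcsr` and `repeat_test` are hypothetical):

```shell
# Assumption: a.out was built from amd64-init-x87-values.S as in comment #2.
# Run the gdb session once; the third print ($3) is the post-stepi MXCSR value.
run_once() {
    gdb -batch a.out \
        -ex start \
        -ex 'p /x $mxcsr = 0x9f80' \
        -ex 'p /x $mxcsr' \
        -ex stepi \
        -ex 'p /x $mxcsr' 2>/dev/null
}

# Read gdb output on stdin, keep only the "$3 = ..." lines and count
# how often each distinct value occurs.
tally_mxcsr() {
    grep '^\$3 = ' | sort | uniq -c
}

# Repeat the session N times (default 10000) and tally the results.
repeat_test() {
    n=${1:-10000}
    i=0
    while [ "$i" -lt "$n" ]; do
        run_once
        i=$(( i + 1 ))
    done | tally_mxcsr
}
```

Calling `repeat_test 10000` on an unaffected system should then report a single line for 0x9f80; any additional line (for example one for 0x1f80) would indicate that the register was reset in at least one run.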
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c7

--- Comment #7 from Tom de Vries <tdevries@suse.com> ---
All I have is evidence of this FAIL in OBS:
...
$ grep "FAIL: gdb.arch/amd64-init-x87-values.exp.*MXCSR" binaries-testsuite.*/gdb-testresults/*.sum
binaries-testsuite.openSUSE_Leap_15.2.x86_64/gdb-testresults/gdb-x86_64-suse-linux-m64.-fno-PIE.-no-pie.sum:FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: check new value of MXCSR is still in place
binaries-testsuite.openSUSE_Leap_15.2.x86_64/gdb-testresults/gdb-x86_64-suse-linux-m64.sum:FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: check new value of MXCSR is still in place
...
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c8

Miroslav Beneš <mbenes@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mbenes@suse.com

--- Comment #8 from Miroslav Beneš <mbenes@suse.com> ---
Has there been any development since then? Tom, is it still failing for you?
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c9

Jiri Slaby <jslaby@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jslaby@suse.com
         Resolution|---                         |NORESPONSE

--- Comment #9 from Jiri Slaby <jslaby@suse.com> ---
Hopefully fixed itself then? The same as upstream bug.
https://bugzilla.suse.com/show_bug.cgi?id=1179538
https://bugzilla.suse.com/show_bug.cgi?id=1179538#c10

--- Comment #10 from Tom de Vries <tdevries@suse.com> ---
(In reply to Jiri Slaby from comment #9)
> Hopefully fixed itself then? The same as upstream bug.

I've looked in the OBS logs, and it's not showing. So it may have been fixed. OTOH, it also may just not reproduce due to peculiarities of the machine OBS runs it on, there's no way of telling.

My mention of the commit fixing the upstream kernel bug was to suggest a possible culprit, not to suggest that the problem was fixed by the commit.

Anyway, there's not much we can do or conclude until we find a reproducer.