[Bug 1193139] New: [x86_64, 32-bit] mpx support broken
https://bugzilla.suse.com/show_bug.cgi?id=1193139
Bug ID: 1193139
Summary: [x86_64, 32-bit] mpx support broken
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.3
Hardware: x86
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: tdevries@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---
Consider test-case mpx-out-of-bounds.c (
https://01.org/blogs/2016/intel-mpx-linux ):
...
$ cat mpx-out-of-bounds.c
#include
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c1
--- Comment #1 from Borislav Petkov
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c2
--- Comment #2 from Tom de Vries
> Who says that "Saw a #BR!" line? I don't see it in the kernel so maybe glibc?
gcc, libmpx/mpxrt/mpxrt.c:
...
$ find libmpx/ -type f | xargs grep "Saw"
libmpx/mpxrt/mpxrt.c: __mpxrt_write (VERB_BR, "Saw a #BR! status ");
...
-- You are receiving this mail because: You are the assignee for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c3
--- Comment #3 from Borislav Petkov
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c4
--- Comment #4 from Tom de Vries
> $ gcc -Wall -o mpx-out-of-bounds -mmpx -fcheck-pointer-bounds mpx-out-of-bounds.c
> gcc: warning: switch ‘-mmpx’ is no longer supported
> gcc: warning: switch ‘-fcheck-pointer-bounds’ is no longer supported
> That's gcc 10.
> And now that I think about it, I think we did kill MPX altogether:
> https://lore.kernel.org/lkml/tip-eb012ef3b4e331ae479dd7cd9378041d9b7f851c@git.kernel.org/
> so why do you even bother with it?
According to https://en.wikipedia.org/wiki/Intel_MPX , support was removed in:
- glibc 2.35
- gcc 9.1
- kernel 5.6
On Leap 15.3 we have:
- glibc 2.31
- gcc 7.5.0
- kernel 5.3.18
So there's nothing to suggest that mpx is not supported on openSUSE Leap 15.3.
I bother because there are gdb test-cases failing.
I don't care one way or the other, but either we support it or we don't support it, and all info above indicates we do support it.
If we don't support it, all appearance of supporting it should be removed.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c5
--- Comment #5 from Borislav Petkov
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c6
--- Comment #6 from Tom de Vries
> So I just installed a leap 15.3 guest:
> [root@localhost: ~/mpx> gcc -m32 -Wall -o mpx-out-of-bounds mpx-out-of-bounds.c -mmpx -fcheck-pointer-bounds
> [root@localhost: ~/mpx> ./mpx-out-of-bounds 10
> dog[0]: 'd'
> dog[1]: 'o'
> dog[2]: 'g'
> dog[3]: ''
> dog[4]: 's'
> dog[5]: 'e'
> dog[6]: 'c'
> dog[7]: 'r'
> dog[8]: '3'
> dog[9]: 't'
> [root@localhost: ~/mpx> uname -a
> Linux localhost 5.3.18-59.34-default #1 SMP Thu Nov 11 12:18:45 UTC 2021 (a2a53aa) x86_64 x86_64 x86_64 GNU/Linux
> Or do I need a real hw machine that supports MPX?
You'll need a machine with mpx in /proc/cpuinfo flags. I don't know whether that needs to be real hw, or can also be a VM. I reproduced it on real hw, and the failure is all over OBS as well, so I imagine a VM should also work.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c7
--- Comment #7 from Borislav Petkov
> You'll need a machine with mpx in /proc/cpuinfo flags. I don't know whether that needs to be real hw, or can also be a VM. I reproduced it on real hw, and the failure is all over OBS as well, so I imagine a VM should also work.
Ok, I got a machine which triggers and this is what I think considering your statement from an earlier comment:
> I don't care one way or the other, but either we support it or we don't support it, and all info above indicates we do support it.
> If we don't support it, all appearance of supporting it should be removed.
Even if this turns out to be a kernel bug, we'd need to fix it. Even if we fix it properly, the fix will have to be carried by us *solely* because Intel has abandoned that technology and it is not even present in newer CPUs and newer kernels.
And Intel abandoning this would mean that it is highly unlikely that anyone would be using it - especially 32-bit - considering that there won't be any future hw support.
So I think the easiest thing to do is to disable it for 32-bit and keep the 64-bit support.
It'll die with 15SP3 as SP4 doesn't have it anymore and that'll be that.
Ok?
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c8
--- Comment #8 from Tom de Vries
> (In reply to Tom de Vries from comment #6)
> > You'll need a machine with mpx in /proc/cpuinfo flags. I don't know whether that needs to be real hw, or can also be a VM. I reproduced it on real hw, and the failure is all over OBS as well, so I imagine a VM should also work.
> Ok, I got a machine which triggers and this is what I think considering your statement from an earlier comment:
> > I don't care one way or the other, but either we support it or we don't support it, and all info above indicates we do support it.
> > If we don't support it, all appearance of supporting it should be removed.
> Even if this turns out to be a kernel bug, we'd need to fix it. Even if we fix it properly, the fix will have to be carried by us *solely* because Intel has abandoned that technology and it is not even present in newer CPUs and newer kernels.
> And Intel abandoning this would mean that it is highly unlikely that anyone would be using it - especially 32-bit - considering that there won't be any future hw support.
> So I think the easiest thing to do is to disable it for 32-bit and keep the 64-bit support.
> It'll die with 15SP3 as SP4 doesn't have it anymore and that'll be that.
> Ok?
Sounds fine by me. For my understanding, how do you propose to disable it for 32-bit only?
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c9
--- Comment #9 from Michal Hocko
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c10
--- Comment #10 from Tom de Vries
> Is this particular failure a regression or something that has only really worked by chance? Maybe we just haven't run this particular test before?
We did run the tests before, and it started failing only recently, so in that sense it's a regression.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c11
--- Comment #11 from Michal Hocko
> (In reply to Michal Hocko from comment #9)
> > Is this particular failure a regression or something that has only really worked by chance? Maybe we just haven't run this particular test before?
> We did run the tests before, and it started failing only recently, so in that sense it's a regression.
Were those tests compiling for 32b as well? Or is this part possibly new? If the same tests were run previously, can we find out which was the last good kernel?
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c12
--- Comment #12 from Tom de Vries
> (In reply to Tom de Vries from comment #10)
> > (In reply to Michal Hocko from comment #9)
> > > Is this particular failure a regression or something that has only really worked by chance? Maybe we just haven't run this particular test before?
> > We did run the tests before, and it started failing only recently, so in that sense it's a regression.
> Were those tests compiling for 32b as well?
Yes.
> Or is this part possibly new?
No.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c13
--- Comment #13 from Borislav Petkov
> We did run the tests before, and it started failing only recently, so in that sense it's a regression.
Can we quantify "only recently" better? Maybe I can bisect the 15SP3 kernel and see what has caused this...
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c14
--- Comment #14 from Tom de Vries
> (In reply to Tom de Vries from comment #10)
> > We did run the tests before, and it started failing only recently, so in that sense it's a regression.
> Can we quantify "only recently" better? Maybe I can bisect the 15SP3 kernel and see what has caused this...
It's a good question, unfortunately I don't have a good answer yet.
On OBS, I saw this problem pop up in devel:gcc/gdb in the last two weeks or so.
Of course there's no guarantee that all the machines that were used for
testing before did support mpx, but things suddenly started failing over
several configurations, so it certainly has the appearance of regression.
I can reproduce the problem on both my laptops with openSUSE Leap 15.3
installed.
On a hunch, I tried installing kernel 5.3.18-59.10 (which AFAIU is 15.3 GA)
and still reproduced the problem.
I'm now trying to reinstall leap 15.2 on my insert-distro-here laptop, but am
running into installation problems which I'm trying to solve.
In the past year, I have run leap 15.2 on my laptop; I only switched to 15.3
recently. I have fixed problems related to mpx m32 in the test-suite, and I
did not see these failures then.
Last fix there:
...
commit 0c4e2c6c88e9e67ca39f4f2e3bdb205b4f4a1e6c
Author: Tom de Vries
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c15
--- Comment #15 from Tom de Vries
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c16
--- Comment #16 from Tom de Vries
> data point:
> - fresh leap 15.2 install from dvd iso image.
> - enabled main repository only, then enabled network, so no updates
> - installed gcc, gcc-32bit packages
> - ran reproducer script, problem did not reproduce
forgot to mention: with kernel version 5.3.18-lp152.19-default
Another data point:
- enabled main update repository
- did sudo zypper update kernel-default, which installed 5.3.18-lp152.106-default
- rebooted
- ran reproducer script, problem did not reproduce
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c17
--- Comment #17 from Tom de Vries
> (In reply to Tom de Vries from comment #15)
> > data point:
> > - fresh leap 15.2 install from dvd iso image.
> > - enabled main repository only, then enabled network, so no updates
> > - installed gcc, gcc-32bit packages
> > - ran reproducer script, problem did not reproduce
> forgot to mention: with kernel version 5.3.18-lp152.19-default
> Another data point:
> - enabled main update repository
> - did sudo zypper update kernel-default, which installed 5.3.18-lp152.106-default
> - rebooted
> - ran reproducer script, problem did not reproduce
another data point:
- did sudo zypper update gcc gcc-32bit
- ran reproducer script, problem did not reproduce.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c18
--- Comment #18 from Michal Hocko
> I can reproduce the problem on both my laptops with openSUSE Leap 15.3 installed.
> On a hunch, I tried installing kernel 5.3.18-59.10 (which AFAIU is 15.3 GA) and still reproduced the problem.
OK, then this would imply that this is not really a regression within 15.3 scope and the code has been broken since the initial release. You are saying that 15.2 is OK. Both of them are 5.3 based kernels so it is quite likely that some of our patches there could have caused that. Bisection between 15.2 and 15.3 will be quite messy and I am not sure it is worth the extra time. Considering that the feature has been reverted upstream I think we should be fine by just disabling it.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c19
--- Comment #19 from Tom de Vries
> (In reply to Tom de Vries from comment #16)
> > (In reply to Tom de Vries from comment #15)
> > > data point:
> > > - fresh leap 15.2 install from dvd iso image.
> > > - enabled main repository only, then enabled network, so no updates
> > > - installed gcc, gcc-32bit packages
> > > - ran reproducer script, problem did not reproduce
> > forgot to mention: with kernel version 5.3.18-lp152.19-default
> > Another data point:
> > - enabled main update repository
> > - did sudo zypper update kernel-default, which installed 5.3.18-lp152.106-default
> > - rebooted
> > - ran reproducer script, problem did not reproduce
> another data point:
> - did sudo zypper update gcc gcc-32bit
> - ran reproducer script, problem did not reproduce.
another data point:
- did sudo zypper update
- ran reproducer script, problem did not reproduce.
- rebooted
- ran reproducer script, problem did not reproduce.
So, it seems 15.2 is ok.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c20
--- Comment #20 from Tom de Vries
> I occasionally ran test with -m32 and did not meet this problem till I noticed the problem on OBS, after which I reproduced on my laptop which at this point was already switched to leap 15.2.
Sorry, typo, that should be leap 15.3.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c21
--- Comment #21 from Tom de Vries
> So, it seems 15.2 is ok.
Final data point:
- fresh leap 15.3 install from dvd iso image.
- enabled main repository only, then enabled network, so no updates
- installed gcc, gcc-32bit packages
- ran reproducer script, problem did reproduce
So confirmed: this is a regression from fully updated 15.2 to 15.3 GA.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c22
--- Comment #22 from Borislav Petkov
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c23
--- Comment #23 from Tom de Vries
> Good to know, I guess we can use that if we ever need to bisect.
> In any case, here's a fixed kernel to test:
> https://download.suse.de/ibs/home:/bpetkov:/15sp3-mpx/standard/
> With it, it says here:
> $ ./mpx-out-of-bounds 10
> No MPX support
> dog[0]: 'd'
> dog[1]: 'o'
> dog[2]: 'g'
> dog[3]: ''
> dog[4]: 's'
> dog[5]: 'e'
> dog[6]: 'c'
> dog[7]: 'r'
> dog[8]: '3'
> dog[9]: 't'
I've installed the kernel and reproduced the behaviour. Also confirmed that the running example still works with -m64. Looks good from user perspective, AFAICT.
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c24
--- Comment #24 from Borislav Petkov
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c31
--- Comment #31 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c32
--- Comment #32 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c33
--- Comment #33 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c34
--- Comment #34 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c35
--- Comment #35 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c36
--- Comment #36 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1193139#c37
--- Comment #37 from Swamp Workflow Management