[Bug 1224402] New: /usr/src/kernel-modules/nvidia-390.157-default/nvidia/nv-procfs.o: warning: objtool: nv_register_procfs() falls through to next function nv_unregister_procfs()

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Bug ID: 1224402 Summary: /usr/src/kernel-modules/nvidia-390.157-default/nvidia/ nv-procfs.o: warning: objtool: nv_register_procfs() falls through to next function nv_unregister_procfs() Classification: openSUSE Product: openSUSE Distribution Version: Leap 15.5 Hardware: Other OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel Assignee: kernel-bugs@opensuse.org Reporter: msuchanek@suse.com QA Contact: qa-bugs@suse.de Target Milestone: --- Found By: --- Blocker: --- Is this a bug in objtool? How would you even go about writing such function? -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c1 --- Comment #1 from Michal Suchanek <msuchanek@suse.com> --- Created attachment 874943 --> https://bugzilla.suse.com/attachment.cgi?id=874943&action=edit The nvidia module that includes the symbol -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Miroslav Beneš <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|kernel-bugs@opensuse.org |mbenes@suse.com CC| |mbenes@suse.com -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c2 Miroslav Beneš <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(msuchanek@suse.co | |m) CC| |msuchanek@suse.com --- Comment #2 from Miroslav Beneš <mbenes@suse.com> --- Strange and I would need to take a look at sources because 000000000000abb0 <nv_register_procfs>: [...] adbc: e8 00 00 00 00 call adc1 <nv_register_procfs+0x211> adbd: R_X86_64_PC32 __stack_chk_fail-0x4 adc1: 4c 8b 05 00 00 00 00 mov 0x0(%rip),%r8 # adc8 <nv_register_procfs+0x218> adc4: R_X86_64_PC32 .bss+0x1284 adc8: 48 c7 c1 00 00 00 00 mov $0x0,%rcx adcb: R_X86_64_32S .rodata+0x420 adcf: 48 89 c2 mov %rax,%rdx add2: be 24 81 00 00 mov $0x8124,%esi add7: e8 00 00 00 00 call addc <nv_register_procfs+0x22c> add8: R_X86_64_PC32 proc_create_data-0x4 addc: 0f 1f 40 00 nopl 0x0(%rax) It really ends with a call to proc_create_data() which, obviously, is not a noreturn function. So it falls through to the next one. If it ended with the call to __stack_chk_fail() a couple of lines above, like most of the other functions, everything would be fine, because __stack_chk_fail() is marked as noreturn and objtool knows about it. The code between the call to __stack_chk_fail() and the call to proc_create_data() fills arguments. adc1 address is a target of intra-function jump so looks like a valid code. 000000000000ade0 <nv_unregister_procfs>: ade0: e8 00 00 00 00 call ade5 <nv_unregister_procfs+0x5> ade1: R_X86_64_PC32 __fentry__-0x4 ade5: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # adec <nv_unregister_procfs+0xc> ade8: R_X86_64_PC32 .bss+0x1274 adec: e9 00 00 00 00 jmp adf1 <nv_unregister_procfs+0x11> aded: R_X86_64_PC32 proc_remove-0x4 adf1: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) adf8: 00 00 00 00 adfc: 0f 1f 40 00 nopl 0x0(%rax) I left it here as an example. There is jmp at the end which is correct. It does not return. Sibling jump. All good. I am only guessing without sources though. Could you also appends logs from the compilation, please? -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c3 Michal Suchanek <msuchanek@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(msuchanek@suse.co | |m) | --- Comment #3 from Michal Suchanek <msuchanek@suse.com> --- Created attachment 875026 --> https://bugzilla.suse.com/attachment.cgi?id=875026&action=edit build log -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c4 --- Comment #4 from Michal Suchanek <msuchanek@suse.com> --- Created attachment 875027 --> https://bugzilla.suse.com/attachment.cgi?id=875027&action=edit source -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c5 --- Comment #5 from Michal Suchanek <msuchanek@suse.com> --- Created attachment 875028 --> https://bugzilla.suse.com/attachment.cgi?id=875028&action=edit preprocessed source -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c6 Miroslav Beneš <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mjambor@suse.com, | |petr.pavlu@suse.com --- Comment #6 from Miroslav Beneš <mbenes@suse.com> --- Thanks. So it seems that... (thanks Petr Pavlu for helping out because it is really a strange thing) adc1: 4c 8b 05 00 00 00 00 mov 0x0(%rip),%r8 # adc8 <nv_register_procfs+0x218> adc4: R_X86_64_PC32 .bss+0x1284 adc8: 48 c7 c1 00 00 00 00 mov $0x0,%rcx adcb: R_X86_64_32S .rodata+0x420 adcf: 48 89 c2 mov %rax,%rdx add2: be 24 81 00 00 mov $0x8124,%esi add7: e8 00 00 00 00 call addc <nv_register_procfs+0x22c> add8: R_X86_64_PC32 proc_create_data-0x4 addc: 0f 1f 40 00 nopl 0x0(%rax) is just dead code. Originally it comes from for (i = 0; __nv_patches[i].short_description; i++) { nv_procfs_add_text_file(proc_nvidia_patches, __nv_patches[i].short_description, __nv_patches[i].description); } where __nv_patches is a global array initialized as static struct { const char *short_description; const char *description; } __nv_patches[] = { { ((void *)0), ((void *)0) } }; The dead code is nv_procfs_add_text_file() inlined (it contains just the call to proc_create_data()). The compiler knows that the loop does not exist and it is optimized out. The call site (from the "loop") looks like aced: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # acf4 <nv_register_procfs+0x144> acf0: R_X86_64_PC32 .bss+0x127c acf4: 48 85 ff test %rdi,%rdi acf7: 0f 85 c4 00 00 00 jne adc1 <nv_register_procfs+0x211> Meaning that the jump never happens (the array contains just zeros). However it is all left there. No idea why. So objtool basically reports a correct thing. Martin, does it ring a bell? Could you shed some light on this? -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c7 --- Comment #7 from Michal Suchanek <msuchanek@suse.com> --- Created attachment 875072 --> https://bugzilla.suse.com/attachment.cgi?id=875072&action=edit object file -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c8 --- Comment #8 from Martin Jambor <mjambor@suse.com> --- GCC 7 deduces that the loop cannot complete its first iteration because then it would read past the __nv_patches array if it did. So it puts its __builtin_unreachable after the call to proc_create_data because if it did not return, it would be still necessary to call it. To make the compiler do what you want it do, I think you need to make __nv_patches const. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c9 --- Comment #9 from Miroslav Beneš <mbenes@suse.com> --- Michal, could you try Martin's suggestion, please? I do not think this is something that objtool can do anything about. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c10 --- Comment #10 from Michal Suchanek <msuchanek@suse.com> --- Created attachment 875152 --> https://bugzilla.suse.com/attachment.cgi?id=875152&action=edit nvidia package change Making the array const is indeed workaround for the problem. Nonetheless, we have real code here where gcc fails to eliminate some dead branch of the code, and as a result objtool issues a useless warning. That's problem with the tools, not the code. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c11 --- Comment #11 from Martin Jambor <mjambor@suse.com> --- I will look into why the variable was not promoted to const automagically in this case because that seems to be somewhat lame (but maybe some real reason escaped me). However, failing that, GCC cannot reliably identify every piece of unreachable code. If, for example, the address of the non-const array escaped the compilation unit, then GCC simply might not do anything better than what it does. (Well, GCC 13 has a new option -funreachable-traps which puts a trap before the "fall-through" but we have just discovered its current documentation is not correct and since ATM I cannot really estimate its costs, I cannot really recommend using it or even evaluating it.) -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c12 --- Comment #12 from Miroslav Beneš <mbenes@suse.com> --- I will not get caught in "broken code, broken tools, broken gcc" discussion but the thing is that objtool cannot know if it is a bug in the code or not. So it reports the warning and stops the analysis. Things like this are fixed in the mainline kernel and const would be appropriate here. Having said that the support for noreturn functions in objtool is not great. There have been talks about improving it but it needs to happen in cooperation with GCC because there is currently no way to know that GCC decided to place unreachable somewhere (decided that a function is noreturn). I do not know if there is a progress on this front behind the scenes. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c13 --- Comment #13 from Martin Jambor <mjambor@suse.com> --- GCC 13 (at least) does not automagically promote the array to const because of the -flive-patching=inline-clone option (which disables the pure-const pass). It is even natural, we need to be ready for some unknown live-patch to possibly leak the address even when such a leak is not present in the code at hand. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c14 --- Comment #14 from Martin Jambor <mjambor@suse.com> --- (In reply to Miroslav Beneš from comment #12)
Having said that the support for noreturn functions in objtool is not great. There have been talks about improving it but it needs to happen in cooperation with GCC because there is currently no way to know that GCC decided to place unreachable somewhere (decided that a function is noreturn). I do not know if there is a progress on this front behind the scenes.
Once we start building our enterprise kernels with GCC 13+, asking our performance team to evaluate the option -funreachable-traps might be worthwhile. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c15 Michal Suchanek <msuchanek@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sndirsch@suse.com Flags| |needinfo?(sndirsch@suse.com | |) --- Comment #15 from Michal Suchanek <msuchanek@suse.com> --- Please consider adding the attached patch to avoid displaying this (false-positive) warning to users. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c16 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(msuchanek@suse.co | |m) --- Comment #16 from Stefan Dirsch <sndirsch@suse.com> --- Which warning do you mean exactly? conftest.sh creates code snippets to figure out for specific functions, how many and which kind of parameters are used in the kernel against the driver is being built. This is needed due to all kind of backports done in downstream kernels of distributions, so checking the kernel version doesn't work. So maybe it's just overkill to fix such a warning? I mean if the detection would be wrong because of this, probably the driver build would fail. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c17 Michal Suchanek <msuchanek@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(msuchanek@suse.co | |m) | --- Comment #17 from Michal Suchanek <msuchanek@suse.com> --- (In reply to Stefan Dirsch from comment #16)
Which warning do you mean exactly?
The one shown in the bug title which is displayed every time the package is installed on the user's system and builds the kernel modules. This part of the conftest.sh ends up in the generated code, confuses gcc which fails to eliminate some dead code which then confuses objtool. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c18 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(sndirsch@suse.com | |) | --- Comment #18 from Stefan Dirsch <sndirsch@suse.com> --- Ok. BTW, there is no such warning when building against Tumbleweed, just against Leap 15.5 and 15.6. Check this https://build.suse.de/project/monitor/Proprietary:X11:Drivers nvidia-driver-G06 nvidia-gfxG05 nvidia-gfxG04 (this bug G04 = 390.157) All these are affected by this objtool warning on Leap. mabye one should figure out what's different in objtool for Tumbleweed? -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c19 Miroslav Beneš <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |pmladek@suse.com --- Comment #19 from Miroslav Beneš <mbenes@suse.com> --- (In reply to Martin Jambor from comment #13)
GCC 13 (at least) does not automagically promote the array to const because of the -flive-patching=inline-clone option (which disables the pure-const pass). It is even natural, we need to be ready for some unknown live-patch to possibly leak the address even when such a leak is not present in the code at hand.
Bah, right. I wonder if we should just get rid of the option eventually as upstream did and somehow detect dangerous changes in our tooling. Adding Petr for his awareness. Thanks a lot for the explanation. (In reply to Stefan Dirsch from comment #18)
Ok. BTW, there is no such warning when building against Tumbleweed, just against Leap 15.5 and 15.6. Check this
https://build.suse.de/project/monitor/Proprietary:X11:Drivers
nvidia-driver-G06 nvidia-gfxG05 nvidia-gfxG04 (this bug G04 = 390.157)
All these are affected by this objtool warning on Leap. mabye one should figure out what's different in objtool for Tumbleweed?
Upstream objtool reports the same warning on the object file so there is no difference. TW does not use the GCC option mentioned above so the object code is different. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c20 --- Comment #20 from Stefan Dirsch <sndirsch@suse.com> --- @Miroslav So I apply this patch and we're done? Then please assign this ticket to me. (Then I will also try to push this patch upstream) -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c21 Miroslav Beneš <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|mbenes@suse.com |sndirsch@suse.com --- Comment #21 from Miroslav Beneš <mbenes@suse.com> --- (In reply to Stefan Dirsch from comment #20)
@Miroslav So I apply this patch and we're done? Then please assign this ticket to me.
(Then I will also try to push this patch upstream)
Yes, if you apply the patch, GCC will remove the dead code and everything will be all right from objtool perspective. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Kernel |X11 3rd Party Driver QA Contact|qa-bugs@suse.de |sndirsch@suse.com Assignee|sndirsch@suse.com |gfx-bugs@suse.de -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|P5 - None |P3 - Medium Assignee|gfx-bugs@suse.de |sndirsch@suse.com -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |IN_PROGRESS -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 https://bugzilla.suse.com/show_bug.cgi?id=1224402#c22 --- Comment #22 from Stefan Dirsch <sndirsch@suse.com> --- Patch is now applied to G04, G05 and G06 packages. Closing. I'll inform nvidia by email about this patch and ticket. -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1224402 Darren Davis <ddavis@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ddavis@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@suse.com