When are upstream kernel commits merged into Tumbleweed?
The nouveau driver has had a serious, longstanding bug that seemingly only exhibits itself with 6.3 and later kernels, and only with older Nvidia chipsets. For details, see https://forums.opensuse.org/t/older-laptop-tumbleweed-nvidia-and-nouveau-dri... and https://gitlab.freedesktop.org/drm/nouveau/-/issues/213.

After several weeks of hard work by developers at Freedesktop and Red Hat (it was a very difficult bug to track down), a fix was accepted at https://github.com/torvalds/linux/commit/c8a5d5ea3ba6a18958f8d76430e4cd68eea... and is in both https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nouvea... and https://github.com/openSUSE/kernel/blob/master/drivers/gpu/drm/nouveau/nouve...

But the change has not yet been merged into the kernel-source-6.3.7-1.2.noarch package's /usr/src/linux-6.3.7-1/drivers/gpu/drm/nouveau/nouveau_drm.c file, nor (as evidenced by the kernel fault still happening at boot) into 20230620's vmlinuz-6.3.7-1-default kernel.

Should I expect that it will be included in an upcoming Tumbleweed snapshot/repo/ISO in the near future? If not, how can I advocate that it should be? There are already several related Bugzillas (1212553, 1211217, 1211568, 1211216, 1209197), none of which seem to be aware of the root cause nor this newly-released fix.

Note that although the bug currently only triggers with specific, older Nvidia GPUs and "has never caused problems before", the previous code is undeniably wrong and the patch should be applied regardless. And once again, it has been accepted into the mainline kernel sources.

Thanks for any insights into the Tumbleweed release process, and on how to proceed.
On 6/22/23 07:19, Mark Rubin via openSUSE Factory wrote:
The nouveau driver has had a serious, longstanding bug that seemingly only exhibits itself with the 6.3 and onward kernels, and only with older Nvidia chipsets. For details, see https://forums.opensuse.org/t/older-laptop-tumbleweed-nvidia-and-nouveau-dri... and https://gitlab.freedesktop.org/drm/nouveau/-/issues/213.
After several weeks of hard work by developers at Freedesktop and Red Hat (it was a very difficult bug to track down), a fix was accepted at https://github.com/torvalds/linux/commit/c8a5d5ea3ba6a18958f8d76430e4cd68eea... and is in both https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nouvea... and https://github.com/openSUSE/kernel/blob/master/drivers/gpu/drm/nouveau/nouve...
But the change has not yet been merged into the kernel-source-6.3.7-1.2.noarch package's /usr/src/linux-6.3.7-1/drivers/gpu/drm/nouveau/nouveau_drm.c file, nor (as evidenced by the kernel fault still happening at boot) in 20230620's vmlinuz-6.3.7-1-default kernel.
Should I expect that it will be included in an upcoming Tumbleweed snapshot/repo/ISO in the near future? If not, how can I advocate that it should be? There are already several related Bugzillas (1212553, 1211217, 1211568, 1211216, 1209197), none of which seem to be aware of the root cause nor this newly-released fix.
The kernel team is generally easy to reach via Bugzilla, so I'd add this new information to the most relevant bug; they should be able to take care of it.

-- 
Simon Lees (Simotek) http://simotek.net
Emergency Update Team keybase.io/simotek
SUSE Linux Adelaide Australia, UTC+10:30
GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
On 6/21/23 17:49, Mark Rubin via openSUSE Factory wrote:
The nouveau driver has had a serious, longstanding bug that seemingly only exhibits itself with the 6.3 and onward kernels, and only with older Nvidia chipsets. For details, see https://forums.opensuse.org/t/older-laptop-tumbleweed-nvidia-and-nouveau-dri... and https://gitlab.freedesktop.org/drm/nouveau/-/issues/213.
After several weeks of hard work by developers at Freedesktop and Red Hat (it was a very difficult bug to track down), a fix was accepted at https://github.com/torvalds/linux/commit/c8a5d5ea3ba6a18958f8d76430e4cd68eea... and is in both https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nouvea... and https://github.com/openSUSE/kernel/blob/master/drivers/gpu/drm/nouveau/nouve...
But the change has not yet been merged into the kernel-source-6.3.7-1.2.noarch package's /usr/src/linux-6.3.7-1/drivers/gpu/drm/nouveau/nouveau_drm.c file, nor (as evidenced by the kernel fault still happening at boot) in 20230620's vmlinuz-6.3.7-1-default kernel.
Should I expect that it will be included in an upcoming Tumbleweed snapshot/repo/ISO in the near future? If not, how can I advocate that it should be? There are already several related Bugzillas (1212553, 1211217, 1211568, 1211216, 1209197), none of which seem to be aware of the root cause nor this newly-released fix.
Note that although the bug currently only triggers with specific, older Nvidia GPUs and "has never caused problems before", the previous code is undeniably wrong and the patch should be applied regardless. And once again, it has been accepted into the mainline kernel sources.
Thanks for any insights into the Tumbleweed release process, and on how to proceed.
It was unclear to me which kernel version has the fix you want; however, you might want to look at the kernel-vanilla package, as my understanding is that it is built without any of the TW-specific patches.

-- 
Regards, Joe
On 21. 06. 23, 23:49, Mark Rubin via openSUSE Factory wrote:
Thanks for any insights into the Tumbleweed release process, and on how to proceed.
The patch is in Kernel:stable since:

commit 4abd087366954862e8b236c739aeb65be922fc9a
Author: Takashi Iwai <tiwai@suse.de>
Date:   Fri Jun 16 09:21:49 2023 +0200

    nouveau: fix client work fence deletion race (bsc#1211217 bsc#1211568).

It has just been submitted to TW with many other changes:
https://build.opensuse.org/request/show/1094502

So 6.3.9 will have it. You can use Kernel:stable in the meantime, if you want.

thanks,
-- 
js
suse labs
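Jiri's commit subject also gives a quick way to check whether a particular kernel package carries the fix, instead of comparing nouveau_drm.c by hand as in the original question. The sketch below assumes that the fix shows up in the package changelog under that subject line, as SUSE-side patches with `bsc#` references usually do; that is an assumption, so a match is a good hint and a missing match is not proof. `has_fence_fix` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the changelog text read from stdin mentions
# the nouveau fix by the commit subject quoted above (case-insensitive match).
# Assumption: the fix is listed in the RPM changelog under this subject.
has_fence_fix() {
    grep -qi 'nouveau: fix client work fence deletion race'
}

# Example uses (assume the rpm tool is available):
#   rpm -qp --changelog kernel-default-6.3.9-1.1.g0df701d.x86_64.rpm | has_fence_fix && echo "fix listed"
#   rpm -q --changelog kernel-default-6.3.7-1.2 | has_fence_fix || echo "fix not listed"
```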
Jiri Slaby wrote:
The patch is in Kernel:stable since: ... It has just been submitted to TW with many other changes: https://build.opensuse.org/request/show/1094502
This is good news. Thanks for the definitive answer.

Some more questions about the patch, if you or anyone else has the time to answer them:

- Once 6.3.9 is released, will the kernels that the TW install ISOs boot include it? The current ones fail on systems with the problematic Nvidia GPUs unless "nomodeset" is manually added to the kernel parameters when booting them. That's a workaround, but it isn't something a non-expert user would know to do without searching or asking in the forums and mailing lists (or, more likely, giving up and saying "openSUSE is broken").

- Likewise, is there any chance the patch will be backported to Leap 15.5's 5.x kernels? Leap's ISOs also don't boot without "nomodeset", and although their installed kernels do (unlike installed TW, which doesn't), I've recently discovered that they later lock up with continuous errors in dmesg such as:

```
nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000002
```

and:

```
nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
nouveau 0000:01:00.0: fifo: LB_ERROR
```
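When comparing kernels, it can help to count how often these FIFO errors appear rather than eyeballing dmesg. A minimal sketch; the pattern is derived only from the messages quoted above (an assumption: other nouveau error formats will not be matched), and `count_nouveau_fifo_errors` is a hypothetical name.

```shell
# Hypothetical helper: prints the number of nouveau FIFO error lines
# (CHSW_ERROR, SCHED_ERROR, LB_ERROR and similar *_ERROR codes) found in the
# kernel log text read from stdin. Timestamps before "nouveau" are ignored.
count_nouveau_fifo_errors() {
    grep -cE 'nouveau [0-9a-f:.]+: fifo: [A-Z_]+_ERROR'
}

# Example (reading the kernel ring buffer usually requires root):
#   dmesg | count_nouveau_fifo_errors
```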
So 6.3.9 will have it. You can use Kernel:stable in the meantime, if you want.
I assume that means something like https://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/ke...

A more generic question: If I use `zypper` or `YaST2` to install that, will they automatically configure my system so I have the choice to boot it vs kernel-default-6.3.7-1.2 (or whatever the latest `zypper dup` has installed), or do I have to manually run `dracut` or YaST2 Boot Loader or some other configuration tool?

Thanks again for applying this important patch to openSUSE's kernels.
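On the boot-configuration question: the /boot listing Mark posted later in the thread shows that installing kernel-default from Kernel:stable produced a matching initrd-6.3.9-1.g0df701d-default and a GRUB entry without any manual dracut run, so the kernel package appears to take care of this itself. A quick sanity check after such an install is to confirm that every module directory under /lib/modules has a vmlinuz and initrd in /boot. The sketch assumes the `vmlinuz-<version>` / `initrd-<version>` naming seen in that listing; `check_boot_files` is a hypothetical helper.

```shell
# Hypothetical helper: for every kernel version directory under MODDIR
# (default /lib/modules), report whether BOOTDIR (default /boot) contains
# matching vmlinuz-<version> and initrd-<version> files.
# Assumption: openSUSE-style naming as in the /boot listing from this thread.
check_boot_files() {
    local moddir=${1:-/lib/modules}
    local bootdir=${2:-/boot}
    local dir ver
    for dir in "$moddir"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        ver=$(basename "$dir")             # strips the trailing slash too
        if [ -e "$bootdir/vmlinuz-$ver" ] && [ -e "$bootdir/initrd-$ver" ]; then
            echo "$ver: ok"
        else
            echo "$ver: missing vmlinuz or initrd"
        fi
    done
}

# Example: check_boot_files /lib/modules /boot
```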
On Thu, 2023-06-22 at 09:07 +0000, Mark Rubin via openSUSE Factory wrote:
- Once 6.3.9 is released, will the kernels that the TW install ISOs boot to use it? The current ones fail on systems with the problematic Nvidia GPUs unless "nomodeset" is manually added to the kernel parameters when booting them. That's a workaround, but it isn't something a non-expert user would know to do without searching or asking in the forums and mailing lists (or more likely giving up, saying "openSUSE is broken").
The ISO files are regenerated from the current packages with every single snapshot. This will also be true when the new kernel lands.

Cheers,
Dominique
I have very bad news (for me at least) about the nouveau driver kernel fix, and am looking for advice on how to proceed.

I performed the following actions:

```
# fgrep -i multiversion /etc/zypp/zypp.conf | egrep -v '^#'
multiversion = provides:multiversion(kernel)
multiversion.kernels = latest,latest-1,running
# zypper addrepo https://download.opensuse.org/repositories/Kernel:/stable/standard/ kernel-stable-standard
# zypper install kernel-default-6.3.9-1.1.g0df701d.x86_64
# ls -F /boot
.vmlinuz-6.3.6-1-default.hmac@           initrd@
.vmlinuz-6.3.7-1-default.hmac@           initrd-6.3.6-1-default
.vmlinuz-6.3.9-1.g0df701d-default.hmac@  initrd-6.3.7-1-default
System.map-6.3.6-1-default@              initrd-6.3.9-1.g0df701d-default
System.map-6.3.7-1-default@              sysctl.conf-6.3.6-1-default@
System.map-6.3.9-1.g0df701d-default@     sysctl.conf-6.3.7-1-default@
config-6.3.6-1-default@                  sysctl.conf-6.3.9-1.g0df701d-default@
config-6.3.7-1-default@                  vmlinuz@
config-6.3.9-1.g0df701d-default@         vmlinuz-6.3.6-1-default@
do_purge_kernels                         vmlinuz-6.3.7-1-default@
grub2/                                   vmlinuz-6.3.9-1.g0df701d-default@
# rm /boot/do_purge_kernels
# ls -F /lib/modules
6.3.6-1-default/  6.3.7-1-default/  6.3.9-1.g0df701d-default/
```

On reboot, GRUB has 6.3.9 as the default (confirmed by examining the boot parameters with "E : Edit Entry") and successfully boots that kernel if "nomodeset" is included (which I have by default). I can also boot 6.3.7 (again with "nomodeset") via the "Advanced options for openSUSE Tumbleweed" sub-menu.

But if I boot 6.3.9 *without* "nomodeset", it hangs during boot in the same way that 6.3.7 and previous 6.3.x kernels do: console screen and keyboard frozen and locked up, ssh still working, kernel memory fault messages in dmesg.

After all the time and work by the nouveau and kernel maintainers, and its inclusion in the upcoming Tumbleweed kernels, this is massively disappointing. Did I do something wrong in my install of 6.3.9? I did not do a `zypper dist-upgrade` after the `zypper install` -- should I have? (I did do one immediately before, to get the latest 20230622 snapshot.)

Or is the nouveau/kernel patch not sufficient, at least on my hardware? Others had tested it and https://gitlab.freedesktop.org/drm/nouveau/-/issues/213 was closed as completed. I had hoped to necro-post there confirming that it also worked for me, but unless I can find an error in my 6.3.9 install process I'll have to do the opposite.

Thanks for any suggestions on what to do next.
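When alternating between boots with and without "nomodeset", it's easy to lose track of which kernel and parameters were actually in effect for a given result. `uname -r` reports the running kernel and /proc/cmdline the parameters it was booted with; the hypothetical helper below just checks the latter for a standalone `nomodeset` word.

```shell
# Hypothetical helper: succeeds if "nomodeset" appears as a separate word in
# the given kernel command line (defaults to the running kernel's /proc/cmdline).
booted_with_nomodeset() {
    local cmdline
    cmdline=${1:-$(cat /proc/cmdline)}
    # Pad with spaces so the first and last parameters also match as words.
    case " $cmdline " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: record both facts next to any dmesg excerpt you post.
#   echo "kernel: $(uname -r)"
#   if booted_with_nomodeset; then echo "nomodeset: yes"; else echo "nomodeset: no"; fi
```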
On 24.06.2023 05:50, Mark Rubin via openSUSE Factory wrote:
I have very bad news (for me at least) about the nouveau driver kernel fix, and am looking for advice on how to proceed.
I performed the following actions:
```
# fgrep -i multiversion /etc/zypp/zypp.conf | egrep -v '^#'
multiversion = provides:multiversion(kernel)
multiversion.kernels = latest,latest-1,running
# zypper addrepo https://download.opensuse.org/repositories/Kernel:/stable/standard/ kernel-stable-standard
# zypper install kernel-default-6.3.9-1.1.g0df701d.x86_64
# ls -F /boot
.vmlinuz-6.3.6-1-default.hmac@           initrd@
.vmlinuz-6.3.7-1-default.hmac@           initrd-6.3.6-1-default
.vmlinuz-6.3.9-1.g0df701d-default.hmac@  initrd-6.3.7-1-default
System.map-6.3.6-1-default@              initrd-6.3.9-1.g0df701d-default
System.map-6.3.7-1-default@              sysctl.conf-6.3.6-1-default@
System.map-6.3.9-1.g0df701d-default@     sysctl.conf-6.3.7-1-default@
config-6.3.6-1-default@                  sysctl.conf-6.3.9-1.g0df701d-default@
config-6.3.7-1-default@                  vmlinuz@
config-6.3.9-1.g0df701d-default@         vmlinuz-6.3.6-1-default@
do_purge_kernels                         vmlinuz-6.3.7-1-default@
grub2/                                   vmlinuz-6.3.9-1.g0df701d-default@
# rm /boot/do_purge_kernels
# ls -F /lib/modules
6.3.6-1-default/  6.3.7-1-default/  6.3.9-1.g0df701d-default/
```
On reboot, GRUB has 6.3.9 as default (proven by examining the boot parameters with "E : Edit Entry") and successfully boots that kernel if "nomodeset" is included (which I have by default). I can also boot 6.3.7 (again "nomodeset") via the "Advanced options for openSUSE Tumbleweed" sub-menu.
But if I boot 6.3.9 *without* "nomodeset", it hangs during boot in the same way that 6.3.7 and previous 6.3.x kernels do: Console screen and keyboard frozen and locked up, ssh working, kernel memory fault messages in dmesg.
After all the time and work by the nouveau and kernel maintainers, and its inclusion in the upcoming Tumbleweed kernels, this is massively disappointing. Did I do something wrong in my install of 6.3.9? I did not do a `zypper dist-upgrade` after the `zypper install` -- should I have? (I did do one immediately before, to get the latest 20230622 snapshot.)
Or is the nouveau/kernel patch not sufficient, at least on my hardware? Others had tested it and https://gitlab.freedesktop.org/drm/nouveau/-/issues/213 was closed as completed. I had hoped to necro-post there confirming that it also worked for me, but unless I can find an error in my 6.3.9 install process I'll have to do the opposite.
Thanks for any suggestions on what to do next.
Test the latest vanilla kernel. If the problem is fixed there, file a bug report with openSUSE. If the problem is not fixed there, file a bug report upstream.
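Andrei's triage rule, combined with Mark's observation that openSUSE's kernel-default still fails, can be written down as a small decision helper. This is only an illustrative sketch: `where_to_report` and its "ok"/"fails" arguments are hypothetical, and a real report would of course include the boot logs.

```shell
# Hypothetical helper encoding the triage rule above.
# $1: result with the latest vanilla kernel       ("ok" or "fails")
# $2: result with openSUSE's kernel-default        ("ok" or "fails")
where_to_report() {
    local vanilla=$1 default=$2
    if [ "$vanilla" = ok ] && [ "$default" = fails ]; then
        # Fixed upstream but not in openSUSE's build: openSUSE patches/config suspected.
        echo "openSUSE Bugzilla"
    elif [ "$vanilla" = fails ]; then
        # The vanilla kernel already carries the upstream fix, so the problem is upstream.
        echo "upstream"
    else
        echo "no report needed"
    fi
}

# Example: where_to_report fails fails   # prints "upstream"
```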
Andrei Borzenkov wrote:
Test the latest vanilla kernel.
Thanks. Yes, after thinking about it that was what I figured I should do. But I did want to check here first to make sure I hadn't made any simple mistakes installing 6.3.9 that could have caused the failure.
If the problem is fixed there, file a bug report with openSUSE. If the problem is not fixed there, file a bug report upstream.
Will do.
Mark Rubin wrote:
Andrei Borzenkov wrote:
Test the latest vanilla kernel. If the problem is fixed there, file a bug report with openSUSE. If the problem is not fixed there, file a bug report upstream.
Will do.
Update: No, that won't work.

1. There is no "-vanilla" 6.3.9 kernel at https://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/
2. The package at http://download.opensuse.org/tumbleweed/repo/oss/x86_64/kernel-vanilla-6.3.7... doesn't have the nouveau patch, at least according to what's in /usr/src/linux-6.3.7-1/drivers/gpu/drm/nouveau/nouveau_drm.c from http://mirrorcache-us.opensuse.org/tumbleweed/repo/src-oss/src/kernel-source...

I tried kernel-vanilla-6.3.7-1.2 anyway, but of course it didn't work. I suppose I could wait for 6.3.9 to hit TW, but that just delays any possible further progress, regardless of whether the problem is in openSUSE or upstream.

I tried building 6.4-rc7 from source (cloned from https://github.com/openSUSE/kernel), which definitely has the upstream fix, but that failed despite my exactly following the instructions, which are available only at https://github.com/openSUSE/kernel-source/blob/master/doc/README.SUSE (the error was "No rule to make target '.kernel_signing_key.pem', needed by 'certs/signing_key.x509'").

I'm going to make another post here about not being able to log in to Bugzilla to comment on the above topics and to report another, unrelated bug. I've been trying to install Tumbleweed and/or Leap 15.5 for many weeks now and frankly it's getting frustrating and discouraging.
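The build error indicates that the kernel configuration points CONFIG_MODULE_SIG_KEY at a key file (.kernel_signing_key.pem) that doesn't exist in a plain source checkout. One workaround for local test builds, stated here as an assumption rather than something taken from README.SUSE, is to point the option back at the upstream default `certs/signing_key.pem`, which kbuild generates automatically (using openssl) when it is missing. Inside a kernel tree, `scripts/config --set-str CONFIG_MODULE_SIG_KEY certs/signing_key.pem` should do the same; the sed-based helper below (hypothetical name `use_generated_signing_key`) takes a config path so it can be tried on a copy first.

```shell
# Hypothetical helper: rewrites CONFIG_MODULE_SIG_KEY in the given kernel
# config (default ./.config) so that kbuild generates a throwaway signing key
# instead of expecting the distribution's .kernel_signing_key.pem.
# Assumption: only suitable for local test builds; modules built this way are
# not signed with SUSE's key.
use_generated_signing_key() {
    local config=${1:-.config}
    sed -i 's|^CONFIG_MODULE_SIG_KEY=.*|CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"|' "$config"
}

# Example, from the top of the kernel tree:
#   use_generated_signing_key .config && make olddefconfig
```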
On 24. 06. 23, 23:16, Mark Rubin via openSUSE Factory wrote:
Mark Rubin wrote:
Andrei Borzenkov wrote:
Test the latest vanilla kernel. If the problem is fixed there, file a bug report with openSUSE. If the problem is not fixed there, file a bug report upstream.
Will do.
Update: No, that won't work.
1. There is no "-vanilla" 6.3.9 kernel at https://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/
Of course there is (on the second page). regards, -- js suse labs
Jiri Slaby wrote:
Mark Rubin wrote: There is no "-vanilla" 6.3.9 kernel at https://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/
Of course there is (on the second page).
My apologies. I don't know how I missed it (and, yes, I did look at both pages). I was beginning to hope your previous statement:
So 6.3.9 will have it.
meant "it will have it in the future" instead of "has it now". But assuming that (all of the following are from https://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/):

- the kernel in kernel-default-6.3.9-1.1.g0df701d.x86_64.rpm was compiled from the code in kernel-default-debugsource-6.3.9-1.1.g0df701d.x86_64.rpm, and
- the kernel in kernel-vanilla-6.3.9-1.1.g0df701d.x86_64.rpm was compiled from kernel-vanilla-debugsource-6.3.9-1.1.g0df701d.x86_64.rpm,

then all of them *do* have the fix in nouveau_drm.c.

Now that you've found my mistake (thanks!), I still have to install and test "vanilla" to see if it works. I've already seen that openSUSE's kernel-default version doesn't, so either way it's bad news: either "vanilla" doesn't work and the fix is totally insufficient, or it does and something in the openSUSE patches is inhibiting it. That means either more difficult debugging by anyone (openSUSE or upstream) who'd be willing to undertake it, or giving up on having modern kernels working on this hardware.

At least I can put on hold my failing attempts to build a kernel RPM, as chronicled in my other thread, but I did need to (re-)learn how to compile the kernel source anyway, so it wasn't a complete waste of time. Thanks again for your patient help.
I believe I've found a fix for the nouveau driver crash reported here and elsewhere by myself and others. See https://gitlab.freedesktop.org/drm/nouveau/-/issues/213 As the problem affects users on the problematic Nvidia GPUs on both Tumbleweed and Leap 15.5 -- preventing booting both the install media and subsequent installed systems -- I hope the patch (or an improved one) can be validated by the Freedesktop developers, pushed to the upstream kernel repo, and merged into openSUSE kernel packages and install images. Any further suggestions on how to proceed are welcome.
@Mark: I read through that thread, should be no reason to "apologize for running older hardware" . . . that's what many folks in linux are in fact doing. I had this problem in my Leap 15.5 edition a couple weeks back when you did, but that seemed to get "fixed" . . . I have a GTX780 card, not sure where that is compared to yours. I did not have the problem in my TW install because I took mr Mazda's advice to use "default" video driver, which might be i915?? . . . i.e., it's not nouveau, although it might be in lsmod. So, perhaps there is a way to try to change video driver to "default"?? I did it during the fresh install a couple years back now, but might be something to check?? Other reason for posting, is that, on my newer Sys76 laptop with '20 spec nvidia card, I lost the GUI several months back in my Debian Bookworm install, there I used nouveau . . . . The system boots to dmesg and then to a blinking cursor on a black screen. Their forum suggested I needed to install "firmware-misc-nonfree" package to bring in the nvidia drivers, but that did not solve the problem. I can only get to a TTY on that system. In that same machine I have Pop!_OS which uses proprietary nvidia drivers, and a Gecko Plasma with "nouveau"??? Both of those systems are working fine . . . . Hard to say if the Debian situation is relevant to yours . . . but, there is some apparent "struggle" to get nvidia hardware running in linux . . . . Hence the loud groans that you hear on the forums . . . . I guess those guys would suggest changing out the card as the ultimate solution???? F
@Fritz: What did you do to get the "default"/i915(??) driver in your TW install? Was it something on the kernel command line in GRUB during the initial boot of the install media, a YaST/zypper install of a package after the installed TW was up and running, both, or neither? The person who I believe is the main nouveau maintainer seems to be working on the problem, and I'm hoping he comes up with a permanent fix. See the new thread at https://gitlab.freedesktop.org/drm/nouveau/-/issues/242 for details. For now, my hacked-in patch seems to be working for me on TW. I still have to see what I can do with Leap 15.5. But as per my latest #19 post at https://forums.opensuse.org/t/older-laptop-tumbleweed-nvidia-and-nouveau-dri..., I'm trying to understand what the "best" driver is out of nouveau, "default", i915, "nv", the community repo updates of the abandoned-by-Nvidia official drivers, or maybe others I don't even know about. :( I'm sure it's different for each GPU/card. Like you, I'm not happy with the idea of Linux distros dropping support for older hardware ("buy a newer graphics card") but since almost all of this is free open-source software, if nobody volunteers to maintain a particular capability then that's what happens. I can't demand otherwise, but hope the support continues, especially when the fixes might be simple and have already been posted.
Mark Rubin wrote:
@Fritz: What did you do to get the "default"/i915(??) driver in your TW install? Was it something on the kernel command line in GRUB during the initial boot of the install media, a YaST/zypper install of a package after the installed TW was up and running, both, or neither?
For now, my hacked-in patch seems to be working for me on TW. I still have to see what I can do with Leap 15.5. But as per my latest #19 post at https://forums.opensuse.org/t/older-laptop-tumbleweed-nvidia-and-nouveau-dri..., I'm trying to understand what the "best" driver is out of nouveau, "default", i915, "nv",
@Mark: It was during the install process, where somewhere (bottom left??) of the installer window was a question on what video driver to select for what was recognized as an Nvidia card. I usually would go for "nouveau" . . . as back in the ancient days of '07 in PPC linux we had no other option other than "nv" . . . nouveau offered "acceleration" and so forth, nv did not. As I'm a "jack of all trades" guy, with multi-boot systems, I'm not a "master of SUSE" to know whether you could retro exchange your driver, via YaST . . . I would figure it could be done there or via console . . . adjusting the xorg.conf file??? I believe the "default" option is the "i915" driver listed in my lsmod. The "best driver" is the driver that works on your machine to your satisfaction. I avoided the proprietary nvidia driver option, because nvidia doesn't really keep pace with cutting edge linux like TW, and they don't seem to support their older stuff either. Nouveau was the best all-arounder . . . for the most part. I am fine with what the default driver provides on my GTX 780 card. Might be the forum guys could give some hints about how to try to change it away from nouveau, which you could always switch back to. F
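Fritz infers the active driver from lsmod, but a module appearing there is only loaded; it doesn't by itself show which driver is bound to the graphics card (as he notes, nouveau may show up even when it isn't in use). The sysfs `driver` link of the PCI device answers that directly. A sketch, assuming the card sits at the 0000:01:00.0 address seen in the dmesg lines earlier in the thread (adjust for other systems); `bound_driver` is a hypothetical helper.

```shell
# Hypothetical helper: prints the name of the kernel driver currently bound to
# a PCI device, read from its sysfs "driver" symlink, or "none" if unbound.
# Assumption: default device address taken from the nouveau messages above.
bound_driver() {
    local dev=${1:-/sys/bus/pci/devices/0000:01:00.0}
    if [ -L "$dev/driver" ]; then
        basename "$(readlink "$dev/driver")"
    else
        echo "none"
    fi
}

# Example: bound_driver    # e.g. "nouveau" when nouveau drives the card
# `lspci -k` reports the same information in its "Kernel driver in use:" line.
```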
* Fritz Hudnut <non.space.1@gmail.com> [07-01-23 20:55]: ...
The "best driver" is the driver that works on your machine to your satisfaction. I avoided the proprietary nvidia driver option, because nvidia doesn't really keep pace with cutting edge linux like TW, and they don't seem to support their older stuff either. Nouveau was the best all-arounder . . . for the most part. I am fine with what the default driver provides on my GTX 780 card.
odd, been running Tumbleweed since its inception and using nVidia drivers the entire time on multiple boxes. and evergrees before that. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
* Patrick Shanahan <paka@opensuse.org> [07-01-23 21:47]:
* Fritz Hudnut <non.space.1@gmail.com> [07-01-23 20:55]:
...
The "best driver" is the driver that works on your machine to your satisfaction. I avoided the proprietary nvidia driver option, because nvidia doesn't really keep pace with cutting edge linux like TW, and they don't seem to support their older stuff either. Nouveau was the best all-arounder . . . for the most part. I am fine with what the default driver provides on my GTX 780 card.
odd, been running Tumbleweed since its inception and using nVidia drivers the entire time on multiple boxes.
and evergrees before that.
s/evergrees/evergreen/
participants (8)

- Andrei Borzenkov
- Dominique Leuenberger
- Fritz Hudnut
- Jiri Slaby
- Joe Salmeri
- Mark Rubin
- Patrick Shanahan
- Simon Lees