[opensuse] Trying to use the 15.1 kernel on 15.0, NVidia crashes.
Hi,

As mentioned previously, I have a problem with hibernation, and I need to test whether a newer kernel solves it. So I have installed the 15.1 kernel on 15.0 (instead of trying the TW kernel), together with the 15.1 NVidia driver.

So this is what I installed:

cer@Telcontar:~> rpm -qa | grep -i kernel | sort
kernel-default-4.12.14-lp150.12.48.1.x86_64
kernel-default-4.12.14-lp151.22.9.x86_64
kernel-default-devel-4.12.14-lp150.12.48.1.x86_64
kernel-default-devel-4.12.14-lp151.22.9.x86_64
kernel-devel-4.12.14-lp150.12.45.1.noarch
kernel-devel-4.12.14-lp150.12.48.1.noarch      (I do not see a 15.1 version in repo)
kernel-docs-4.12.14-lp150.12.48.1.noarch       (I do not see a 15.1 version in repo)
kernel-firmware-20190118-lp150.2.12.1.noarch   (I do not see a 15.1 version in repo)
kernel-macros-4.12.14-lp150.12.48.1.noarch     (I do not see a 15.1 version in repo)
kernel-source-4.12.14-lp150.12.45.1.noarch
kernel-source-4.12.14-lp150.12.48.1.noarch     (I do not see a 15.1 version in repo)
kernel-syms-4.12.14-lp150.12.48.1.x86_64
kernel-syms-4.12.14-lp151.22.9.x86_64          (I do not see a 15.1 version in repo)
nfs-kernel-server-2.1.1-lp150.4.6.1.x86_64
texlive-l3kernel-2017.133.svn44483-lp150.5.4.noarch
texlive-l3kernel-doc-2017.133.svn44483-lp150.5.4.noarch
cer@Telcontar:~> rpm -qa | grep -i nvidia | sort
nvidia-computeG03-340.107-lp151.12.2.x86_64
nvidia-gfxG03-kmp-default-340.107_k4.12.14_lp151.22-lp151.12.26.x86_64
nvidia-glG03-340.107-lp151.12.2.x86_64
nvidia-uvm-gfxG03-kmp-default-340.107_k4.12.14_lp151.22-lp151.12.25.x86_64
x11-video-nvidiaG03-340.107-lp151.12.2.x86_64
cer@Telcontar:~> uname -a
Linux Telcontar 4.12.14-lp151.22-default #1 SMP Thu Feb 28 15:45:32 UTC 2019 (25279f0) x86_64 x86_64 x86_64 GNU/Linux
cer@Telcontar:~>

The graphics crash, as I half feared:

cer@Telcontar:~> startx
xauth: file /home/cer/.serverauth.3927 does not exist
X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux Telcontar 4.12.14-lp151.22-default #1 SMP Thu Feb 28 15:45:32 UTC 2019 (25279f0) x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-4.12.14-lp151.22-default root=UUID=ac173013-18ad-4c4e-921e-fd2ecfb56495 resume=/dev/disk/by-label/ssd-swap splash=verbose
Build Date: 14 March 2019 12:00:00PM
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/cer/.local/share/xorg/Xorg.0.log", Time: Thu Mar 28 19:57:18 2019
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using config directory: "/etc/X11/xorg.conf.d"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
(EE)
Fatal server error:
(EE) no screens found(EE)
(EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
(EE) Please also check the log file at "/home/cer/.local/share/xorg/Xorg.0.log" for additional information.
(EE) GA Arbitration: Cannot restore default device.
(EE) Server terminated with error (1). Closing log file.
V ^Cxinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: unexpected signal 2
-------------------------------------------------------------------------------------------
xinit failed. /usr/bin/Xorg is not setuid, maybe that's the reason?
If so either use a display manager (strongly recommended) or adjust /etc/permissions.local and run "chkstat --system --set" afterwards
cer@Telcontar:~>

(I had to press Ctrl-C to get the prompt back.)

The file "/home/cer/.local/share/xorg/Xorg.0.log" is attached.
I don't see why it crashes. The log only says "Failed to initialize the NVIDIA kernel module.", without saying why, and refers to the syslog. The relevant portion is this:

<10.6> 2019-03-28 19:56:54 Telcontar login - - - LOGIN ON tty2 BY cer
<3.6> 2019-03-28 19:56:55 Telcontar systemd 1 - - Started Clamav antivirus Deamon.
<3.6> 2019-03-28 19:56:55 Telcontar systemd 1 - - Starting Amavisd-new Virus Scanner interface...
<3.6> 2019-03-28 19:56:56 Telcontar systemd 1 - - Started Amavisd-new Virus Scanner interface.
<3.6> 2019-03-28 19:56:56 Telcontar systemd 1 - - Starting Postfix Mail Transport Agent...
<3.6> 2019-03-28 19:56:56 Telcontar echo 3820 - - Starting mail service (Postfix)
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Started Postfix Mail Transport Agent.
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Started Command Scheduler.
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Reached target Multi-User System.
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Starting Plays a welcome sound when target multi-user is reached...
<9.6> 2019-03-28 19:56:58 Telcontar cron 3917 - - (CRON) INFO (RANDOM_DELAY will be scaled with factor 31% if used.)
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Starting Update UTMP about System Runlevel Changes...
<9.6> 2019-03-28 19:56:58 Telcontar cron 3917 - - (CRON) INFO (running with inotify support)
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Started Update UTMP about System Runlevel Changes.
<3.6> 2019-03-28 19:56:58 Telcontar Mine - - - Saying hello world.
<3.6> 2019-03-28 19:56:58 Telcontar systemd 1 - - Started Plays a welcome sound when target multi-user is reached.
<3.6> 2019-03-28 19:57:45 Telcontar systemd 1 - - Starting Console Mouse manager...
<1.6> 2019-03-28 19:57:45 Telcontar 3982 - - *** info [daemon/startup.c(136)]:
<1.6> 2019-03-28 19:57:45 Telcontar 3982 - - Started gpm successfully. Entered daemon mode.
<3.6> 2019-03-28 19:57:45 Telcontar systemd 1 - - Started Console Mouse manager.
<4.6> 2019-03-28 19:58:14 Telcontar systemd-logind 1537 - - New session 3 of user cer.

I started gpm in order to paste the error log in text mode, so its start time tells me when startx had failed. (Running "init 5" makes the video flash in text mode; the machine had to be rebooted to recover.)

--
Cheers
Carlos E. R.
(from 15.0 x86_64 at Telcontar)
On Friday, 2019-03-29 at 11:21 +0100, Carlos E. R. wrote:
> kernel-devel-4.12.14-lp150.12.45.1.noarch
> kernel-devel-4.12.14-lp150.12.48.1.noarch (I do not see a 15.1 version in repo)
In fact, YaST complains that "nothing provides kernel-devel 4.12.14-lp151.22.9 needed by kernel-syms-4.12.14-lp151.22.9".

startx complains:

modprobe: ERROR: could not find module by name='nvidia'
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)

If the module is missing from the NVidia rpms or the kernel rpms, it is impossible to start X. Is this a bug? Intentional? Is some other package missing?

--
Cheers,
Carlos E. R.
(from openSUSE 15.0 x86_64 at Telcontar)
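The modprobe error above ("could not find module by name='nvidia'") means there is no loadable nvidia module for the running kernel at all. A minimal sketch of how one might check that directly, assuming modules sit somewhere under /lib/modules/<kernel release>/ (the exact subdirectory the NVIDIA KMP installs into is not assumed); the helper name nvidia_modules is hypothetical:

```shell
#!/bin/sh
# Sketch: look for NVIDIA kernel modules built for a given kernel release.
# Assumption: modules live somewhere under /lib/modules/<release>/; the
# exact subdirectory used by the NVIDIA KMP is not assumed.

# nvidia_modules DIR (hypothetical helper): print every nvidia*.ko file,
# compressed or not, found under DIR; print nothing if DIR does not exist.
nvidia_modules() {
    [ -d "$1" ] || return 0
    find "$1" -type f -name 'nvidia*.ko*'
}

# Default to the running kernel (4.12.14-lp151.22-default in this thread).
kver=${1:-$(uname -r)}
mods=$(nvidia_modules "/lib/modules/$kver")
if [ -n "$mods" ]; then
    printf 'NVIDIA module(s) found for %s:\n%s\n' "$kver" "$mods"
else
    echo "No nvidia*.ko found for $kver; modprobe has nothing to load."
fi
```

An empty result right after installing the nvidia-gfxG03-kmp-default package would point at the module build having failed, rather than at X itself.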
On 29/03/2019 11.53, Carlos E. R. wrote:
> In fact, yast complains that "nothing provides kernel-devel 4.12.14-lp151.22.9 needed by kernel-syms-4.12.14-lp151.22.9"
>
> startx complains:
>
> modprobe: ERROR: could not find module by name='nvidia'
> modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)
>
> If the module is missing in the Nvidia rpms or the kernel rpms, it is impossible to start X. Is this a bug? Intentional? Some other package is missing?
The cause of the problem was that kernel-devel had not been updated, because I could not find it. Thanks to Takashi Iwai, who pointed out that the file is in the noarch directory, not in x86_64. To avoid the problems that adding the 15.1 repo to YaST would create (accidentally updating the entire system), I had manually copied the kernel rpm to a local directory and used that as a repo. I had forgotten that there were several directories on the server.

Another problem is that when installing the nvidia-gfxG03-kmp-default rpm, if the module fails to build in its make step, the script continues happily and neither YaST nor zypper is aware of any error. The output of that process is not logged anywhere either. I had to run the "rpm --install --force nvidia-gfxG03-kmp-defaul..." command in a terminal to watch the messages, and there it was (the log below is from a different run, but shows the same basic problem):

make: *** /usr/src/linux-obj/x86_64/default: No such file or directory. Stop. <----------
make: *** /usr/src/linux-obj/x86_64/default: No such file or directory. Stop. <----------
/usr/src/kernel-modules/nvidia-340.107-default /
NVIDIA: calling KBUILD...
make[1]: *** /lib/modules//source: No such file or directory. Stop.
NVIDIA: left KBUILD.
nvidia.ko failed to build! <----------
make: *** [Makefile:192: nvidia.ko] Error 1 <=========

But it did not stop; it continued, and the thing claimed success.

After I knew all that, I tried again, and I'm now happily running the 15.1 kernel:

cer@Telcontar:~> uname -a
Linux Telcontar 4.12.14-lp151.23-default #1 SMP Fri Mar 22 18:22:30 UTC 2019 (b40158b) x86_64 x86_64 x86_64 GNU/Linux
cer@Telcontar:~>

which has solved my issues with crashes after resuming from hibernation and with a swap race that crashed the system under heavy use :-D

--
Cheers / Saludos,

Carlos E. R.
(from 15.0 x86_64 at Telcontar)
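The root cause here was a kernel-devel whose VERSION-RELEASE did not match the new kernel-default, after which the KMP build failed without YaST or zypper noticing. A hypothetical pre-flight check along these lines could catch that before installing the KMP. It is not something the packages provide; it assumes, from the YaST message "nothing provides kernel-devel 4.12.14-lp151.22.9 needed by kernel-syms-4.12.14-lp151.22.9", that each kernel-default needs a kernel-devel with the identical VERSION-RELEASE. The helper name missing_devel is made up for illustration, and the directory tested is the one the failed make step looked for.

```shell
#!/bin/sh
# Hypothetical pre-flight check before installing an NVIDIA KMP on openSUSE.
# Assumption: each installed kernel-default needs a kernel-devel with the
# identical VERSION-RELEASE (inferred from the YaST "nothing provides" message).

# missing_devel KERNEL_VRS DEVEL_VRS (hypothetical helper)
# Both arguments are whitespace- or newline-separated VERSION-RELEASE lists.
# Prints each kernel VERSION-RELEASE that has no identical kernel-devel entry.
missing_devel() {
    for vr in $1; do
        # $2 is left unquoted on purpose so it splits into one entry per line;
        # grep -F matches literally (dots are not wildcards), -x the whole line.
        if ! printf '%s\n' $2 | grep -qxF -- "$vr"; then
            echo "$vr"
        fi
    done
}

# Query the RPM database only where kernel-default is actually installed.
if command -v rpm >/dev/null 2>&1 && rpm -q kernel-default >/dev/null 2>&1; then
    kernels=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-default)
    devels=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-devel)
    for vr in $(missing_devel "$kernels" "$devels"); do
        echo "kernel-devel-$vr is not installed"
    done
    # The failed KMP build in this thread stopped on this missing directory.
    if [ ! -d /usr/src/linux-obj/x86_64/default ]; then
        echo "/usr/src/linux-obj/x86_64/default does not exist"
    fi
fi
```

With the VERSION-RELEASE values from the first message's rpm -qa listing, missing_devel "4.12.14-lp150.12.48.1 4.12.14-lp151.22.9" "4.12.14-lp150.12.45.1 4.12.14-lp150.12.48.1" prints 4.12.14-lp151.22.9, i.e. the kernel-devel package that turned out to be in the noarch directory. It does not replace watching the output of rpm --install, since the KMP build can still fail for other reasons.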