Bug ID 1183739
Summary Kernel update with a release minor number bump causes old kernel removal
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware Other
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter kerossin@pm.me
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

When a kernel update is released that only bumps the last number of the
release, the older kernel is removed and the new one is installed in its place,
instead of the new one simply being installed alongside the old one.

>rpm -qi kernel-default-5.11.4
>
>Version     : 5.11.4
>Release     : 1.3   <- this is the number I'm talking about

Perhaps a real example will make this clearer. Recently Tumbleweed was on
kernel 5.11.4-1.2, and snapshot 20210315 brought an update to 5.11.4-1.3.
Zypper handles this update by removing -1.2 first and then installing -1.3.
With the current configuration of the kernel packages it's not possible to
install both of these packages side by side, because they use the exact same
file and directory names: the minor release number is omitted, e.g.
/boot/initrd-5.11.4-1-default, /lib/modules/5.11.4-1-default/.
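
To make the collision concrete, here is a small Python sketch. The function
name kernel_paths and the rule "keep only the part of the release before the
first dot" are my assumptions, inferred from the paths above; this is not the
actual kernel packaging logic.

```python
def kernel_paths(version, release, flavor="default"):
    """Build the initrd path and modules directory the way the paths
    above suggest (hypothetical helper, not the real packaging code)."""
    # Assumption: the minor release number is dropped, so "1.3" becomes "1".
    major_release = release.split(".")[0]
    name = f"{version}-{major_release}-{flavor}"
    return f"/boot/initrd-{name}", f"/lib/modules/{name}/"


old = kernel_paths("5.11.4", "1.2")
new = kernel_paths("5.11.4", "1.3")

# Both package releases map to the same files, so they cannot coexist.
print(old == new)  # prints True
```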


What problems does this cause:
Well a fundamental one would be that it doesn't follow the convention and what
everyone expects - a kernel update shouldn't touch the previous kernel and
should be installed separately.
Another one, and the way we noticed this issue, is that it can break
out-of-tree kernel modules during updates. We specifically noticed this with
the Nvidia driver kernel modules, and whether it happens depends on the order
in which zypper processes the packages.

For example, when 'zypper dup' installs kernel-default before
kernel-default-devel you're in luck and no problem happens.
But if you get unlucky and zypper installs kernel-default-devel before
kernel-default, that's when the problem occurs. Nvidia kernel module
installation is triggered by the installation of kernel-default-devel, and
those modules are put into /lib/modules/$kernelversion/updates/. The removal of
those modules is triggered by the removal of kernel-default. So here's what
happens in the unlucky 20210315 example:
1) kernel-default-devel-5.11.4-1.3 is installed
2) Nvidia modules are installed at /lib/modules/5.11.4-1-default/updates/
3) kernel-default-5.11.4-1.2 is removed
4) Nvidia modules are deleted from /lib/modules/5.11.4-1-default/updates/
5) kernel-default-5.11.4-1.3 is installed
Now your new kernel is left without Nvidia drivers and it will boot to the
terminal.
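
The two orderings can be replayed with a short Python sketch. The simulate
function and the way the triggers are modelled are simplifications of mine
(install of kernel-default-devel adds the modules, removal of kernel-default
deletes them); the real package scripts do more than this.

```python
def simulate(steps):
    """Replay (action, package, modules_dir) steps and return the set of
    directories that still hold Nvidia modules afterwards."""
    nvidia_dirs = set()
    for action, package, modules_dir in steps:
        if action == "install" and package == "kernel-default-devel":
            nvidia_dirs.add(modules_dir)      # modules get installed
        elif action == "remove" and package == "kernel-default":
            nvidia_dirs.discard(modules_dir)  # modules get deleted
    return nvidia_dirs


# Both 5.11.4-1.2 and 5.11.4-1.3 use this same directory.
MODDIR = "/lib/modules/5.11.4-1-default/updates/"

unlucky = [
    ("install", "kernel-default-devel", MODDIR),  # 5.11.4-1.3
    ("remove", "kernel-default", MODDIR),         # 5.11.4-1.2
    ("install", "kernel-default", MODDIR),        # 5.11.4-1.3
]
lucky = [
    ("remove", "kernel-default", MODDIR),         # 5.11.4-1.2
    ("install", "kernel-default", MODDIR),        # 5.11.4-1.3
    ("install", "kernel-default-devel", MODDIR),  # 5.11.4-1.3
]

print(MODDIR in simulate(unlucky))  # prints False: modules are gone
print(MODDIR in simulate(lucky))    # prints True: modules survive
```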

If the packages used file and directory names with the full release number
(/lib/modules/5.11.4-1.2-default/, /lib/modules/5.11.4-1.3-default/) this
wouldn't happen, and the removal of the old kernel wouldn't be necessary
either.
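
As a rough check of that claim, the Python sketch below replays the unlucky
order with the full release number in the directory name. The modules_dir
helper and its naming scheme are hypothetical; they only mirror the example
paths given here.

```python
def modules_dir(version, release, flavor="default"):
    # Hypothetical naming scheme that keeps the full release number.
    return f"/lib/modules/{version}-{release}-{flavor}/"


old_updates = modules_dir("5.11.4", "1.2") + "updates/"
new_updates = modules_dir("5.11.4", "1.3") + "updates/"

nvidia_dirs = set()
nvidia_dirs.add(new_updates)      # kernel-default-devel-5.11.4-1.3 installed
nvidia_dirs.discard(old_updates)  # kernel-default-5.11.4-1.2 removed, if at all

# The removal touches a different directory, so the new modules survive.
print(new_updates in nvidia_dirs)  # prints True
```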

The way I see it, this potentially affects more than the Nvidia drivers: any
other kernel modules that are managed in a similar fashion could be affected
too.

*Note: this same thing occurred with snapshot 20210222, when the kernel went
from 5.10.16-1.2 to 5.10.16-1.3.

Solutions:
Is it possible to use the full release number in file and directory names so
that these kernels would be installed alongside old kernels?
Or perhaps an easier solution is to increment only the major release number
instead of the minor one, e.g. 5.11.4-1, 5.11.4-2, 5.11.4-3, etc.?

