On Tue, 2023-10-10 at 07:53 +1300, Michael Hamilton wrote:
The main thing that bit me was that the order of /dev/sd* is no longer fixed, which caused problems for my boot configuration and drive configuration scripts (hdparm/smartctl post-boot scripts).
That shouldn't have anything to do with the recent change. The order of /dev/sd* has been non-deterministic at least since kernel commit f049cf1a7b67 ("scsi: sd: Rely on the driver core for asynchronous probing"), which was in upstream kernel 5.3, 4 years ago. This kernel has been in Leap since 15.2. But even before that, the order has never been truly deterministic, even if it might have seemed to be the case on some systems. The order has always been dependent on the order of SCSI drivers loaded and the order of controllers on the PCI bus. Bottom line: don't use /dev/sd* for referring to specific devices in configuration files, ever.
0) My symptom is being unable to boot (I think people reporting slow boots are experiencing a different problem).
1) Make sure /dev/sd* isn't used in /etc/fstab.
2) Check /etc/default/grub, make sure GRUB_DISABLE_LINUX_UUID is commented out (normal case).
3) Check /boot/grub2/grub.cfg and make sure it doesn't reference /dev/sd* in other ways.
4) Track down any scripts that set parameters for any /dev/sd* devices.
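Checks 1) and 3) in the list above can be partly automated. The following is only a minimal sketch, not an official tool: the function name find_sd_references and the regular expression are made up for illustration, and the pattern may report false positives (any path that merely starts with /dev/sd followed by letters).

```python
import re

# Matches kernel-assigned SCSI disk names such as /dev/sda or /dev/sdb1.
# Assumption: a simple pattern is good enough for a first pass over config files.
SD_NAME = re.compile(r"/dev/sd[a-z]+\d*\b")


def find_sd_references(text):
    """Return (line_number, line) pairs for non-comment lines mentioning /dev/sd*."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines; commented-out entries are harmless.
        if not stripped or stripped.startswith("#"):
            continue
        if SD_NAME.search(stripped):
            hits.append((number, stripped))
    return hits


# Hypothetical fstab content, used here only to demonstrate the function.
sample_fstab = """\
# /dev/sda1 was the old root
UUID=abcd-1234 / ext4 defaults 0 1
/dev/sdb2 /home ext4 defaults 0 2
"""

for number, line in find_sd_references(sample_fstab):
    print(f"line {number}: {line}")
```

To scan real files, read the contents of /etc/fstab or /boot/grub2/grub.cfg and pass the text to find_sd_references. Any hit is a candidate for replacement with a UUID= or /dev/disk/by-* reference.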
Again, the recent sg3_utils change has nothing to do with /dev/sd*. It only removes certain symlinks under /dev/disk/by-id.
When dealing with #4 above, I'm not sure what hdparm and smartctl accept, so for the moment I'm using code that extracts the /dev/sd name from the output of the lsscsi command.
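An alternative to parsing lsscsi output is to start from the persistent symlinks that udev maintains and resolve them to whatever kernel name they point at during the current boot. The sketch below assumes the usual /dev/disk/by-id layout; the function name stable_links is hypothetical. As far as is known, hdparm and smartctl simply open the path they are given, so passing the symlink itself should also work, but treat that as an assumption and test it on your system.

```python
import os


def stable_links(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the device node it currently points at."""
    mapping = {}
    try:
        names = sorted(os.listdir(by_id_dir))
    except OSError:
        # Directory missing or unreadable: report nothing rather than fail.
        return mapping
    for name in names:
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath follows the symlink to the kernel name, e.g. /dev/sdb.
            mapping[name] = os.path.realpath(link)
    return mapping


for name, node in stable_links().items():
    print(f"{name} -> {node}")
```

A post-boot script can then key its per-drive settings on the by-id name and look up the current device node at run time, instead of hard-coding /dev/sd* names.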
There is a discussion going on in the forums about how to restore stability to the ordering (for those that don't want to change or want a short term fix), see:
https://forums.opensuse.org/t/problem-with-disks-order-after-snapshot-202309...
It suggests adding scsi_mod.async_probe=0 to the bootline. Would this be the same for nvme, or does scsi cover that? The suggestion seems to work.
Strange. This is the wrong syntax. The syntax of the parameter is "scsi_mod.disable_async_probing=<driver>". See https://www.suse.com/support/kb/doc/?id=000018449 for details. This parameter exists only on Leap, not on TW (and probably not on ALP). nvme is totally different; there is no corresponding parameter.
As you suggested, adding udev.scsi_symlink_src=LTVS udev.scsi_id_serial_src=LTVS to the bootline also seems to result in the old ordering being used.
This finding is even stranger. The /dev/sd* names are assigned by the kernel. Neither sg3_utils nor udev has anything to do with it. The only explanation I can think of is that the timing during boot is somehow affected by the different udev rules.
With both suggestions above, I haven't done enough boots to be sure of stability of the /dev/sd* assignments. But either is worth a go to get back into the system and fix the /dev/sd* references.
Sorry, that won't happen, see above. I'm surprised that you haven't run into issues with this approach years ago. The fact that this hits you now must be some weird coincidence that must be inspected further. Please open a bug and assign it to me.
Thanks,
Martin