https://bugzilla.suse.com/show_bug.cgi?id=1231312

Bug ID: 1231312
Summary: jc42 not retaining devices after reboot, related i2c issues
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel:Drivers
Assignee: kernel-bugs@suse.de
Reporter: pallaswept@proton.me
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

I noticed today that since kernel 6.11 my DDR4 temperature sensors were absent from applications like `sensors` or CoolerControl. The jc42 kernel module which normally provides these hwmon sensors is loaded and there are no errors in the journal searching for jc42, although there is an absence of usual log messages from coolercontrol where it lists the devices it has derived from hwmon.

Looking at my sensors conf file and configuration logs, I know that the controller for these devices was i2c-6. I notice that this is now occupied by the nvidia GPU:
sudo i2cdetect -l
i2c-0   i2c     Synopsys DesignWare I2C adapter         I2C adapter
i2c-1   smbus   SMBus PIIX4 adapter port 0 at 0b00      SMBus adapter
i2c-2   smbus   SMBus PIIX4 adapter port 2 at 0b00      SMBus adapter
i2c-3   smbus   SMBus PIIX4 adapter port 1 at 0b20      SMBus adapter
i2c-4   i2c     NVIDIA i2c adapter 1 at 7:00.0          I2C adapter
i2c-5   i2c     NVIDIA i2c adapter 5 at 7:00.0          I2C adapter
i2c-6   i2c     NVIDIA i2c adapter 6 at 7:00.0          I2C adapter
i2c-7   i2c     NVIDIA i2c adapter 7 at 7:00.0          I2C adapter
i2c-8   i2c     NVIDIA i2c adapter 8 at 7:00.0          I2C adapter
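Since the bus numbers evidently differ between kernels, it may help to pick out the SMBus adapters by name rather than by number. Below is a minimal sketch, assuming the four-column listing format shown above (bus, type, name, algorithm). The function names `parse_adapters`, `smbus_buses` and `find_by_name` are hypothetical helpers written for illustration, and the sample text is simply the listing from this report; on a live system the text would instead come from running `i2cdetect -l`.

```python
import re

# Listing copied from this report. Real `i2cdetect -l` output separates the
# columns with tabs and padding; the pattern below accepts either form.
SAMPLE = """\
i2c-0 i2c Synopsys DesignWare I2C adapter I2C adapter
i2c-1 smbus SMBus PIIX4 adapter port 0 at 0b00 SMBus adapter
i2c-2 smbus SMBus PIIX4 adapter port 2 at 0b00 SMBus adapter
i2c-3 smbus SMBus PIIX4 adapter port 1 at 0b20 SMBus adapter
i2c-4 i2c NVIDIA i2c adapter 1 at 7:00.0 I2C adapter
i2c-5 i2c NVIDIA i2c adapter 5 at 7:00.0 I2C adapter
i2c-6 i2c NVIDIA i2c adapter 6 at 7:00.0 I2C adapter
i2c-7 i2c NVIDIA i2c adapter 7 at 7:00.0 I2C adapter
i2c-8 i2c NVIDIA i2c adapter 8 at 7:00.0 I2C adapter
"""

# bus, type, adapter name, then the trailing algorithm column.
LINE_RE = re.compile(
    r"^(i2c-\d+)\s+(\S+)\s+(.*?)\s+(I2C adapter|SMBus adapter)\s*$"
)


def parse_adapters(text):
    """Return one dict per adapter line; lines that do not match are skipped."""
    adapters = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            bus, kind, name, algo = match.groups()
            adapters.append(
                {"bus": bus, "type": kind, "name": name, "algo": algo}
            )
    return adapters


def smbus_buses(adapters):
    """Bus names of all adapters whose type column is 'smbus'."""
    return [a["bus"] for a in adapters if a["type"] == "smbus"]


def find_by_name(adapters, fragment):
    """Bus name of the first adapter whose name contains `fragment`, else None."""
    for a in adapters:
        if fragment in a["name"]:
            return a["bus"]
    return None


if __name__ == "__main__":
    found = parse_adapters(SAMPLE)
    print("SMBus adapters:", smbus_buses(found))
    print("PIIX4 port 0 is on:", find_by_name(found, "PIIX4 adapter port 0"))
```

This only reads and parses text; it does not write anything to the bus or to sysfs.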
Looking at the filesystem, I can see that the jc42 devices are not present on any controller. If I add new devices, they seem to function normally, but they are lost on reboot. I can re-add them and they will function until the next reboot.

In 6.11.0 the devices are not present, having been 'forgotten' since the previous reboot, and the controller is on i2c-1. Rebooting from there into 6.10.11, the devices are retained, fully functional, and controlled by i2c-6.

I noticed https://bugzilla.opensuse.org/show_bug.cgi?id=1231228 but my problem seems different. I do not have the jc42 modules automatically detected like that (it would be nice!) but I do notice that in 6.11 there are new log entries which did not exist in 6.10:

Oct 04 23:30:38 localhost kernel: i2c i2c-1: Successfully instantiated SPD at 0x50
Oct 04 23:30:38 localhost kernel: i2c i2c-1: Successfully instantiated SPD at 0x51
Oct 04 23:30:38 localhost kernel: i2c i2c-1: Successfully instantiated SPD at 0x52
Oct 04 23:30:38 localhost kernel: i2c i2c-1: Successfully instantiated SPD at 0x53

These are my memory sticks' SPD chips, and they were previously only readable by using the `eeprom` kernel module. The `ee1004` and `at24` modules would return nothing. They are now readable, without any change by me, using the ee1004 module, which is in use for the above devices.

So, it seems that something changed with the piix4_smbus driver: its adapter has moved from i2c-6 to i2c-1, and it now detects my DRAM SPD, which is nice. The new SPD driver works, which is also nice given that the old one seems to be gone. However, it is not detecting the temperature sensors, and it is forgetting(?) them when I add them.

I will collect the dmesg and hwinfo outputs from each kernel version (6.10.11, 6.11.0, 6.12-rc1) shortly. Is there anything else that might be useful while I'm at it?

A word of caution to others who might be suffering this bug or one like it: I experienced very severe hardware faults related to this.
After adding the devices to the controller and rebooting, my monitor started to flash on and off a few times per second. After factory-resetting the monitor and rebooting, my PC would not POST, failing first to train the RAM (three or four different errors at random, shown on the motherboard's 8-segment display) and, once that was isolated, failing to detect graphics. A BIOS reset was needed to get it back in order. It has been OK since. I believe it to be a one-off not triggered by this bug alone, but it was undoubtedly triggered by my attempt to fix the missing devices by following standard procedure to add them. It seems fine now, but I would be remiss not to mention it.