[Bug 1205308] New: irqbalance: Failed to initialize thermal events.
https://bugzilla.suse.com/show_bug.cgi?id=1205308

            Bug ID: 1205308
           Summary: irqbalance: Failed to initialize thermal events.
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Basesystem
          Assignee: screening-team-bugs@suse.de
          Reporter: opensuse@mike.franken.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

For a while now (since https://bugzilla.suse.com/show_bug.cgi?id=1204607 was solved), I get

systemd[1]: Started irqbalance daemon.
/usr/sbin/irqbalance[1222]: thermal: socket bind failed, thermald may not be running.
/usr/sbin/irqbalance[1222]: Failed to initialize thermal events.
systemd[1]: irqbalance.service: Deactivated successfully.

on every boot. This even happens with the latest irqbalance-1.9.2-1.1.x86_64.

Systeminfo:
Operating System: openSUSE Tumbleweed 20221109
KDE Plasma Version: 5.26.2
KDE Frameworks Version: 5.99.0
Qt Version: 5.15.7
Kernel Version: 6.0.7-1-default (64-bit)
Graphics Platform: X11
Processors: 8 × 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz
Memory: 15,0 GiB of RAM
Graphics Processor: Mesa Intel® Xe Graphics
Manufacturer: Dell Inc.
Product Name: XPS 13 9310 2-in-1

-- 
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1205308

Gene Snider <genes1122@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |genes1122@gmail.com

https://bugzilla.suse.com/show_bug.cgi?id=1205308

Dirk Mueller <dmueller@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmueller@suse.com
           Assignee|screening-team-bugs@suse.de |trenn@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1205308

Dirk Mueller <dmueller@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|irqbalance: Failed to       |irqbalance immediately
                   |initialize thermal events.  |deactivates itself
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c1

--- Comment #1 from Dirk Mueller <dmueller@suse.com> ---
Do you have the IRQBALANCE_ONESHOT environment variable set anywhere?
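The question in comment #1 is relevant because a one-shot run balances once and then exits, which systemd would presumably report as a clean deactivation. As a minimal sketch of how a daemon can test for such a variable: the helper name env_flag_is_set is hypothetical and this is not irqbalance's actual code. Treating an empty value as "set" is also an assumption.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from irqbalance.
 * Returns 1 if the named variable exists in the environment
 * (even with an empty value), 0 otherwise.
 */
static int env_flag_is_set(const char *name)
{
    return getenv(name) != NULL;
}
```

Calling env_flag_is_set("IRQBALANCE_ONESHOT") inside the service's environment would show whether the variable leaks in from a unit file or an environment file.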
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c2

--- Comment #2 from Dirk Mueller <dmueller@suse.com> ---
I changed the summary because the message you see is not related to the deactivation. This is what I see:

Nov 10 09:31:18 oldboy /usr/sbin/irqbalance[31556]: thermal: socket bind failed, thermald may not be running.
Nov 10 09:31:18 oldboy /usr/sbin/irqbalance[31556]: Failed to initialize thermal events.

and it keeps running. The hint "thermald might not be running" tells you what is wrong here: on your system thermald is not installed or not running.

https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c3

--- Comment #3 from Dirk Mueller <dmueller@suse.com> ---
Amending my last comment, this is actually a bug that I fixed upstream:

https://github.com/Irqbalance/irqbalance/pull/250
https://bugzilla.suse.com/show_bug.cgi?id=1205308

Jeffrey Cheung <jcheung@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
                 CC|                            |jcheung@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c5

--- Comment #5 from Michael Hirmke <opensuse@mike.franken.de> ---
(In reply to OBSbugzilla Bot from comment #4)
> This is an autogenerated message for OBS integration: This bug (1205308) was mentioned in https://build.opensuse.org/request/show/1035191 Factory / irqbalance

(In reply to Dirk Mueller from comment #3)
> amending my last comment, this is actually a bug that I fixed upstream:

Adding AF_NETLINK to the service file of irqbalance solved this problem. On the other hand, you were right: thermald is not installed on this system. But if it is needed, why wasn't it installed as a dependency?
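To check from inside a process whether generic netlink sockets are available at all, a small probe like the following can help. It is only a sketch: the function name probe_generic_netlink is hypothetical, and the assumption that "the service file" refers to an address-family restriction in the systemd unit is mine, not something the thread states. Under such a restriction the socket() call itself would be expected to fail before any bind is attempted.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/*
 * Hypothetical probe, not part of irqbalance.
 * Tries to open a generic netlink socket and closes it again.
 * Returns 0 on success, or the errno value from socket() on failure
 * (for example EAFNOSUPPORT when the address family is unavailable).
 */
static int probe_generic_netlink(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_GENERIC);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Running the probe once in a normal shell and once inside the service's sandbox would show whether the unit's restrictions, rather than a missing daemon, are what blocks the socket.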
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c6

--- Comment #6 from Dirk Mueller <dmueller@suse.com> ---
(In reply to Michael Hirmke from comment #5)
> Adding AF_NETLINK to the service file of irqbalance solved this problem.

Which problem is "this problem"? Are you referring to the log messages about "thermald might not be running", or to "irqbalance: Deactivated successfully"? It should only solve the "thermald might not be running" messages, which are annoying but harmless.

> On the other hand, you were right: thermald is not installed on this system. But if it is needed, why it wasn't installed as as dependency.

It is entirely optional; irqbalance should work just fine without it installed. It is just that with thermald installed and *running*, irqbalance can make better rebalancing decisions. Also, thermald is only needed on Intel CPUs and must not be installed on AMD CPUs.
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c7

--- Comment #7 from Michael Hirmke <opensuse@mike.franken.de> ---
(In reply to Dirk Mueller from comment #6)
> (In reply to Michael Hirmke from comment #5)
> > Adding AF_NETLINK to the service file of irqbalance solved this problem.
>
> which problem is "this problem"? are you're referring to the log messages about "thermald might not be running". or are you referring to "irqbalance: Deactivated successfully"?

The problem isn't solved, I was wrong. I was referring to the message regarding thermald *and* the "socket bind failed" message. Now I only get these messages:

systemd[1]: Started irqbalance daemon.
/usr/sbin/irqbalance[16600]: thermal: received group id (3).
systemd[1]: irqbalance.service: Deactivated successfully.

So now it stops running without any error message. I didn't get this in the first place. So the problem isn't solved.

> it should only solve the "thermald might not be running" messages which are annoying but harmless.

Indeed.

> > On the other hand, you were right: thermald is not installed on this system. But if it is needed, why it wasn't installed as as dependency.
>
> it is entirely optional, irqbalance functionality should work just fine without it installed. just that with it installed and *running*, irqbalance can do superior rebalancing decisions.
>
> Also, thermald is only needed on intel cpus and must not be installed on AMD cpus.

OK, this machine has an Intel CPU. What would be the advantage of installing thermald additionally?
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c8

--- Comment #8 from Dirk Mueller <dmueller@suse.com> ---
(In reply to Michael Hirmke from comment #7)
> Ok, this machine has an Intel cpu. What would be the advantage installing thermad additionally?

It orchestrates power management on Intel CPU family systems, which both reduces power consumption and improves performance (because better power management gives the CPU more headroom to boost its frequency when needed).

The irqbalance integration lets irqbalance make better IRQ balancing decisions than it could without the interface to thermald.
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c9

--- Comment #9 from Michael Hirmke <opensuse@mike.franken.de> ---
(In reply to Dirk Mueller from comment #8)
> (In reply to Michael Hirmke from comment #7)
> > Ok, this machine has an Intel cpu. What would be the advantage installing thermad additionally?
>
> It orchestrates power management on the intel cpu family systems, which both reduce power consumption as well as improve performance (because the CPU has due to better power management more headroom to boost the cpu frequency when needed)
>
> the irqbalance integration makes it perform better irq balancing decisions than without the interface with thermald.

Thanks for the explanation. I installed thermald and started it, but irqbalance nevertheless gets deactivated by systemd.

systemd[1]: Started irqbalance daemon.
/usr/sbin/irqbalance[1248]: thermal: received group id (3).
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
/usr/sbin/irqbalance[1248]: thermal: no CPU capacity change.
systemd[1]: irqbalance.service: Deactivated successfully.

What am I missing here?
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c10

--- Comment #10 from Michael Hirmke <opensuse@mike.franken.de> ---
thermald logs this:

thermald[23396]: 27 CPUID levels; family:model:stepping 0x6:8c:1 (6:140:1)
thermald[23396]: 27 CPUID levels; family:model:stepping 0x6:8c:1 (6:140:1)
thermald[23396]: Thermal DTS: No coretemp sysfs found
thermald[23396]: sensor id 9 : No temp sysfs for reading raw temp
thermald[23396]: sensor id 9 : No temp sysfs for reading raw temp
thermald[23396]: sensor id 9 : No temp sysfs for reading raw temp
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported condition 57 (UKNKNOWN)
thermald[23396]: Unsupported conditions are present
thermald[23396]: Polling mode is enabled: 4
https://bugzilla.suse.com/show_bug.cgi?id=1205308

Jeffrey Cheung <jcheung@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |IN_PROGRESS

https://bugzilla.suse.com/show_bug.cgi?id=1205308

Thomas Renninger <trenn@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marc.ruehrschneck@suse.com
              Flags|                            |needinfo?(marc.ruehrschneck
                   |                            |@suse.com)

https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c13

--- Comment #13 from Thomas Renninger <trenn@suse.com> ---
@Michael Hirmke please stay tuned, hopefully the author of the recent changes is going to help.

https://bugzilla.suse.com/show_bug.cgi?id=1205308

Marc Ruehrschneck <marc.ruehrschneck@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(marc.ruehrschneck |
                   |@suse.com)                  |
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c15

Dirk Mueller <dmueller@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |opensuse@mike.franken.de
              Flags|                            |needinfo?(opensuse@mike.fra
                   |                            |nken.de)

--- Comment #15 from Dirk Mueller <dmueller@suse.com> ---
Please add the output of "irqbalance -d --foreground" (running as root) to this bug report.

https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c16

Michael Hirmke <opensuse@mike.franken.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(opensuse@mike.fra |
                   |nken.de)                    |

--- Comment #16 from Michael Hirmke <opensuse@mike.franken.de> ---
Created attachment 862972
  --> https://bugzilla.suse.com/attachment.cgi?id=862972&action=edit
irqbalance -d --foreground
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c17

Chang Seok Bae <chang.seok.bae@intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chang.seok.bae@intel.com

--- Comment #17 from Chang Seok Bae <chang.seok.bae@intel.com> ---
Hello,

First, I have to clarify that thermald has nothing to do with this event handling.

I submitted a pull request to support Netlink events from the kernel:
https://github.com/Irqbalance/irqbalance/pull/206

It was followed by this commit, which switched errors to warnings:
https://github.com/Irqbalance/irqbalance/commit/febe697ac3216b14397a60937dd4...

But this line change confused people:

        rc = genl_connect(sock);
        if (rc) {
-               log(TO_ALL, LOG_ERR, "thermal: socket bind failed.\n");
+               log(TO_ALL, LOG_INFO, "thermal: socket bind failed, thermald may not be running.\n");
                return TRUE;
        }

The new message is not true. The failure has nothing to do with "thermald". The thermal events come from a kernel built with CONFIG_INTEL_HFI_THERMAL=y.

Let me try to remove that confusing sentence.

So far, then, the open question seems to be on the systemd side: why does it shut down irqbalance? I concur that more information is needed, as others have suggested.
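One way to avoid this kind of confusion is to derive the hint from the failure reason instead of guessing at a missing daemon. The following is a sketch only: netlink_connect_hint is a hypothetical helper that does not exist in irqbalance, and it maps plain errno values, whereas genl_connect() reports libnl's own error codes, so a real change would need to translate those first.

```c
#include <errno.h>

/*
 * Hypothetical helper, not part of irqbalance or libnl.
 * Maps an errno value from a failed netlink socket setup to a hint
 * that names a likely cause without mentioning unrelated daemons.
 */
static const char *netlink_connect_hint(int err)
{
    switch (err) {
    case 0:
        return "connected";
    case EAFNOSUPPORT:
        return "AF_NETLINK unavailable; a sandbox may be blocking the address family";
    case EPROTONOSUPPORT:
        return "kernel lacks generic netlink support";
    case EACCES:
    case EPERM:
        return "permission denied while opening the netlink socket";
    default:
        return "generic netlink connect failed";
    }
}
```

The wording of the hints is illustrative; the point is only that the message follows from the error code.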
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c18

Chang Seok Bae <chang.seok.bae@intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(opensuse@mike.fra
                   |                            |nken.de)

--- Comment #18 from Chang Seok Bae <chang.seok.bae@intel.com> ---
Okay, I missed the attachment.

But it does not show any warning except for "Daemon couldn't be bound to the file-based socket." That message comes from the UI code path, and the fallback seems to be okay.

So there was no issue when running in the foreground, correct?
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c19

--- Comment #19 from Chang Seok Bae <chang.seok.bae@intel.com> ---
> Let me try to remove that confusing sentence.

Done: https://github.com/Irqbalance/irqbalance/pull/251
https://bugzilla.suse.com/show_bug.cgi?id=1205308
https://bugzilla.suse.com/show_bug.cgi?id=1205308#c20

--- Comment #20 from Michael Hirmke <opensuse@mike.franken.de> ---
(In reply to Chang Seok Bae from comment #18)
> Okay, I missed the attachment.
>
> But, it does not show any warning except for "Daemon couldn't be bound to the file-based socket."
>
> This came from the UI thing and the fallback seems to be okay.
>
> Then, there was no issue with that foreground running, correct?

If by "no issue" you mean that the daemon kept running, then yes, there was no issue. It ran until I killed it with Ctrl-C.