[Bug 259992] New: kernel acpi read wrong temperature - critical shutdown
https://bugzilla.novell.com/show_bug.cgi?id=259992

Summary: kernel acpi read wrong temperature - critical shutdown
Product: openSUSE 10.2
Version: Final
Platform: x86-64
OS/Version: SuSE Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: shanti@mojo.cc
QAContact: qa@suse.de

Hi,

my system regularly shuts down on a "wrong" temperature alarm from ACPI:

<<<<>>>>
Mar 6 14:01:20 zion kernel: ACPI: Critical trip point
Mar 6 14:01:20 zion kernel: Critical temperature reached (80 C), shutting down.
Mar 6 14:01:20 zion kernel: ACPI: Unable to turn cooling device [ffff810037fdd290] 'on'
Mar 6 14:01:20 zion shutdown[15861]: shutting down for system halt
Mar 6 14:01:21 zion powersaved[3500]: WARNING (checkTemperatureStateChanges:218) Temperature state changed to critical.
Mar 6 14:01:26 zion kernel: Critical temperature reached (33 C), shutting down.
<<<<<>>>>>>
as you can see, the "critical temperature" returns to normal (delta 47°) within 7 seconds .. there is no turning back for the system .. no annoying-level - no margin .. just a clean shutdown

i am not even sure what cooling device the kernel means .. i have sensors for all cpu&fan and mainboard .. maybe it means my VGA-device or my southbridge :-o ??

please take a look :-)

tnx
RFC
-c-

--------------------------
this is my lspci:

00:00.0 Host bridge: ATI Technologies Inc RS480 Host Bridge (rev 01)
00:01.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge
00:06.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge
00:1c.0 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:1c.1 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:1c.2 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:1c.3 USB Controller: ALi Corporation USB 2.0 Controller (rev 01)
00:1d.0 Audio device: ALi Corporation High Definition Audio/AC'97 Host Controller
00:1e.0 ISA bridge: ALi Corporation PCI to LPC Controller (rev 31)
00:1e.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
00:1f.0 IDE interface: ALi Corporation M5229 IDE (rev c7)
00:1f.1 RAID bus controller: ALi Corporation ULi 5287 SATA (rev 02)
01:05.0 VGA compatible controller: ATI Technologies Inc RS480 [Radeon Xpress 200G Series]
02:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5789 Gigabit Ethernet PCI Express (rev 11)
03:15.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
03:16.0 RAID bus controller: Triones Technologies, Inc. HPT302/302N (rev 02)

<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>
and my lsmod
<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>

Module Size Used by
usblp 32128 0
appletalk 74736 2
ax25 99068 2
ipx 65096 2
p8023 19072 1 ipx
xt_tcpudp 20352 14
xt_pkttype 18816 5
ipt_LOG 23808 12
xt_limit 20224 12
fglrx 790276 58
vmnet 76720 9
vmmon 166252 0
rfcomm 75688 2
hidp 50688 2
l2cap 58624 10 rfcomm,hidp
it87 42404 0
af_packet 57356 0
hwmon_vid 19584 1 it87
hwmon 20360 1 it87
i2c_isa 23040 1 it87
bluetooth 90116 5 rfcomm,hidp,l2cap
snd_pcm_oss 71680 0
snd_mixer_oss 35840 1 snd_pcm_oss
snd_seq 82976 0
cpufreq_conservative 25608 0
cpufreq_ondemand 24592 1
cpufreq_userspace 24064 0
cpufreq_powersave 18688 0
powernow_k8 32416 1
freq_table 22912 1 powernow_k8
thermal 33552 1
processor 53864 2 powernow_k8,thermal
button 24736 0
battery 28296 0
ac 22792 0
ipt_REJECT 22528 3
xt_state 19200 12
iptable_mangle 19840 0
iptable_nat 24964 0
ip_nat 37804 1 iptable_nat
iptable_filter 19968 1
ip6table_mangle 19456 0
ip6table_filter 19840 0
ip_conntrack 78372 3 xt_state,iptable_nat,ip_nat
nfnetlink 24648 2 ip_nat,ip_conntrack
ip_tables 39784 3 iptable_mangle,iptable_nat,iptable_filter
ip6_tables 33480 2 ip6table_mangle,ip6table_filter
x_tables 37384 9 xt_tcpudp,xt_pkttype,ipt_LOG,xt_limit,ipt_REJECT,xt_state,iptable_nat,ip_tables,ip6_tables
uhci_hcd 42520 0
apparmor 74264 0
aamatch_pcre 31232 1 apparmor
reiserfs 260096 1
loop 34192 0
dm_mod 81872 0
ohci1394 52040 0
ieee1394 130552 1 ohci1394
snd_usb_audio 108672 1
snd_usb_lib 36224 1 snd_usb_audio
snd_rawmidi 47104 1 snd_usb_lib
snd_seq_device 26516 2 snd_seq,snd_rawmidi
snd_hwdep 28552 1 snd_usb_audio
usbhid 69792 0
ide_cd 59680 0
cdrom 54056 1 ide_cd
i2c_ali15x3 25348 0
tg3 125572 0
ohci_hcd 38404 0
shpchp 56492 0
snd_hda_intel 37660 2
snd_hda_codec 220160 1 snd_hda_intel
snd_pcm 115464 4 snd_pcm_oss,snd_usb_audio,snd_hda_intel,snd_hda_codec
snd_timer 44680 2 snd_seq,snd_pcm
snd 89384 18 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_usb_audio,snd_usb_lib,snd_rawmidi,snd_seq_device,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
soundcore 28192 1 snd
snd_page_alloc 27792 2 snd_hda_intel,snd_pcm
i2c_ali1535 24452 0
i2c_core 41472 4 it87,i2c_isa,i2c_ali15x3,i2c_ali1535
pci_hotplug 52228 1 shpchp
floppy 82408 0
parport_pc 58984 1
lp 30664 0
parport 59660 2 parport_pc,lp
ext3 167696 2
mbcache 27016 1 ext3
jbd 90872 1 ext3
ehci_hcd 51080 0
usbcore 148320 7 usblp,uhci_hcd,snd_usb_audio,snd_usb_lib,usbhid,ohci_hcd,ehci_hcd
edd 27912 0
fan 22408 1
sg 55080 0
hpt366 36992 0 [permanent]
sata_uli 25860 4
libata 145312 1 sata_uli
alim15x3 29208 0 [permanent]
sd_mod 39296 4
scsi_mod 173744 3 sg,libata,sd_mod
ide_disk 34304 2
ide_core 174720 4 ide_cd,hpt366,alim15x3,ide_disk

<<<<<<<<<<<<<<>>>>>>>>>>>>>>

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #1 from shanti@mojo.cc 2007-04-03 05:50 MST -------
Created an attachment (id=128495)
 --> (https://bugzilla.novell.com/attachment.cgi?id=128495&action=view)
detailed systeminfo

some more sysinfo
https://bugzilla.novell.com/show_bug.cgi?id=259992

shanti@mojo.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
https://bugzilla.novell.com/show_bug.cgi?id=259992

shanti@mojo.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shanti@mojo.cc

------- Comment #2 from shanti@mojo.cc 2007-04-07 12:09 MST -------
maybe interesting: my BIOS shows an additional fan sensor for the cpu-cooling device, but the linux-module (rs480-mainboard) doesn't show this "fan3" value, which is the integrated smartfan of the CPU (normally about 800-900 rpm as shown in BIOS) .. this could be the missing "cooling device" that is not reachable .. must be a missing feature of the RS480-mainboard driver

just FYI

best regards
-c-
https://bugzilla.novell.com/show_bug.cgi?id=259992

shanti@mojo.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2 - High                   |P1 - Urgent

------- Comment #4 from shanti@mojo.cc 2007-05-02 15:49 MST -------
WOOOOHAAA ! happens again

May 2 23:07:57 zion kernel: ACPI: Critical trip point
May 2 23:07:57 zion kernel: Critical temperature reached (110 C), shutting down.
May 2 23:07:57 zion shutdown[31489]: shutting down for system halt
May 2 23:07:57 zion powersaved[3611]: WARNING (checkTemperatureStateChanges:218) Temperature state changed to critical.
May 2 23:07:57 zion init: Switching to runlevel: 0
May 2 23:07:59 zion kernel: Critical temperature reached (41 C), shutting down.
May 2 23:08:01 zion kernel: md: stopping all md devices.

i think that the i2c_ali15x3,i2c_ali1535-modules are buggy .. they at least miss on active fan. At least this shows that the driver is incomplete ..

i caught this (in a boot-log):

<4>ali1535_smbus 0000:00:1e.1: ALI1535_smb region uninitialized - upgrade BIOS?
<4>ali1535_smbus 0000:00:1e.1: ALI1535 not detected, module not inserted.
<3>ali15x3_smbus 0000:00:1e.1: ALI15X3_smb region uninitialized - upgrade BIOS or use force_addr=0xaddr
<3>ali15x3_smbus 0000:00:1e.1: ALI15X3 not detected, module not inserted.

i think this might have some issue in common

rfc
-c-
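The pattern in the excerpt above (a critical reading followed seconds later by a normal one) can be picked out of a syslog mechanically. The following is only a rough sketch: the function names, the assumed year and the 5 °C-per-second threshold are illustrative choices, not anything taken from the kernel or from powersaved.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   May 2 23:07:57 zion kernel: Critical temperature reached (110 C), shutting down.
CRIT_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"Critical temperature reached \((-?\d+) C\)"
)


def critical_readings(lines, year=2007):
    """Return (timestamp, temperature) pairs for each critical-temperature line.

    Syslog timestamps carry no year, so one is assumed (hypothetical default).
    """
    readings = []
    for line in lines:
        match = CRIT_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            readings.append((stamp, int(match.group(2))))
    return readings


def implausible_swings(readings, max_rate=5.0):
    """Flag consecutive readings that change faster than max_rate degrees C per second.

    max_rate is an arbitrary illustration value, not a documented limit.
    """
    flagged = []
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds > 0 and abs(c1 - c0) / seconds > max_rate:
            flagged.append((c0, c1, seconds))
    return flagged


LOG = [
    "May 2 23:07:57 zion kernel: ACPI: Critical trip point",
    "May 2 23:07:57 zion kernel: Critical temperature reached (110 C), shutting down.",
    "May 2 23:07:59 zion kernel: Critical temperature reached (41 C), shutting down.",
]

print(implausible_swings(critical_readings(LOG)))
```

A drop from 110 C to 41 C in two seconds is far beyond what a CPU heatsink can physically do, which is why such a pair points at a misread sensor rather than real overheating.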
https://bugzilla.novell.com/show_bug.cgi?id=259992

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cadaha@novell.com, pavel@novell.com
             Status|NEW                         |ASSIGNED

------- Comment #5 from trenn@novell.com 2007-05-03 02:42 MST -------
Yep, the temp looks bogus.
It may be that something interferes with ACPI, causing this.

> i think that the i2c_ali15x3,i2c_ali1535-modules are buggy .. they at least miss on active fan. At least this shows that the driver is incomplete ..

Those could well interfere with ACPI; better not to use them.
There were some patches to unhide the smbus so that it can be used with such legacy modules, which might conflict with ACPI.

> but the linux-module (rs480-mainboard)

What's that? I can't find this module.

I'd like to close this one invalid if without above modules this does not happen any more.

Some more questions:
How often does/did it happen? Frequently?
Do these modules load automatically?
What is the machine's vendor/model name?

Adding Pavel and Daniel, they might be interested in this.
Daniel already had a fix for this smbus (un)hiding, any ideas/comments?
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #6 from shanti@mojo.cc 2007-05-03 04:16 MST -------
these modules load automatically .. if i remove them i have no hardware-sensors :-( .. i think that at least the instance that checks temperature should have more intelligence .. if temperature drops to noncritical within 5 seconds a shutdown should be overridden :-o :-)
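The "more intelligence" asked for here amounts to debouncing: only treat the state as critical once several polls in a row agree. A minimal sketch of that idea follows; it is not how the kernel's ACPI thermal driver or powersaved behave, and the class name, the trip point of 70 and the `required` count are all hypothetical.

```python
class CriticalDebouncer:
    """Report 'critical' only after `required` consecutive readings at or above the trip point."""

    def __init__(self, trip_point_c, required=3):
        self.trip_point_c = trip_point_c  # hypothetical trip point in degrees C
        self.required = required          # consecutive polls needed before acting
        self.streak = 0

    def update(self, temperature_c):
        """Feed one polled reading; return True once a shutdown would be justified."""
        if temperature_c >= self.trip_point_c:
            self.streak += 1
        else:
            # A single normal reading resets the count, so isolated spikes are ignored.
            self.streak = 0
        return self.streak >= self.required


spike = CriticalDebouncer(trip_point_c=70, required=2)
print([spike.update(t) for t in (46, 80, 33, 47)])   # isolated spike

sustained = CriticalDebouncer(trip_point_c=70, required=2)
print([sustained.update(t) for t in (71, 72, 73)])   # genuinely hot
```

The trade-off is that a real overheating event is acted on `required - 1` polling intervals later, which is presumably why a critical trip point is normally handled immediately.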
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #7 from shanti@mojo.cc 2007-05-03 06:56 MST -------
i blacklisted the modules for now (i2c_ali15x3,i2c_ali1535), hope it will bring some improvement yet ..

> but the linux-module (rs480-mainboard) What's that? I can't find this module?

i meant the corresponding ali*-modules, since they fit this mainboard ..

my hardware is a shuttle st20g5 (IPX-Board)
http://global.shuttle.com/Product/Barebone/ST20G5.asp
the specs say it's a ULi 1573 mainboard

and yes indeed it happens frequently (once a week) .. after rebooting and going straight to the PC-Health page in the bios, the temperatures are quite ok (under 50°C) .. i have the latest available bios-version for this board

please gimme a 2-week testing period before closing this ticket :-/

tnx
https://bugzilla.novell.com/show_bug.cgi?id=259992

jsj@jsj.dyndns.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsj@jsj.dyndns.org

------- Comment #8 from jsj@jsj.dyndns.org 2007-05-09 02:06 MST -------
I have an MSI MegaPC 865 barebone. I saw this behaviour from time to time earlier with 10.0, but thought it really was caused by temperature and bad cooling. I updated the system to 10.2 a while ago and everything went smoothly. Last Saturday I installed the current kernel update, and now the system won't run for longer than two minutes because of the emergency shutdown. I switched back to the former kernel and everything runs smoothly again.

What is interesting is that my triggering temperature is WAY OFF! The values are always different, but most of the time they are negative numbers like -14537. When I am at home today, I will post examples from syslog.

What other information can I provide? dmidecode? hwinfo?

Ah yes, MS Windows XP runs without problems.

The problem seems to have existed for a longer time, as Google finds long threads about such behaviour (going back about 1 3/4 years) without a solution.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #9 from shanti@mojo.cc 2007-05-09 05:22 MST -------
> I'd like to close this one invalid if without above modules this does not happen any more.

hmmm .. i would not, because after the latest Kernel-update-RPM from Suse these modules are back in play .. and no longer blacklisted .. so i suggest that modules that are so clearly incomplete shouldn't find their way into the standard configuration after being blacklisted before .. i will open a new bug for this, because it's a little offtopic anyway

since the latest kernel-patch i have happily been running my computer for 24 hours (!!!!) without crashing or shutting down .. but gimme one more week for testing :-)
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #10 from trenn@novell.com 2007-05-09 05:39 MST -------
This is all very machine specific! Please don't mix things up (e.g. HPs also had crit shutdown problems, but that is totally unrelated!).

Please check for any i2c or other sensor-related modules and blacklist them, move them away or whatever.
If you are sure you have identified an offending module, please post its name with some info on what you did to test it.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #11 from shanti@mojo.cc 2007-05-09 05:58 MST -------
i now blacklist:

blacklist it87
blacklist i2c_ali15x3
blacklist i2c_ali1535

should i blacklist i2c_isa and i2c_core as well ?
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #12 from trenn@novell.com 2007-05-09 06:26 MST -------
Probably a good idea. As said, I don't know much about these modules, but AFAIK they are mostly used for sensors (fan/thermal reads which should only be done by ACPI). If you lack any functionality, pls let us know.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #13 from shanti@mojo.cc 2007-05-09 09:53 MST -------
well i think that this message points out a blind spot in the driver. i may be wrong:

<4>ali1535_smbus 0000:00:1e.1: ALI1535_smb region uninitialized - upgrade BIOS?
<4>ali1535_smbus 0000:00:1e.1: ALI1535 not detected, module not inserted.
<3>ali15x3_smbus 0000:00:1e.1: ALI15X3_smb region uninitialized - upgrade BIOS or use force_addr=0xaddr
<3>ali15x3_smbus 0000:00:1e.1: ALI15X3 not detected, module not inserted.

maybe on top of this missing part in the driver it's not working as it should

regards
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #14 from shanti@mojo.cc 2007-05-26 03:56 MST -------
well, now that i have disabled this module (it87) i have /var/log/messages full of this (which makes me nervous):

ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'
ACPI: Unable to turn cooling device [ffff810037fde290] 'on'

of course the mainboard's sensors are not shown now, i would love to have them again.

i am not sure, but doesn't this message say that there is a part of my hardware the kernel doesn't recognise ... this must be my southbridge-fan i guess. from the BIOS i know it's running and that the temperature is fine :-(
https://bugzilla.novell.com/show_bug.cgi?id=259992

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jdelvare@novell.com
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |shanti@mojo.cc

------- Comment #15 from trenn@novell.com 2007-05-29 06:23 MST -------
Yes, looks scary, but could be harmless, e.g. if the BIOS exports trip point table(s) but the fans are still handled by the BIOS.
I rather expect wrong/weird EC values (the 110 C and 80 C values are probably simply misreported) than a real fan failure.

You may want to monitor this a bit:
watch -n1 cat /proc/acpi/fan/*/* /proc/acpi/thermal_zone/*/{temperature,trip_points}
(Also use your ears for monitoring :) If fan control is done by the BIOS, you can at least guess from temperature and fan activity whether things seem to work normally.)
To increase the temperature, you can e.g. do:
"cat /dev/zero >/dev/null &"
(this should keep one core busy and quickly raise temperature or fan activity).

Do you still get the critical shutdowns with the smbus accessing module disabled?

You should also be able to trigger this bug more often if you set
THERMAL_POLLING_FREQUENCY="1"
in /etc/powersave/thermal and /etc/sysconfig/powersave/thermal and restart powersaved (rcpowersaved restart).
You should then be able to verify the critical shutdown more easily (only if the smbus module is loaded?).

Jean Delvare has some ideas about getting legacy sensor modules and ACPI working together -> adding to CC.

Christoph: If you don't see critical shutdowns anymore without legacy smbus modules, I'd like to close this one (hmm, it's still worth making sure they are not loaded by default for 10.2; maybe Daniel or Jean could give me a hint how to do that. The modules should also taint the kernel IMO).

Christoph: It would also be very interesting to see how this machine behaves with the latest 10.3 Alpha (it's the 4th AFAIK). If you still have a free partition on your machine, it would be great to give it a test. Now we can still fix things; as soon as 10.3 is released it gets difficult.

Christoph: Could you also attach acpidump output please.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #16 from shanti@mojo.cc 2007-05-29 07:12 MST -------
Created an attachment (id=142653)
 --> (https://bugzilla.novell.com/attachment.cgi?id=142653&action=view)
my acpidump
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #17 from shanti@mojo.cc 2007-05-29 07:24 MST -------
cat /proc/acpi/fan/FAN/state
status: on

cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 46 C

cat /proc/acpi/thermal_zone/THRM/trip_points
critical (S5): 60 C
passive: 50 C: tc1=4 tc2=3 tsp=60 devices=0xffff810037fe72b0
active[0]: 50 C: devices=0xffff810037fde290

this fan is the CPU fan, which is set to "SmartFan" in BIOS .. i also have ksensors running. Since i cannot monitor the northbridge/system fan anymore (no more it87 and related) this is not optimal

i know that when this shutdown occurred there was definitely NO problem with temperature at all, my system is silent, smooth and cool. ATM i am running `cat /dev/zero >/dev/null` on both cores (AMDx2), generating full utilization on the CPUs for 15 minutes, and my /proc/acpi/thermal_zone/THRM/temperature doesn't even rise by more than 2°, even the fan is still working rather silently.

My point was only whether there shouldn't be a workaround in the kernel or the module or the powerwatch to handle the mistaken panic and at least wait for the next result (like a UPS would handle a power failure) .. as you see, these results are meaningless after the next poll (after a few seconds) ..

now i will switch back to a modprobed it87 and let you know again :-)
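For logging these values over a longer period than `watch` allows, the text shown above is easy to parse. The sketch below assumes exactly the `/proc/acpi/thermal_zone` layout pasted in this comment (one trip point per line); the function names are made up for illustration, and the sample strings are the reporter's own values.

```python
import re

# "temperature:             46 C"
TEMP_RE = re.compile(r"temperature:\s*(-?\d+)\s*C")
# "critical (S5): 60 C", "passive: 50 C: ...", "active[0]: 50 C: ..."
TRIP_RE = re.compile(
    r"^(critical|passive|active\[\d+\])[^:]*:\s*(-?\d+)\s*C", re.MULTILINE
)


def parse_temperature(text):
    """Return the temperature in degrees C from a 'temperature' file, or None."""
    match = TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def parse_trip_points(text):
    """Return a dict mapping trip point name to its threshold in degrees C."""
    return {name: int(value) for name, value in TRIP_RE.findall(text)}


SAMPLE_TEMPERATURE = "temperature:             46 C\n"
SAMPLE_TRIP_POINTS = """\
critical (S5):           60 C
passive:                 50 C: tc1=4 tc2=3 tsp=60 devices=0xffff810037fe72b0
active[0]:               50 C: devices=0xffff810037fde290
"""

current = parse_temperature(SAMPLE_TEMPERATURE)
trips = parse_trip_points(SAMPLE_TRIP_POINTS)
print(current, trips, trips["critical"] - current)
```

On the live system the two strings would come from reading `/proc/acpi/thermal_zone/THRM/temperature` and `.../trip_points`; the difference printed last is the headroom to the critical trip point.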
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #18 from jdelvare@novell.com 2007-05-29 07:25 MST -------
With regard to the synchronization mechanism between ACPI and non-ACPI drivers, some proposals have been made, but we don't have anything ready for production yet, and I have higher priority tasks at the moment.

The i2c-ali15x3 and i2c-ali1535 drivers aren't the cause of the reporter's problem. The messages in the logs clearly show that the BIOS did not properly initialize the SMBus device, so the Linux drivers can't use it. You can blacklist these drivers to prevent the messages in the logs, but this won't make any difference otherwise. There's no point in blacklisting i2c-core or i2c-isa either, as these are supporting drivers which do not access the hardware.

The driver which may be conflicting with ACPI on your system would rather be it87. This driver does _not_ autoload, so rather than blacklisting it, you should delete /etc/sysconfig/lm_sensors. If it makes the problem go away, then it would confirm that this was a driver conflict.

Alternatively, you could blacklist the (ACPI) thermal and fan drivers. After all, the it87 driver gives you much more information than the ACPI drivers.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #19 from shanti@mojo.cc 2007-05-29 07:58 MST -------
/etc/sysconfig/lm_sensors also suggests to me that it might not be an issue with the it87 alone ..

MODULE_0=k8temp #<- :-( ??
MODULE_1=it87

shanti@zion:~> lsmod |grep k8
powernow_k8 32416 1
freq_table 22912 1 powernow_k8
processor 53864 2 powernow_k8,thermal
shanti@zion:~> lsmod |grep i2
i2c_ali15x3 25348 0
i2c_isa 23040 1 it87
i2c_core 41472 3 i2c_ali15x3,it87,i2c_isa

i remember from some gentoo-forums in 2005 (when i started with this PC) that at that time there had been issues with powernow_k8 and AMD64/SMP CPUs, but after this was fixed (it came in kernel updates) i gave it no more thought

this shutdown issue also first hit me with OPENSUSE10.2 .. i never had anything like this in my life (even with custom kernels)

since RPM says "file /etc/sysconfig/lm_sensors is not owned by any package"
i will comment those lines, bring back it87 and proceed :-)

PS:
kernel: ali15x3_smbus 0000:00:1e.1: ALI15X3 not detected, module not inserted.
so i guess i should use ali1563 instead, albeit my hardware is a
00:1e.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
???

tnx4support
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #20 from jdelvare@novell.com 2007-05-29 08:14 MST -------
(In reply to comment #19)
> /etc/sysconfig/lm_sensors also suggests to me that it might not be an issue with the it87 alone ..
> MODULE_0=k8temp #<- :-( ??
> MODULE_1=it87

The k8temp driver is new in kernel 2.6.19, so you don't have it in openSUSE 10.2.

> since RPM says "file /etc/sysconfig/lm_sensors is not owned by any package"
> i will comment those lines, bring back it87 and proceed :-)

The file /etc/sysconfig/lm_sensors is generated by sensors-detect.

> PS:
> kernel: ali15x3_smbus 0000:00:1e.1: ALI15X3 not detected, module not inserted.
> so i guess i should use ali1563 instead, albeit my hardware is a
> 00:1e.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
> ???

No. Your hardware is really supported by either the i2c-ali15x3 driver or the i2c-ali1535 driver (can't tell which, as ALi unfortunately used the same PCI device ID for both) and not by i2c-ali1563. But the BIOS did not properly map the SMBus function, so it cannot be used. Anyway, you don't really care about the SMBus, so you can simply ignore it.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #21 from jdelvare@novell.com 2007-05-29 09:41 MST -------
I disassembled the DSDT and took a look. I am no ACPI expert but it is clear that ACPI is accessing a device at 0x295-0x296, which is the default address of the IT87xxF hardware monitoring function. This confirms that you should be using either thermal+fan, or it87, but not both.
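Once the DSDT is known to touch 0x295-0x296, one quick cross-check is whether a native driver has already claimed those I/O ports. A minimal sketch, assuming the usual `/proc/ioports` line layout (`start-end : owner`, in hexadecimal); the helper names are hypothetical, and the sample text is invented for illustration rather than taken from the reporter's machine.

```python
def parse_ioports(text):
    """Parse /proc/ioports-style lines ('0290-0297 : it87') into (start, end, owner) tuples."""
    regions = []
    for line in text.splitlines():
        span, sep, owner = line.strip().partition(" : ")
        if not sep:
            continue  # not a 'range : owner' line
        start, dash, end = span.partition("-")
        if not dash:
            continue
        regions.append((int(start, 16), int(end, 16), owner.strip()))
    return regions


def owners_overlapping(regions, first, last):
    """Return owners of every region that overlaps the inclusive port range first..last."""
    return [owner for start, end, owner in regions if start <= last and first <= end]


# Invented sample; on a real system this would be the contents of /proc/ioports.
SAMPLE_IOPORTS = """\
0000-001f : dma1
0290-0297 : it87
03f8-03ff : serial
"""

# 0x295-0x296 is the range Jean found the DSDT accessing.
print(owners_overlapping(parse_ioports(SAMPLE_IOPORTS), 0x295, 0x296))
```

A non-empty result only shows that a driver has reserved the range; it does not prove that ACPI and the driver actually collide at run time.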
https://bugzilla.novell.com/show_bug.cgi?id=259992

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|shanti@mojo.cc              |

------- Comment #22 from trenn@novell.com 2007-05-29 11:43 MST -------
Jean, IMO we have to set sensor modules to unsupported and must somehow make sure they do not get loaded (for ACPI archs), but still give people a chance to load them manually. How are those modules loaded?

Can you come up with a list of modules that access the smbus/i2c bus and could interfere with ACPI, pls.

How do the modules get loaded? I can't find any autoloading in the kernel (quick look, I might have overlooked something). Does it work like this: you run some userspace hwmon-test app, which writes /etc/sysconfig/lm_sensors with suggestions for which modules to load via /etc/init.d/hwmon (re-)start?

Jean, I did a quick check of the DSDT; it is hopeless to run both ACPI and the it87 module on this machine.

These functions all access the device:
- SFAN, FON, FOFF, RTMP, STHY, STOS, SCFG

It also looks like the it87 addresses seem not to be used by default, but I would not trust this assumption.

The next thing is that the above functions are all in _SI scope, but are not assigned to a specific ACPI device. That means writing an ACPI driver for it87 could get difficult, and all this looks very machine/BIOS/vendor specific... (on the other hand, I am sure I have already seen the SFAN method; it looks like one needs an acer-acpi module including this, or it could be added to asus-acpi or whatever machine this is. What kind of machine/model is this?)

IMO the it87 module must vanish anyway (or must not get loaded for ACPI machines) and we need something ACPI (probably also BIOS/vendor/model) specific.
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #23 from jdelvare@novell.com 2007-05-30 02:51 MST ------- (In reply to comment #22)
Jean, IMO we have to set sensor modules to unsupported and must somehow make sure they get not loaded (for ACPI archs), but still let people a chance to load them manually. How are those modules loaded?
The hardware monitoring drivers are already all unsupported.
Can you come up with a list of modules that access smbus/i2c bus and could interfere with ACPI, pls.
Virtually all of them can. Note that the it87 driver doesn't even touch smbus/i2c in this case. So the list you want is all smbus/i2c master drivers _and_ all non-i2c-based hardware monitoring drivers, limited to devices which can be found on ACPI-enabled systems: i2c-ali1535 i2c-ali1563 i2c-ali15x3 i2c-amd756 i2c-amd756-s4882 i2c-amd8111 i2c-i801 i2c-nforce2 i2c-piix4 i2c-sis5595 i2c-sis630 i2c-sis96x i2c-viapro abituguru f71805f hdaps it87 k8temp pc87360 pc87427 sis5595 smsc47b397 smsc47m1 via686a vt1211 vt8231 w83627ehf w83627hf That's a pretty long list, isn't it? Note: of these, the smsc47m1, vt8231 and via686a are probably less risky than the others because their access is stateless. So even if ACPI is accessing these devices, the drivers shouldn't get in the way. Shouldn't...
How does the modules get loaded? I can't find any autoloading in kernel (quick look, I might have overseen something), does this work that you run some userspace hwmon-test app, that one writes /etc/sysconfig/lm_sensors with suggestions which modules to load via /etc/init.d/hwmon (re-)start?
The only hardware monitoring modules which autload are k8temp, sis5595, via686a and vt8231, because they are PCI devices. All SMBus master drivers listed above autoload too, again because they are PCI devices. For the other hardware monitoring drivers, they are loaded by /etc/rc.d/lm_sensors based on the list found in /etc/sysconfig/lm_sensors. This configuration file is generated by /usr/sbin/sensors-detect. One thing that could be done would be to add an ACPI check in sensors-detect, so that we can warn the user that a conflict could happen. However, only checking for the presence of ACPI would result in many false positives. So ideally we would need a way to determine if a given DSDT contains functions which access the I/O ports of the devices detected by sensors-detect. But I guess this would be pretty hard to automatize. Another (hopefully temporary) approach is to add DMI-based blacklists to individual SMBus and hardware monitoring drivers. If the list of motherboards with a conflict is small enough, it might work. If not, it might become quickly unmaintainable :(
Jean, I did a quick check of the DSDT; running both ACPI and the it87 module on this machine is hopeless.
I agree.
These functions all access the device: SFAN, FON, FOFF, RTMP, STHY, STOS, SCFG
Are these functions called if neither the "fan" nor "thermal" drivers are loaded? My guess is that Christoph would rather use the it87 driver than the less featured ACPI drivers.
It also looks like the it87 addresses seem not to be used by default, but I would not trust this assumption.
I don't understand what you mean here. Can you please explain?
Next thing is that the above functions are all in _SI scope, but are not assigned to a specific ACPI device. That means writing an ACPI driver for it87 could get difficult, and all this looks very machine/BIOS/vendor specific... (On the other hand, I am sure I have already seen the SFAN method; it looks like one needs an acer-acpi module including this, or it could be added to asus-acpi or whatever machine this is. What kind of machine/model is this?)
Please remember I am no ACPI expert. What is the "_SI scope"?
IMO it87 module must vanish anyway (or must not get loaded for ACPI machines) and we need something ACPI (probably also BIOS/vendor/model) specific.
ACPI was supposed to be a standard, and now I seem to understand that we would need to write a dedicated driver for every motherboard vendor, or maybe even every motherboard model out there? *sigh* The it87 driver is supporting the ITE IT87xxF/FG chips on _all_ motherboard models. But sure, just do it. The problem right now is that the ACPI people keep complaining that non-ACPI hardware monitoring drivers are causing trouble, but users keep using them because ACPI doesn't offer anything next to what lm-sensors has been providing for years. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #24 from jsj@jsj.dyndns.org 2007-05-30 03:38 MST -------
On my machine (MSI MegaPC 865 barebone) the smsc47m1 module seems to be the cause of the trouble, contrary to the assumption that this module does not affect the ACPI value readings. The difference in loaded modules between the older and the newer kernel is lm90, hwmon and smsc47m1, and the critical shutdowns do not happen when smsc47m1 is not loaded.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #25 from jdelvare@novell.com 2007-05-30 04:13 MST -------
Stefan, please attach your acpidump.
https://bugzilla.novell.com/show_bug.cgi?id=259992

jdelvare@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |jsj@jsj.dyndns.org
https://bugzilla.novell.com/show_bug.cgi?id=259992

jsj@jsj.dyndns.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|jsj@jsj.dyndns.org          |

------- Comment #26 from jsj@jsj.dyndns.org 2007-05-30 11:31 MST -------
Created an attachment (id=142976)
 --> (https://bugzilla.novell.com/attachment.cgi?id=142976&action=view)
acpidump of MSI MegaPC 865 barebone

Here you are...
https://bugzilla.novell.com/show_bug.cgi?id=259992

jdelvare@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #142976|application/octet-stream    |text/plain
          mime type|                            |
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #27 from jdelvare@novell.com 2007-05-31 04:05 MST -------
Stefan, I checked your DSDT table and it seems that your ACPI implementation includes a fairly complete fan speed control mechanism. It is setting the fan speed using the SMSC LPC47M1xx PWM outputs based on the temperature readings from a chip on the SMBus at address 0x2d. This can't possibly be an LM90-compatible chip, as these live at 0x4c or 0x4d. You must have a 3rd hardware monitoring driver which you didn't list. Could be smsc47m192 (which is NOT the same as smsc47m1)? Or some LM85-compatible device.

Anyway, I double-checked the smsc47m1 driver: the device has a flat I/O space and the driver doesn't even write to it by default, so I just can't see how it would interact with ACPI, especially not with temperatures as the smsc47m1 driver only deals with fans. There could be some unexpected interaction if you tried to control the fan by yourself (using fancontrol), but no invalid temperature reads as you have been seeing.

Given the ACPI implementation your system has, I recommend that you do not load non-ACPI hardware monitoring drivers nor SMBus master drivers. The ACPI stuff should work just fine for you.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #28 from trenn@novell.com 2007-05-31 04:27 MST -------
so I just can't see how it would interact with ACPI, especially not with temperatures as the smsc47m1 driver only deals with fans
There also exists a little microcontroller called the EC (Embedded Controller) that makes it even easier for developers to program AML/ASL code. This one has its own firmware, 256 byte registers and could pre-process info like fan speed, temp, even control fan speed depending on temps (as done on recent ThinkPads). You cannot see which addresses/busses this one accesses (AFAIK it also accesses i2c and smbus) and it's very likely that the EC got confused by sensor module reads/writes. E.g. we had EC confusion because of psmouse driver interference, and the EC also accesses the super I/O chipset (even if this probably was an EC firmware bug, I just want to show that things can be complex...).

Jean, do you think one can write an acpi-hwmon driver with such provided ACPI functions to read/write fan and temp? If yes, I expect we need something similar to what is done in asus_acpi (there, at least, the same ACPI device (ATK..) always existed, but ACPI functions were named differently from machine to machine). What we would need then is something like whitelisting machines via e.g. DSDT table id, or something better, and then assigning ACPI funcs to the hwmon driver where we know what they do, e.g. like:

    if (match_xy_model())
        acpi_fan_on_acpi_method = "\_SI/FON";
    else if (match_yz_model())
        acpi_fan_on_acpi_method = "\scope_xy/XFON";
    ..
    acpi_fan_on() {
        evaluate_acpi_object(.., acpi_fan_on_acpi_method, ..);
    }
    struct driver_hwmon acpi-hwmon;
    acpi-hwmon->fan_on_callback = acpi_fan_on;
https://bugzilla.novell.com/show_bug.cgi?id=259992

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |jdelvare@novell.com

------- Comment #29 from trenn@novell.com 2007-05-31 04:29 MST -------
Back to the bug, are there any offending modules loaded automatically now? Do we need to do something? I'd like to close or rename this one (enhancement -> implement hwmon acpi drivers :) ).
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #30 from jdelvare@novell.com 2007-05-31 06:20 MST ------- (In reply to comment #28)
Jean, do you think one can write an acpi-hwmon driver with such provided ACPI functions to read/write fan and temp?
Yes, but it will be difficult. There was an attempt to do this for some Asus motherboards: http://lists.lm-sensors.org/pipermail/lm-sensors/2007-May/019715.html

My fear is that this will be heavily motherboard-dependent. We will have to check the DSDT of every motherboard out there to find out which AML methods do what. Not only will the function names vary from one board to the next, but also the calling convention, possible side effects and even availability of these functions. Even a simple BIOS update could break it. This is something we always tried to avoid with non-ACPI hardware monitoring drivers, because it doesn't scale at all over time and makes maintenance a nightmare.

But of course, if your plan is really to get rid of all SMBus master and hardware monitoring drivers as soon as ACPI is enabled, then you will have to do something like that. You simply can't kill a functionality thousands of people have been using for years without providing something to replace it. That something will lack some (most?) of the features (for example ACPI doesn't care about voltages at all, does it?) but at the very least it must exist. Otherwise, expect to receive hundreds of mails from angry users every month.
https://bugzilla.novell.com/show_bug.cgi?id=259992

jdelvare@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|jdelvare@novell.com         |

------- Comment #31 from jdelvare@novell.com 2007-05-31 06:25 MST -------
(In reply to comment #29)
Back to the bug, are there any offending modules loaded automatically now? Do we need to do something?
I already listed the potentially offending drivers which autoload, see comment #23. Basically these are all the PCI drivers in the list.
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #32 from jsj@jsj.dyndns.org 2007-05-31 07:15 MST ------- (In reply to comment #27)
Stefan, I checked your DSDT table and it seems that your ACPI implementation includes a fairly complete fan speed control mechanism. It is setting the fan speed using the SMSC LPC47M1xx PWM outputs based on the temperatures readings from a chip on the SMBus at address 0x2d. This can't possibly be an LM90-compatible chip, as these live at 0x4c or 0x4d. You must have a 3rd hardware monitoring driver which you didn't list. Could be smsc47m192 (which is NOT the same as smsc47m1)? Or some LM85-compatible device.
It works pretty well ;) With the update to 2.6.18.8-0.3 some modules have been loaded automatically. I did not load any monitoring modules before. With 0.1-kernel the diff of loaded modules was: i2c_isa lm90 hwmon smsc47m1
Anyway, I double-checked the smsc47m1 driver, the device has a flat I/O space and the driver doesn't even write to it by default so I just can't see how it would interact with ACPI, especially not with temperatures as the smsc47m1 driver only deals with fans. There could be some unexpected interaction if you tried to control the fan by yourself (using fancontrol), but no invalid temperature reads as you have been seeing.
"As far as I remember" blacklisting smsc47m1 solved the problems with my machine. Maybe it was hwmon, but I lean towards smsc47m1.
Given the ACPI implementation your system has, I recommend that you do not load non-ACPI hardware monitoring drivers nor SMBus master drivers. The ACPI stuff should work just fine for you.
So I thought, but with the update kernel they got loaded. My concern is that this module will not be blacklisted.
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #33 from jdelvare@novell.com 2007-05-31 08:08 MST ------- (In reply to comment #32)
With the update to 2.6.18.8-0.3 some modules have been loaded automatically. I did not load any monitoring modules before. With 0.1-kernel the diff of loaded modules was: i2c_isa lm90 hwmon smsc47m1
I am fairly certain this isn't true. There is no way these modules could be loaded automatically, because these hardware monitoring devices cannot be easily detected. Please check your /etc/sysconfig/lm_sensors file.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #34 from jsj@jsj.dyndns.org 2007-06-04 00:55 MST -------
You got me here. Sorry for the confusion. The funny thing is that with 2.6.18.8-0.1 the system runs smoothly, whereas with 0.3 it reads the strange temperatures. OK for now, I see the problem with the conflicting/confusing accesses to the hardware. Thank you for the help and effort!
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #35 from trenn@novell.com 2007-06-05 08:22 MST -------
Created an attachment (id=144232)
 --> (https://bugzilla.novell.com/attachment.cgi?id=144232&action=view)
Request IO and mem resources for ACPI operation regions

This patch requests IO and mem resources for ACPI operation regions. Unfortunately it still clashes with some of its own resources (see line 5000).

002e-002f : acpi*
0072-0073 : acpi*
0090-0091 : acpi*
0295-0296 : acpi*
5000-50fe : acpi*
5000-5003 : ACPI PM1a_EVT_BLK
5004-5005 : ACPI PM1a_CNT_BLK
5008-500b : ACPI PM_TMR
5010-5015 : ACPI CPU throttle
5020-5023 : ACPI GPE0_BLK
50b0-50b7 : ACPI GPE1_BLK
adalid:~ # cat /proc/iomem |grep -i acpi

The purpose of this patch is to still allow other drivers to request the io/mem resources and only throw a kern_err if a driver has done so. That way we could get a picture of which Operation Regions DSDTs are generally using and which machines might have potential problems with it. At some later time the request_resource_soft might fall away again if possible (see next patch). Then the only way to access the regions is via ACPI.
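The clash "with its own resources" mentioned above can be spotted mechanically. The following Python sketch is not part of the attached patch; it is just one way to parse a listing in the format shown in comment #35 and report which named ACPI blocks fall inside an "acpi*" region. The function names and the sample constant are made up for this illustration.

```python
import re

# Matches lines such as "0295-0296 : acpi*" (leading indentation allowed).
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def parse_regions(text):
    """Return a list of (start, end, name) tuples with inclusive bounds."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def nested_in(regions, parent_name="acpi*"):
    """Return (child, parent) pairs where child lies inside a parent_name region."""
    parents = [r for r in regions if r[2] == parent_name]
    return [
        (child, parent)
        for child in regions
        for parent in parents
        if child[2] != parent_name
        and parent[0] <= child[0]
        and child[1] <= parent[1]
    ]


# Listing copied from comment #35.
SAMPLE = """\
002e-002f : acpi*
0072-0073 : acpi*
0090-0091 : acpi*
0295-0296 : acpi*
5000-50fe : acpi*
5000-5003 : ACPI PM1a_EVT_BLK
5004-5005 : ACPI PM1a_CNT_BLK
5008-500b : ACPI PM_TMR
5010-5015 : ACPI CPU throttle
5020-5023 : ACPI GPE0_BLK
50b0-50b7 : ACPI GPE1_BLK
"""

if __name__ == "__main__":
    for child, parent in nested_in(parse_regions(SAMPLE)):
        print("%04x-%04x %s lies inside %04x-%04x %s"
              % (child[0], child[1], child[2], parent[0], parent[1], parent[2]))
```

On the sample listing, every named ACPI block is reported as lying inside the 5000-50fe region, which matches the "see line 5000" remark.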
https://bugzilla.novell.com/show_bug.cgi?id=259992

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |astarikovskiy@novell.com

------- Comment #36 from trenn@novell.com 2007-06-05 08:49 MST -------
No next patch, it's just the same but does not do the _soft, and further drivers trying to request such a resource will fail.

Here is some dmesg output of the patch from comment #35. The conflicts show specific IO regions that get used for sure, parsed from the FADT and also declared as SYSTEM_IO operation regions:

IO resource region conflicts with IO ACPI PM1a_EVT_BLK regions, conflict is ignored, system might run unstable.
IO resource region conflicts with IO ACPI PM1a_CNT_BLK regions, conflict is ignored, system might run unstable.
IO resource region conflicts with IO ACPI PM_TMR regions, conflict is ignored, system might run unstable.
IO resource region conflicts with IO ACPI GPE0_BLK regions, conflict is ignored, system might run unstable.
IO resource region conflicts with IO ACPI GPE1_BLK regions, conflict is ignored, system might run unstable.

This needs some more fiddling (it should just be an example patch I'd like to send to the ACPI/hwmon list if you think it's worth it), but could be a beginning to get a picture of which drivers might conflict with ACPI. When cleaned up, a message should be printed when it87 is being loaded:

IO resource region conflicts with acpi IO region [0x295-0x296], conflict is ignored, system might run unstable.

If it turns out that no important drivers are affected, we could let the drivers fail to load.

Jean, I grepped over my DSDTs, here is a list of common SystemIO addresses used/claimed by DSDTs:
/suse/trenn/Export/SystemIO.txt

Any comments welcome...
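The check described in comment #36 boils down to an interval-overlap test between the ports a driver wants and the regions ACPI has claimed. Below is a rough userspace model of that logic in Python, not the kernel patch itself. The region list is taken from comment #35; the assumption that it87 requests exactly ports 0x295-0x296 comes from the example message in comment #36, and the helper names are invented for this sketch.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive port ranges share at least one port."""
    return a_start <= b_end and b_start <= a_end


def check_driver(driver, start, end, acpi_regions):
    """Return one warning string per ACPI region the driver's range touches."""
    warnings = []
    for r_start, r_end, name in acpi_regions:
        if overlaps(start, end, r_start, r_end):
            warnings.append(
                "%s: IO resource region conflicts with %s IO region "
                "[0x%x-0x%x], conflict is ignored, system might run unstable."
                % (driver, name, r_start, r_end)
            )
    return warnings


# A few of the SystemIO regions listed in comment #35.
ACPI_REGIONS = [
    (0x2E, 0x2F, "acpi"),
    (0x72, 0x73, "acpi"),
    (0x295, 0x296, "acpi"),
]

if __name__ == "__main__":
    # Assumed port range for it87, per the example message in comment #36.
    for line in check_driver("it87", 0x295, 0x296, ACPI_REGIONS):
        print(line)
```

Returning a list of warnings instead of failing mirrors the "soft" variant of the request; the stricter variant would refuse to load the driver as soon as the list is non-empty.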
https://bugzilla.novell.com/show_bug.cgi?id=259992

astarikovskiy@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144232|0                           |1
        is obsolete|                            |

------- Comment #37 from astarikovskiy@novell.com 2007-06-06 14:31 MST -------
Created an attachment (id=144583)
 --> (https://bugzilla.novell.com/attachment.cgi?id=144583&action=view)
Set correct name of the resource

Thomas, I think it's convenient to know the name of the resource, not just acpi*.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #38 from astarikovskiy@novell.com 2007-06-07 12:42 MST -------
Created an attachment (id=144838)
 --> (https://bugzilla.novell.com/attachment.cgi?id=144838&action=view)
DSDT from desktop ASUS mainboard

Thomas, this DSDT seems to encapsulate some hwmon, please look at ASOC.
https://bugzilla.novell.com/show_bug.cgi?id=259992

info@tristanhoffmann.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |info@tristanhoffmann.de

------- Comment #39 from info@tristanhoffmann.de 2007-06-07 13:38 MST -------
Hi, I just want to add that I have also had this problem since openSUSE 10.2 on an HP nx6125 laptop. I now have openSUSE 10.3 Alpha 3 and it still occurs, but I've never loaded hardware monitoring modules manually.

/var/log/messages:
Jun 7 20:14:04 turion-laptop kernel: ACPI: Critical trip point
Jun 7 20:14:04 turion-laptop kernel: Critical temperature reached (7168 C), shutting down.
Jun 7 20:14:05 turion-laptop shutdown[5639]: shutting down for system halt
Jun 7 20:14:05 turion-laptop init: Switching to runlevel: 0
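Readings like 7168 C here, or the 80 C to 33 C swing within a few seconds in the original report, are physically implausible. Purely as an illustration of what a sanity check on such values could look like (this is not what the kernel's thermal driver does, and the bounds and rate limit are arbitrary assumptions), here is a small Python sketch:

```python
def plausible(prev_c, new_c, dt_s, max_rate=5.0, low=-40.0, high=150.0):
    """Heuristic check of a temperature reading in degrees Celsius.

    prev_c   previous reading, or None if there is none yet
    new_c    new reading
    dt_s     seconds elapsed since the previous reading
    max_rate largest believable change in degrees per second (assumed value)
    low/high absolute bounds for a believable reading (assumed values)
    """
    if not (low <= new_c <= high):
        return False
    if prev_c is None or dt_s <= 0:
        return True
    return abs(new_c - prev_c) / dt_s <= max_rate


if __name__ == "__main__":
    print(plausible(50, 7168, 1.0))  # outside the absolute bounds
    print(plausible(80, 33, 6.0))    # 47 degrees in 6 seconds
    print(plausible(50, 52, 1.0))    # ordinary drift
```

A filter like this would only hide the symptom; the conflicting hardware access discussed in this bug would still be there.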
https://bugzilla.novell.com/show_bug.cgi?id=259992

astarikovskiy@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |info@tristanhoffmann.de

------- Comment #40 from astarikovskiy@novell.com 2007-06-08 02:59 MST -------
Do you have anything in /etc/sysconfig/lm_sensors? Do you see problems if you remove all sensors out of this file and re-start hwmon?
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #41 from jdelvare@novell.com 2007-06-08 04:56 MST ------- (In reply to comment #40)
Do you have anything in /etc/sysconfig/lm_sensors? Do you see problems if you remove all sensors out of this file and re-start hwmon?
That's not the right order. You must first stop hwmon ("/etc/rc.d/lm_sensors stop"), and then delete /etc/sysconfig/lm_sensors, otherwise the hwmon drivers will not be removed.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #42 from info@tristanhoffmann.de 2007-06-08 05:48 MST -------
Well on both openSUSE 10.3 and 10.2 there is no directory or file called "lm-sensors" in /etc/sysconfig.
https://bugzilla.novell.com/show_bug.cgi?id=259992

info@tristanhoffmann.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|info@tristanhoffmann.de     |

------- Comment #43 from info@tristanhoffmann.de 2007-06-08 05:51 MST -------
Sorry, I wanted to write "lm_sensors".
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #44 from jdelvare@novell.com 2007-06-08 06:06 MST ------- (In reply to comment #42)
Well on both openSUSE 10.3 and 10.2 there is no directory or file called "lm_sensors" in /etc/sysconfig.
This means that your problem is different from the original report. Please open a separate bug report.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #45 from info@tristanhoffmann.de 2007-06-08 08:06 MST -------
Do you really think this is a different bug? Maybe it's just not lm_sensors causing this?
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #46 from jdelvare@novell.com 2007-06-08 08:25 MST ------- (In reply to comment #45)
Do you really think this is a different bug? Maybe it's just not lm_sensors causing this?
Same symptoms but different cause => different bug.
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #47 from shanti@mojo.cc 2007-06-10 11:19 MST ------- here again: #dmesg |grep ACPI there are some quirks and an invalid value for PBLK within: BIOS-e820: 0000000077ef0000 - 0000000077ef3000 (ACPI NVS) BIOS-e820: 0000000077ef3000 - 0000000077f00000 (ACPI data) ACPI: RSDP (v000 XPC ) @ 0x00000000000f7e50 ACPI: RSDT (v001 XPC AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0000000077ef3040 ACPI: FADT (v001 XPC AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0000000077ef30c0 ACPI: SSDT (v001 PTLTD POWERNOW 0x00000001 LTP 0x00000001) @ 0x0000000077ef74c0 ACPI: HPET (v001 XPC AWRDACPI 0x42302e31 AWRD 0x00000098) @ 0x0000000077ef76c0 ACPI: MCFG (v001 XPC AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0000000077ef7740 ACPI: MADT (v001 XPC AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0000000077ef7400 ACPI: DSDT (v001 XPC ST20V10 0x00001000 MSFT 0x0100000e) @ 0x0000000000000000 ACPI: PM-Timer IO Port: 0x4008 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. 
ACPI: HPET id: 0x10b9a201 base: 0xfed00000 Using ACPI (MADT) for SMP configuration information ACPI: Core revision 20060707 ACPI: bus type pci registered ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) ACPI: Assume root bridge [\_SB_.PCI0] bus is 0 PCI quirk: region 4000-403f claimed by ali7101 ACPI ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2PB._PRT] ACPI: PCI Interrupt Link [LNK1] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK2] (IRQs 1 3 4 *5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNK3] (IRQs 1 3 4 5 6 7 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNK4] (IRQs 1 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNK5] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK6] (IRQs 1 3 4 5 6 *7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNK7] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK8] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *9 ACPI: PCI Interrupt Link [LNK9] (IRQs 1 *3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGP_._PRT] PCI: Using ACPI for IRQ routing ACPI: (supports S0 S1 S4 S5) ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 19 (level, low) -> IRQ 185 ACPI: Fan [FAN] (on) ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 23 (level, low) -> IRQ 193 ACPI: PCI Interrupt 0000:00:1f.0[A] -> GSI 19 (level, low) -> IRQ 185 ACPI: PCI Interrupt 0000:00:1d.0[C] -> GSI 21 (level, low) -> IRQ 201 ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 18 (level, low) -> IRQ 209 ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 18 (level, low) -> IRQ 209 ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 209 ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 18 (level, low) -> IRQ 209 ACPI: PCI Interrupt 0000:03:16.0[A] -> GSI 18 (level, low) -> IRQ 209 ACPI: PCI Interrupt 0000:03:15.0[A] -> GSI 17 (level, low) -> IRQ 217 ACPI: Power Button (FF) [PWRF] ACPI: Power 
Button (CM) [PWRB] ACPI: Invalid PBLK length [7] ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Thermal Zone [THRM] (50 C) ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: Unable to turn cooling device [ffff810037fde290] 'on' ACPI: PCI Interrupt 0000:01:05.0[A] -> GSI 17 (level, low) -> IRQ 217 ACPI: Unable to turn cooling device [ffff810037fde290] 'on' -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=259992 ------- Comment #48 from trenn@novell.com 2007-06-11 01:53 MST -------
an invalid value for PBLK
This one is harmless.
The HPs are known to have ACPI issues:
a) buggy ACPI implementation, that especially hits HPs hard (no fan watchdog, once unsynchronized, fans stop working, ...)
b) EC interferes with other device drivers (e.g. like the psmouse issue; more strange EC breakages were reported). Very hard to find out, probably an EC firmware issue. This could be related to this one, in fact I'd be very happy if those EC failures there are because of some (noticeable) device interference.

You certainly hit b) as the temperature value (in comment #39) is totally bogus.

Tristan: Some work is/was going on, best you try the latest kernel of the day which is 2.6.22-rcX based:
ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64/kernel-default.rpm
The only change that could help here IMO is the psmouse cleanup; not sure whether you already got it, it's in 2.6.22-rcX for sure.

About comment #47: this one looks scary:
PCI quirk: region 4000-403f claimed by ali7101 ACPI
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #49 from info@tristanhoffmann.de 2007-06-11 06:07 MST -------
Thanks, I will try the new kernel with openSUSE Alpha 5 when it's released.
https://bugzilla.novell.com/show_bug.cgi?id=259992

------- Comment #50 from pavel@novell.com 2007-06-15 14:48 MST -------
Sorry for jumping in late here.

In comment #27, there's something about a blacklist for sensors-detect. If sensors are important enough that we'd get hundreds of angry mails per month if we disabled them, what about doing it right? That means a whitelist of systems where we know sensors work. Then we could think about loading them by default etc.

A blacklist does not work, as more systems where ACPI conflicts with sensors are manufactured as years pass. A whitelist works, given a big enough community.
https://bugzilla.novell.com/show_bug.cgi?id=259992#c51
--- Comment #51 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c52
--- Comment #52 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=259992#c53
--- Comment #53 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c54
--- Comment #54 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c55
--- Comment #55 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c56
--- Comment #56 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c57
--- Comment #57 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=259992#c58
Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c59
--- Comment #59 from Jean Delvare
https://bugzilla.novell.com/show_bug.cgi?id=259992#c60
--- Comment #60 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c61
--- Comment #61 from Jean Delvare
https://bugzilla.novell.com/show_bug.cgi?id=259992#c62
--- Comment #62 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c63
--- Comment #63 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c64
--- Comment #64 from Jean Delvare
I would love to upgrade to 10.3 :-) but this bug https://bugzilla.novell.com/show_bug.cgi?id=332298 keeps me hoping, so I have to stay on 10.2 for now :-)
10.3 would not do any better with regards to ACPI and hwmon drivers conflicting anyway.
https://bugzilla.novell.com/show_bug.cgi?id=259992#c65
--- Comment #65 from Pavel Machek
https://bugzilla.novell.com/show_bug.cgi?id=259992#c66
--- Comment #66 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=259992#c67
--- Comment #67 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c68
--- Comment #68 from Thomas Renninger
rmmod thermal tells me it's in use
This should generally work; maybe something was still accessing /proc/acpi/thermal? fuser -m /proc/acpi/thermal should tell you.
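As a complement to the fuser check, the kernel's own view of why a module counts as "in use" can be read from /proc/modules, which lists each loaded module's use count and the modules that depend on it. The following is only a minimal sketch, not something posted in this bug: the helper name module_usage is made up for illustration, and it assumes the usual /proc/modules field order (name, size, use count, dependents, state, address).

```python
from typing import List, Optional, Tuple


def module_usage(modules_text: str, name: str) -> Optional[Tuple[int, List[str]]]:
    """Return (use_count, dependents) for module `name`, or None if not loaded.

    `modules_text` is the content of /proc/modules. A use count that is not
    a plain number (kernels built without module unloading print "-") is
    reported as -1.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        # Assumed field order: name, size, use count, dependents, state, address
        if len(fields) >= 4 and fields[0] == name:
            use_count = int(fields[2]) if fields[2].isdigit() else -1
            # Dependents are comma-separated with a trailing comma, or "-" if none
            if fields[3] == "-":
                dependents = []
            else:
                dependents = [d for d in fields[3].split(",") if d]
            return use_count, dependents
    return None
```

On a running system it would be called as module_usage(open("/proc/modules").read(), "thermal"). A non-zero use count with an empty dependents list suggests that something other than another module is holding thermal, which is the case the fuser check is meant to catch.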
To get rid of the thermal module you need to remove thermal from the ACPI_MODULES (or similar) variable in /etc/sysconfig/kernel and from the INITRD_MODULES list in /etc/sysconfig/kernel, then invoke mkinitrd.
https://bugzilla.novell.com/show_bug.cgi?id=259992#c69
--- Comment #69 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c70
--- Comment #70 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c71
--- Comment #71 from Thomas Renninger
remove thermal from ACPI_MODULES (or similar) variable in /etc/sysconfig/kernel
Should be: remove thermal from the ACPI_MODULES (or similar) variable in /etc/sysconfig/powersave/common. You have to explicitly list the modules you want loaded, leaving out thermal: "processor button", possibly battery and ac, or more if you have a laptop.
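The edit described above amounts to dropping one name from a space-separated, double-quoted module list. The sketch below shows that as a small text transformation. It is an illustration under assumptions, not a tool from this bug: the function name drop_module is hypothetical, the variable is assumed to be written as NAME="a b c" on one line, and the exact variable name is the "ACPI_MODULES (or similar)" mentioned above. It only transforms text; it does not write the file or run mkinitrd.

```python
import re


def drop_module(config_text: str, variable: str, module: str) -> str:
    """Remove `module` from a VARIABLE="a b c" assignment in sysconfig-style text.

    Lines that do not assign `variable` are returned unchanged.
    """
    # Match only the assignment of the requested variable at the start of a line
    pattern = re.compile(
        r'^([ \t]*' + re.escape(variable) + r'=")([^"\n]*)(")',
        re.MULTILINE,
    )

    def rewrite(match: "re.Match[str]") -> str:
        # Keep every listed module except the one being dropped
        kept = [m for m in match.group(2).split() if m != module]
        return match.group(1) + " ".join(kept) + match.group(3)

    return pattern.sub(rewrite, config_text)
```

Applied to the text of /etc/sysconfig/powersave/common with variable "ACPI_MODULES" and module "thermal", it would turn a list such as "processor thermal button" into "processor button". The same function could be pointed at INITRD_MODULES in /etc/sysconfig/kernel, after which mkinitrd still has to be run by hand.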
https://bugzilla.novell.com/show_bug.cgi?id=259992#c72
--- Comment #72 from Christoph Resch
https://bugzilla.novell.com/show_bug.cgi?id=259992#c73
--- Comment #73 from Thomas Renninger