[Bug 408728] New: Multiple issues with openSUSE 11.0 on HP Pavilion ZE4200
https://bugzilla.novell.com/show_bug.cgi?id=408728

Summary: Multiple issues with openSUSE 11.0 on HP Pavilion ZE4200
Product: openSUSE 11.0
Version: Final
Platform: PC
OS/Version: openSUSE 11.0
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: suse@randycushman.com
QAContact: qa@suse.de
Found By: Customer

Created an attachment (id=227469)
 --> (https://bugzilla.novell.com/attachment.cgi?id=227469)
dmesg output

I am experiencing multiple issues with openSUSE 11.0 that I do not experience with openSUSE 10.3 that may be interrelated.

1) System powers off under any significant CPU load, apparently due to high temperature.
2) Display brightness cannot be adjusted using keyboard.
3) Display does not dim when AC power is disconnected.
4) System does not automatically switch between pointing devices when PS2 mouse is plugged/unplugged.

System: HP Pavilion ZE4200 (Linux logs report ze4400)
        AMD Athlon Mobile 2400+

I have not been able to identify any difference between the dmesg output for 10.3 vs. 11.0 that appears to be significant.

I see one additional issue that is intermittent. I suspect it is not related, but I'll mention it just in case:

5) System frequently does not respond to keyboard and mouse console input. (Able to log in using SSH.)

The occurrence of 5) correlates with an additional log entry reported by dmesg:
"pci 0000:00:02.0: OHCI: BIOS handoff failed (BIOS bug?) 00000197"

The system is configured with 3 linux partitions for comparison: openSUSE 10.3, upgrade from openSUSE 10.3 to 11.0, and openSUSE 11.0 fresh install.

I get the same results whether using kernel 2.6.25.9-0.2-pae or 2.6.25.5-1.1-pae. I have noticed no differences between the upgrade and fresh install partitions.

Suggestions?

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=408728
Cyril Hrubis
https://bugzilla.novell.com/show_bug.cgi?id=408728
User thoenig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c1
Timo Hoenig
> 1) System powers off under any significant CPU load, apparently due to high temperature.

Can you check

    $ cat /proc/acpi/thermal/*/*

to see whether the system really overheats.

> 2) Display brightness cannot be adjusted using keyboard.

Please open a new bug report, assign it to thoenig@novell.com

> 3) Display does not dim when AC power is disconnected.

Please open a new bug report, assign to hmacht@novell.com, add thoenig@novell.com and trenn@novell.com to CC.

> 4) System does not automatically switch between pointing devices when PS2 mouse is plugged/unplugged.

Please open a new bug report, assign it to bnc-team-screening@forge.provo.novell.com

Thank you!
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c2
--- Comment #2 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c3
--- Comment #3 from Randy Cushman
> for x in /proc/acpi/thermal_zone/*/polling_frequency; do
>     echo 10 > "$x"
> done
>
> Does this help?

No, it didn't help. I also tried setting the polling_frequency to 1; still no difference.
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c4
Randy Cushman
> Can you check
> $ cat /proc/acpi/thermal/*/*
> to see whether the system really overheats.

I suspect the problem is temperature-related because the system runs much longer before failure if I hang the machine off the edge of a table so that the air intake is clear of the table.

The temperature check is not conclusive because the last temperature I've seen before failure when checking every few seconds is 95C. For comparison:

    # cat trip_points
    critical (S5): 100 C
    passive: 97 C: tc1=4 tc2=3 tsp=40 devices=CPU0

As I ponder this issue further I have some new hypotheses:

- Linux has never managed temperature properly on this machine, although in the past it took a kernel rebuild without using the hanging-machine-off-the-edge trick to trigger a thermal shutdown.
- This lack of temperature management is only now becoming an issue because the new kernel causes systems to run hotter than before, as I've seen reported elsewhere.

I'll create new tickets for the remaining issues.
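Since checking by hand every few seconds leaves the last reading before the power-off uncertain, one option would be to log readings to disk continuously. The following is only a minimal sketch: it assumes the thermal zone is named THRM, that the procfs file has the form "temperature: 95 C", and that the 97 C passive trip point quoted above applies. The function names, the log path, and the 2-second interval are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch only: paths, log location, and interval are assumptions.

# Extract the integer Celsius value from a procfs line such as
# "temperature:             95 C" read on standard input.
parse_temp() {
    sed -n 's/^temperature:[[:space:]]*\([0-9][0-9]*\) C.*$/\1/p'
}

# Print how many degrees remain between a reading ($1) and a trip point ($2).
margin() {
    echo $(( $2 - $1 ))
}

# Append one timestamped reading per interval and flush it to disk, so the
# last line survives an abrupt power-off.
log_temps() {
    zone_file=${1:-/proc/acpi/thermal_zone/THRM/temperature}
    log_file=${2:-/var/tmp/thermal.log}   # hypothetical location
    interval=${3:-2}
    passive=97                            # passive trip point quoted above
    while :; do
        t=$(parse_temp < "$zone_file")
        if [ -n "$t" ]; then
            echo "$(date '+%H:%M:%S') temp=${t}C to_passive=$(margin "$t" "$passive")C" >> "$log_file"
            sync
        fi
        sleep "$interval"
    done
}

# Usage (runs until interrupted or until the machine powers off):
#   log_temps
```

After the next shutdown, the tail of the log file would show how close the last recorded reading came to the passive and critical trip points.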
https://bugzilla.novell.com/show_bug.cgi?id=408728
Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c5
--- Comment #5 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c6
--- Comment #6 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c7
--- Comment #7 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c8
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c9
Randy Cushman
> There has been a bug introduced in a 2.6.25.X stable kernel which caused X60 ThinkPads to overheat. It is fixed now. This could be what you see here, it might not be pae related.
> I close this one for now. Please reopen if you see the problem with any latest kernel (also pae).

In a way I was hoping you were right. Having tickets open for 5 issues that all have the same workaround is monotonous. (The same goes for 3 additional issues for which I have not bothered to open tickets.) Alas, the confirming test fits the pattern. Here is the update:

For the record, I am able to reproduce this issue using the following kernels:

    kernel-pae-2.6.25.5-1.1
    kernel-pae-2.6.25.9-0.2
    kernel-pae-2.6.25.11-0.1

I am unable to reproduce this issue using the following kernels:

    kernel-default-2.6.25.11-0.1
    kernel-default-2.6.27-rc3.HEAD_20080819185209

Presumably if this issue were related to the X60 bug you mentioned, testing with kernel-pae-2.6.25.11-0.1 vs. kernel-default-2.6.25.11-0.1 would yield the same results, regardless of whether or not the patch had been applied as of 2.6.25.11-0.1. Therefore my pae theory remains supported by all available data.
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c10
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c11
Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c12
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c13
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c14
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c15
--- Comment #15 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c16
--- Comment #16 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c17
--- Comment #17 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c18
Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c19
Randy Cushman
> Some suggestions:

In no particular order:

(It finally dawned on me that when you said "any latest kernel" you didn't mean latest release kernel. Henceforth I will test with kotd kernels.)

In bug 417845, comments 7-10 you had me try a couple of procedures related to ACPI events. In summary, when running the pae kernel, no ACPI-related interrupts are received whatsoever.

With the pae kernel, reading ACPI information generally works (e.g. subsequent reads of /proc/acpi/thermal_zone/THRM/temperature yield different values), but some information is incorrect or not updated (e.g. /proc/acpi/battery/BAT1/state is incorrectly reported as not present).

Attaching results of test #2 and dmesg. Test procedure used:

    logger "start debug"
    echo 0x400181F > /sys/module/acpi/parameters/debug_level
    cat /proc/acpi/{fan,thermal_zone,ac_adapter,processor,battery}/*/*
    echo 0x7 > /sys/module/acpi/parameters/debug_level
    logger "end debug"

(directory /proc/acpi/fan was empty for both kernels)

I will try building kernels with different options as time permits.
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c20
--- Comment #20 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c21
--- Comment #21 from Thomas Renninger
> /disk/by-id/scsi-SATA_HTS721010G9AT00_MPC0B2Y0GKJPHE-part8

Oh, this could be a bug in the 11.1 bootloader scripts (yast or perl).

Could it be that your /boot/grub/menu.lst looks similar to:

    root (/disk/by-id/scsi-SATA_HTS721010G9AT00_MPC0B2Y0GKJPHE-part8)
    kernel /boot/vmlinuz-2.6.25.11-0.1-default root=/disk/by-id/scsi-SATA_HTS721010G9AT00_MPC0B2Y0GKJPHE-part8
    initrd ...

But the first root declaration must use the grub device:

    root (hd0,7)
    kernel /boot/vmlinuz-2.6.25.11-0.1-default root=/disk/by-id/scsi-SATA_HTS721010G9AT00_MPC0B2Y0GKJPHE-part8
    initrd ...

Be aware that grub partitions start with 0 and kernel disk devices with 1. The grub notation is (hdDISK,PARTITION), so (hd0,7) matches the first disk and the 8th partition, which should be what is needed in your case.

Is that it?
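The off-by-one between the two numbering schemes can be sketched as a small helper. This is only an illustration: the function name grub_partition is hypothetical, the grub disk (hd0 here) has to be supplied by hand because BIOS disk order is not derived from the by-id name, and the name is assumed to end in "-partN".

```shell
#!/bin/sh
# Hypothetical helper: build the grub (legacy) device for a by-id partition name.
#   $1  grub disk, e.g. hd0 (chosen by the user, not detected)
#   $2  a name ending in "-partN", where N is the 1-based kernel partition number
grub_partition() {
    grub_disk=$1
    part=${2##*-part}                     # keep only the number after the last "-part"
    echo "(${grub_disk},$((part - 1)))"   # grub counts partitions from 0
}

grub_partition hd0 /disk/by-id/scsi-SATA_HTS721010G9AT00_MPC0B2Y0GKJPHE-part8
# prints (hd0,7)
```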
> Your installation instructions do not address changing config options

Thanks for the hint, I'll add something.

In your case (for trying out a specific config) I'd use the editor. Be aware that you should break the compilation the first time (or does make config work?). make detects changes to .config and will revalidate the configuration. If a dependency is wrong, e.g. you disabled an option which is needed for another, your changes will be reset.

    cat /proc/cpuinfo |grep pae

shows pae in the flags right? AFAIK otherwise the kernel should not boot at all...

Some comments to your logs:

- Temperature readings from EC seem to work
- One System IO address is used (not much is happening at all, I expected more...) that looks valid, strange is that it is declared as LIDS variable in ASL, but this one is never used.
- The battery status is "present" (0xF), do you really see "not present" for BAT1? Something is really fishy...

Did I ask that already: Did a -pae kernel ever worked on that machine?
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c22
--- Comment #22 from Randy Cushman
> cat /proc/cpuinfo |grep pae
> shows pae in the flags right?
> AFAIK otherwise the kernel should not boot at all...

yes

> - The battery status is "present" (0xF), do you really see "not present" for BAT1? Something is really fishy...

Double checked:

    # cat /proc/acpi/battery/BAT1/state
    present: no

> Did I ask that already: Did a -pae kernel ever worked on that machine?

I have never had a -pae kernel function correctly on this machine.
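For anyone repeating the cpuinfo check in a script, a plain "grep pae" would match the substring anywhere in the file. A minimal sketch that tests for pae as a whole word on the flags line follows; the function name has_pae and the optional file argument are hypothetical and exist only to make the check repeatable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "pae" appears as a whole flag on the first
# "flags" line of a cpuinfo-style file (default: /proc/cpuinfo).
has_pae() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep '^flags' "$cpuinfo" | head -n 1 | tr ' \t' '\n\n' | grep -qx pae
}

# Usage:
#   if has_pae; then echo "CPU reports pae"; else echo "no pae flag found"; fi
```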
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c23
--- Comment #23 from Randy Cushman
> In your case (for trying out a specific config) I'd use the editor. Be aware that you should break the compilation the first time (or does make config work?). make detects changes to .config and will revalidate the configuration. If a dependency is wrong, e.g. you disabled an option which is needed for another, your changes will be reset.

Perhaps I am not following your instructions here. When I replaced CONFIG_HIGHMEM4G with CONFIG_HIGHMEM64G using a text editor, make caused .config to revert back to the original setting. When I switched to CONFIG_HIGHMEM64G using make menuconfig, CONFIG_X86_PAE was selected, and could not be deselected.

> Be aware that you should break the compilation the first time (or does make

Were you attempting to describe a method of preventing the dependency check from being performed?

In case the steps I performed were consistent with your intention, here are the results.

The following changes were made to .config, relative to defconfig-pae:

    CONFIG_LOCALVERSION="-default" (="-pae" for pae)
    CONFIG_NR_CPUS=32 (=128 for pae)
    # CONFIG_NET_9P_VIRTIO is not set (=m for pae)
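Because make may silently revert hand edits to .config, it can help to confirm the state of the relevant options after each make run. The following is a minimal sketch: the function name config_state is hypothetical, and it relies only on the two line forms that appear in .config files, "CONFIG_X=value" and "# CONFIG_X is not set".

```shell
#!/bin/sh
# Hypothetical helper: print the state of one option in a kernel .config file.
#   $1  path to the .config file
#   $2  option name without the CONFIG_ prefix, e.g. HIGHMEM64G
# Prints the value after "=" if the option is set, "n" if it is marked
# "is not set", or "absent" if the file does not mention it at all.
config_state() {
    file=$1
    opt=$2
    if line=$(grep "^CONFIG_${opt}=" "$file"); then
        echo "${line#*=}"
    elif grep -q "^# CONFIG_${opt} is not set" "$file"; then
        echo "n"
    else
        echo "absent"
    fi
}

# Usage, run from the kernel source directory:
#   for o in HIGHMEM4G HIGHMEM64G X86_PAE NR_CPUS; do
#       echo "$o: $(config_state .config "$o")"
#   done
```

Running this before and after make would show whether an edited option survived revalidation of the configuration.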
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c24
--- Comment #24 from Thomas Renninger
> Were you attempting to describe a method of preventing the dependency check from being performed?

No. If there are dependencies they get solved automatically, and you need to double check the result if you manually edited .config. make menuconfig is very convenient here as soon as you've found the exact CONFIG option you'd like to modify.

To be honest I am not sure how to proceed. The increased ACPI debug level did expose a bit (the addresses used make sense, as do the temperature readings and battery status, but I wonder why the battery is not further evaluated and set to "not present"), but not as much as I had hoped.

Maybe you could do this again (only with your latest, already installed pae kernel) with the ACPI debug level increased to max. Maybe something like:

    echo 0xFFFFFFFF >/sys/module/acpi/parameters/debug_level
    cat /proc/acpi/battery/*/*

Can you also provide /proc/interrupts and /proc/iomem?

Hmm, best also add hwinfo (even if it has duplicated info). That way all necessary info should be stored here.

I cannot invest that much time on this, but I'll try to poke and ask some more people whether they still have ideas.

I wonder whether we might have a real HW problem here. PAE enables another memory access mode which is partly HW driven. If this is the case we will search forever. PAE works for a lot of machines, and a huge number of different laptop models work just fine with it...
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c25
--- Comment #25 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c26
--- Comment #26 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User suse@randycushman.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c27
--- Comment #27 from Randy Cushman
https://bugzilla.novell.com/show_bug.cgi?id=408728
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=408728#c29
Thomas Renninger