Suse 9.1 on my A64 laptop idles at huge load (8 load average)
Greetings,
This may be a known problem but I can't find it anywhere...
I have an Athlon64 laptop with a 3400+ in it (C0 stepping, I think).
I just installed Suse 9.1 x86-64 after having run 9.0 x86-64 successfully
for quite a while (after I hacked the Powernow driver to work, that is,
sigh).
After it booted up the first time, I noticed it seemed relatively
sluggish. Turns out at some point in the boot sequence the load
jumps up to about 8 and stays there at a minimum.
The power management in the default install kernel from the distribution
sees this as a large load and leaves the processor at high speed all
the time.
I tried updating the kernel to the newest one off the Suse FTP site,
and the result is the same except that now Powernow support is broken
again (I have a buggy BIOS, but this problem has been around for a
while...).
I can fix the Powernow issue and post a patch to this list easily enough,
but I still need a fix for the huge idle load issue. Without it, Suse
9.1 is effectively nonfunctional for me.
If I knew what was sucking up the CPU, then I could patch/hack it
myself, but while "top" reports the CPU is 100% busy with 8-12 running
Tasks, the list of processes in general seems idle, with perhaps a few
kernel processes that seem slightly busy, but none are listed as using
more than about 2.5% of the CPU. The number of interrupts is also
relatively normal.
Given what I'm seeing below, it kinda seems like ACPI (or some other
event generator) is going crazy and reporting events constantly, and
that's how the machine is being taken to its knees. But with little
Linux ACPI experience I'm not sure where the hooks are to look into this.
If I don't hear anything from the list I'll probably fire up oprofile
and do some code sniffing from there to see where it's actually spending
its time.
FYI, Windows 64 (and -32) apparently work fine on this machine.
Example "top" output:
--------------------------------------------------------------------
Tasks: 75 total, 11 running, 62 sleeping, 0 stopped, 2 zombie
Cpu(s): 67.9% us, 32.1% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 1028688k total, 309112k used, 719576k free, 23472k buffers
Swap: 1048816k total, 0k used, 1048816k free, 117892k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12466 root 16 0 166m 26m 144m S 1.9 2.6 0:03.77 X
3235 root 16 0 12232 1456 11m R 1.3 0.1 0:29.06 powersaved
4 root 5 -10 0 0 0 S 1.0 0.0 0:26.65 kacpid
12094 eboleyn 16 0 3988 1124 3752 R 0.6 0.1 0:00.07 top
17261 root 5 -20 7868 1532 7512 S 0.6 0.1 0:00.02 powersave_proxy
17440 root 2 -20 0 0 0 Z 0.6 0.0 0:00.02 powersa <defunct>
17443 root 4 -20 0 0 0 Z 0.6 0.0 0:00.02 powersa <defunct>
1 root 16 0 640 264 492 S 0.0 0.0 0:06.11 init
2 root 34 19 0 0 0 R 0.0 0.0 0:00.02 ksoftirqd/0
3 root 5 -10 0 0 0 S 0.0 0.0 0:00.26 events/0
5 root 5 -10 0 0 0 S 0.0 0.0 0:00.02 kblockd/0
6 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 khelper
7 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
8 root 15 0 0 0 0 S 0.0 0.0 0:00.01 pdflush
10 root 7 -10 0 0 0 S 0.0 0.0 0:00.00 aio/0
--------------------------------------------------------------------
Current output of "/proc/cpuinfo" (Powernow is broken so it always says
800MHz, but the rest is normal):
--------------------------------------------------------------------
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 3400+
stepping : 8
cpu MHz : 800.042
cache size : 1024 KB
...
--------------------------------------------------------------------
Relevant "dmesg" output:
--------------------------------------------------------------------
Bootdata ok (command line is root=/dev/hda1 vga=0x317 splash=silent desktop mem=1048512K)
Linux version 2.6.5-7.95-default (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 Thu Jul 1 15:23:45 UTC 2004
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fffffc0 (ACPI data)
BIOS-e820: 000000003fffffc0 - 0000000040000000 (ACPI NVS)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
...
ACPI: RSDP (v000 OID_00 ) @ 0x00000000000e5010
ACPI: RSDT (v001 INSYDE RSDT_000 0x00000001 _CSI 0x00010101) @ 0x000000003fffcc40
ACPI: FADT (v001 INSYDE FACP_000 0x00000100 _CSI 0x00010101) @ 0x000000003ffffaa0
ACPI: BOOT (v001 INSYDE SYS_BOOT 0x00000100 _CSI 0x00010101) @ 0x000000003ffffb90
ACPI: DBGP (v001 INSYDE DBGP_000 0x00000100 _CSI 0x00010101) @ 0x000000003ffffbc0
ACPI: MADT (v001 INSYDE APIC_000 0x30303030 0000 0x30303030) @ 0x000000003ffffb30
ACPI: DSDT (v001 INSYDE K8T400 0x00001000 MSFT 0x0100000e) @ 0x0000000000000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 1
IOAPIC[0]: apic_id 1, version 3, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Using ACPI (MADT) for SMP configuration information
...
ACPI: Subsystem revision 20040326
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: Embedded Controller [EC0] (gpe 5)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 10 *11 14 15)
...
ACPI: (supports S0 S3 S4 S5)
...
ACPI: AC Adapter [AC] (on-line)
ACPI: Battery Slot [BAT0] (battery present)
ACPI: Power Button (FF) [PWRF]
ACPI: Lid Switch [LID]
ACPI: Sleep Button (CM) [SBTN]
ACPI: Processor [CPU0] (supports C1 C2 C3)
ACPI: Thermal Zone [TZ0] (44 C)
powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.00.09d)
powernow-k8: BIOS error: numpst must be 1
--------------------------------------------------------------------
--
Erich Stefan Boleyn
On Wed, Jul 07, 2004 at 12:47:57PM -0700, Erich Boleyn wrote:
Hallo Erich,
This may be a known problem but I can't find it anywhere...
Not known to me.
I can fix the Powernow issue and post a patch to this list easily enough, but I still need a fix for the huge idle load issue. Without it, Suse 9.1 is effectively nonfunctional for me.
The latest update kernel will probably fix it (unless the BIOS lacks a good PST or ACPI table; in that case you'll need to keep hardcoding a suitable table in the driver). But if Windows does C'n'Q it should work on Linux too.
If I don't hear anything from the list I'll probably fire up oprofile and do some code sniffing from there to see where it's actually spending it's time.
That's a good approach, yes. Load average is also affected by 'D' processes; if you have any around, they could cause it too.
Example "top" output:
--------------------------------------------------------------------
Tasks: 75 total, 11 running, 62 sleeping, 0 stopped, 2 zombie
Cpu(s): 67.9% us, 32.1% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 1028688k total, 309112k used, 719576k free, 23472k buffers
Swap: 1048816k total, 0k used, 1048816k free, 117892k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
[...]
Ok, no D processes assuming the listing was complete, although:
4 root 5 -10 0 0 0 S 1.0 0.0 0:26.65 kacpid
This process also shouldn't use CPU time in normal operation. acpi=off is definitely worth a try.
-Andi
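The point about 'D' processes can be checked without relying on a truncated "top" listing: on Linux the load average counts tasks that are runnable (R) as well as tasks in uninterruptible sleep (D). Below is a minimal sketch, assuming a Linux-style /proc with one "stat" file per PID. The helper names (task_state, count_states, read_proc_stats) are made up for illustration and are not part of any existing tool.

```python
import os


def task_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The command name (2nd field) is wrapped in parentheses and may itself
    contain spaces or ')', so the state is taken as the first field after
    the LAST ')' on the line.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Count how many tasks are in each state (R, S, D, Z, ...)."""
    counts = {}
    for line in stat_lines:
        state = task_state(line)
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_proc_stats(proc="/proc"):
    """Read every /proc/<pid>/stat; returns [] if /proc is not available."""
    lines = []
    if not os.path.isdir(proc):
        return lines
    for entry in os.listdir(proc):
        if entry.isdigit():
            try:
                with open(os.path.join(proc, entry, "stat")) as f:
                    lines.append(f.read())
            except OSError:
                pass  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    counts = count_states(read_proc_stats())
    # R and D tasks are the ones that contribute to the load average.
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible (D):", counts.get("D", 0))
```

This only counts one stat file per process, not per thread, so it is a rough check rather than an exact reconstruction of the load average.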
Andi Kleen
I can fix the Powernow issue and post a patch to this list easily enough, but I still need a fix for the huge idle load issue. Without it, Suse 9.1 is effectively nonfunctional for me.
The latest update kernel will probably fix it (except if the BIOS misses a good PST or ACPI table, if yes then you'll need keep hardcoding a suitable table in the driver). But if Windows does C'n'Q it should work on Linux too.
The BIOS on this laptop doesn't have a correct-to-spec PST table, hence
the problem. The current Linux driver also ignores the ACPI table, at
least as compiled for the distribution.
I have a pretty clean fix to this which works correctly for me, but I
had to enable all the P-states even if the battery setting says not to.
The driver just has a comment saying to "use ACPI" if this is a problem.
I guess I'd like to discuss with the maintainer of the driver first
before posting it.
--
Erich Stefan Boleyn
On Thu, 08 Jul 2004 15:37:19 -0700
Erich Boleyn
Andi Kleen
wrote:
...[The description had been deleted, but essentially my problem was that my Athlon64 laptop showed a huge load (8-ish load average) even when idle]...
OK, I've mostly tracked it down.
After trying a lot of things, it appears to be either a bug with the "/usr/sbin/powersave_proxy" script, or in the action the script is trying to take. If I don't execute the script provided with the distribution, everything works normally (except of course the expected action to be caused by the powersave event doesn't take place).
Can you perhaps send strace -f output or, better, sh -x output of what it is doing? I guess it goes away when you comment out
processor.performance=/usr/sbin/powersave_proxy
processor.powersave=/usr/sbin/powersave_proxy
processor.dynamic=/usr/sbin/powersave_proxy
processor.dynamic.high=/usr/sbin/powersave_proxy
processor.dynamic.low=/usr/sbin/powersave_proxy
in /etc/powersave.conf
So my current workaround is to disable the "powersaved" daemon and manually control the CPU speed.
I can fix the Powernow issue and post a patch to this list easily enough, but I still need a fix for the huge idle load issue. Without it, Suse 9.1 is effectively nonfunctional for me.
The latest update kernel will probably fix it (except if the BIOS misses a good PST or ACPI table, if yes then you'll need keep hardcoding a suitable table in the driver). But if Windows does C'n'Q it should work on Linux too.
The BIOS on this laptop doesn't have a correct-to-spec PST table, hence the problem. The current Linux driver also ignores the ACPI table, at least as compiled for the distribution.
Powernow with ACPI should work on the distribution. It does on other machines. Did you really try the version in the update kernel? The powernow driver in the kernel on the CD was a bit outdated already.
-Andi
Andi Kleen
The latest update kernel will probably fix it (except if the BIOS misses a good PST or ACPI table, if yes then you'll need keep hardcoding a suitable table in the driver). But if Windows does C'n'Q it should work on Linux too.
The BIOS on this laptop doesn't have a correct-to-spec PST table, hence the problem. The current Linux driver also ignores the ACPI table, at least as compiled for the distribution.
Powernow with ACPI should work on the distribution. It does on other machines.
Did you really try the version in the update kernel? The powernow driver in the kernel on the CD was a bit outdated already.
OK, I looked into this a bit further, and the source of the Powernow
problem appears to be that the C preprocessor define "CONFIG_ACPI_PROCESSOR"
is not defined in the compile of the driver for the updated kernel I am
using, which is version "2.6.5-7.95".
The ".config" file generated by "make oldconfig" as well as the
default config file for the uniprocessor kernel has
"CONFIG_ACPI_PROCESSOR=m" in it, but I am not sure what exactly
that should translate to in the 2.6 kernel build framework, this
being my first time playing with it.
So, it was never following the ACPI path at all, hence why I needed
to patch the code reading the broken BIOS PST table on my A64
machine.
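As a rough illustration of what "CONFIG_ACPI_PROCESSOR=m" translates to: assuming the usual 2.6 kbuild convention, an option set to "y" shows up in C code as a CONFIG_FOO macro, while an option set to "m" shows up as CONFIG_FOO_MODULE and leaves CONFIG_FOO itself undefined. A plain "#ifdef CONFIG_ACPI_PROCESSOR" would then be false for a modular build, which matches the behaviour described above. The sketch below mimics that mapping in Python; defined_macros is a hypothetical helper written for this example, not part of the kernel build tools.

```python
def defined_macros(config_text):
    """Return the set of CONFIG_* macro names that C code would see defined,
    under the assumed convention: =y -> CONFIG_FOO, =m -> CONFIG_FOO_MODULE.
    """
    macros = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments ("# CONFIG_FOO is not set")
        name, _, value = line.partition("=")
        if value == "y":
            macros.add(name)
        elif value == "m":
            macros.add(name + "_MODULE")
        elif value and value != "n":
            macros.add(name)  # string or numeric options keep their name
    return macros


sample = """CONFIG_ACPI=y
CONFIG_ACPI_PROCESSOR=m
# CONFIG_ACPI_DEBUG is not set
"""
macros = defined_macros(sample)
print("CONFIG_ACPI_PROCESSOR" in macros)         # False
print("CONFIG_ACPI_PROCESSOR_MODULE" in macros)  # True
```

Under that assumption, a driver meant to work with both built-in and modular ACPI processor support would test for either macro, for example "#if defined(CONFIG_ACPI_PROCESSOR) || defined(CONFIG_ACPI_PROCESSOR_MODULE)".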
When looking through the build tree, there seem to be two uses of
this...
1) As an #ifdef directive in "powernow-k7.c" and "powernow-k8.c"
2) As a Makefile build directive in "./drivers/acpi/Makefile", clearly
for the ACPI "processor" module.
FYI, this will definitely break on several other laptops out there,
even with correct BIOS PST tables, as that section will refuse to
allow higher power states if the battery doesn't support them.
Anyway, my current workaround is now just to add a
"#define CONFIG_ACPI_PROCESSOR" to the top of "powernow-k8.c".
--
Erich Stefan Boleyn
On Fri, 09 Jul 2004 13:06:08 -0700
Erich Boleyn
Andi Kleen
wrote:
The latest update kernel will probably fix it (except if the BIOS misses a good PST or ACPI table, if yes then you'll need keep hardcoding a suitable table in the driver). But if Windows does C'n'Q it should work on Linux too.
The BIOS on this laptop doesn't have a correct-to-spec PST table, hence the problem. The current Linux driver also ignores the ACPI table, at least as compiled for the distribution.
Powernow with ACPI should work on the distribution. It does on other machines.
Did you really try the version in the update kernel? The powernow driver in the kernel on the CD was a bit outdated already.
OK, I looked into this a bit further, and source of the Powernow problem appears to be that the C preprocessor define "CONFIG_ACPI_PROCESSOR" is not defined in the compile of the driver for the updated kernel I am using, which is version "2.6.5-7.95".
Hmm, I fixed this at one point. Must have regressed again.
That's the proper fix I think. Does it work for you?
-Andi
diff -u linux/arch/i386/kernel/cpu/cpufreq/powernow-k8.c-o linux/arch/i386/kernel/cpu/cpufreq/powernow-k8.c
--- linux/arch/i386/kernel/cpu/cpufreq/powernow-k8.c-o 2004-06-18 12:30:26.000000000 +0200
+++ linux/arch/i386/kernel/cpu/cpufreq/powernow-k8.c 2004-07-09 22:49:01.000000000 +0200
@@ -32,7 +32,7 @@
#include
Andi Kleen
OK, I looked into this a bit further, and source of the Powernow problem appears to be that the C preprocessor define "CONFIG_ACPI_PROCESSOR" is not defined in the compile of the driver for the updated kernel I am using, which is version "2.6.5-7.95".
Hmm, I fixed this at one point. Must have regressed again.
That's the proper fix I think. Does it work for you?
Your fix was correct but partial. It also needed a similar patch to the
header file. I've included an updated patch for both files, tested against
2.6.5-7.95.
----------------------------(start diff)------------------------------
diff -u linux-2.6.5-7.95.orig/arch/i386/kernel/cpu/cpufreq/powernow-k8.c linux-2.6.5-7.95/arch/i386/kernel/cpu/cpufreq/powernow-k8.c
--- linux-2.6.5-7.95.orig/arch/i386/kernel/cpu/cpufreq/powernow-k8.c 2004-07-01 08:53:30.000000000 -0700
+++ linux-2.6.5-7.95/arch/i386/kernel/cpu/cpufreq/powernow-k8.c 2004-07-11 13:33:54.000000000 -0700
@@ -32,7 +32,7 @@
#include
On Thu, 08 Jul 2004 15:37:19 -0700
Erich Boleyn
So my current workaround is to disable the "powersaved" daemon and manually control the CPU speed.
According to the laptop people at suse it is a BIOS issue on that Mitac model. It constantly reports some ACPI event when the CPU runs, which powersaved cannot deal with.
Replacing
# This concerns unknown ACPI events which
# are by default ignored.
other=/usr/sbin/powersave_proxy %a
with
other=ignore
may help; better would be to fix the BIOS to not send such bogus events all the time.
-Andi
Andi Kleen
Erich Boleyn
wrote:
So my current workaround is to disable the "powersaved" daemon and manually control the CPU speed.
According to the laptop people at suse it is a BIOS issue on that Mitac model. It constantly reports some ACPI event when the CPU runs the powersaved cannot deal with.
Replacing
# This concerns unknown ACPI events which
# are by default ignored.
other=/usr/sbin/powersave_proxy %a
with
other=ignore
may help,
Ahhh. That seems to have taken care of it. I had been working my way down the list trying that, but it helps to have a pointer. :-)
As to it being a "constant reporting" issue, from the testing I had done, I'm pretty sure it only does that when stimulated by the "powersave_proxy" script as provided in the distribution. When I replaced it with a script that only logged the event commands, it only reported a few events, and only the ones you'd expect.
I.e. I think it's either a real bug in the "powersave_proxy" script, or a situation where there is a cascading series of events: each one, as serviced by "powersave_proxy", then triggers another event (or set of events), etc. In either case I'd bet there is something about the "powersave_proxy" script that is causing the problem, which Windows doesn't do.
... better would be to fix the BIOS to not send such bogus events all the time.
Well, possibly, but I don't have BIOS source code. :-/
But given my above observations, I'm not sure it's the fault of
the BIOS in any case, other than having a behavior not well predicted
by the writers of "powersave_proxy".
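For anyone wanting to repeat the "log only" test described above, here is a minimal sketch of a stand-in script that records the arguments it is called with and takes no other action. It is not the replacement script actually used in that test. The log file name, the log_event helper, and the assumption that the event details arrive as command-line arguments (as the "%a" in the config line suggests) are all illustrative guesses.

```python
import os
import sys
import tempfile
import time


def log_event(args, log_path):
    """Append one timestamped line with the arguments we were called with.

    No power-management action is taken, so any event storm that persists
    cannot be caused by whatever the real handler script does.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write("%s %s\n" % (stamp, " ".join(args)))


if __name__ == "__main__":
    # Hypothetical log location; adjust to taste.
    default_log = os.path.join(tempfile.gettempdir(), "powersave_events.log")
    log_event(sys.argv[1:], default_log)
```

If the log stays short with this in place, but the load climbs again with the distribution script restored, that points at the handler's actions rather than at the BIOS alone.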
--
Erich Stefan Boleyn