-----Original Message-----
From: Greg KH <gregkh@suse.de>
To: Кузьминский Михаил <mikky_m@mail.ru>
Date: Sun, 16 Aug 2009 09:50:05 -0700
Subject: Re: [opensuse-kernel] Nehalem kernel NUMA bug
On Sun, Aug 16, 2009 at 08:39:59PM +0400, Кузьминский Михаил wrote:
I have a dual E5520 server with a Supermicro X8DTI motherboard (latest BIOS, version 1.0c, June 2009). The default OpenSuSE 11.1 x86-64 kernel (2.6.27.7-9) gives an error in the /sys/devices/system/node directory: there are node0 and node2 subdirectories instead of node0 and node1 (SMT is turned off). As a result, the numactl tools don't work. I couldn't find a report of this error in the SuSE bugzilla "database".
This Nehalem/NUMA error is also known (AFAIK) for the default OpenSuSE 2.6.29-6 kernel, for the FC11 2.6.29 kernel, and for some CentOS 2.6 kernels; but there are kernels without this error :-)
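The symptom described above (node0 and node2 present, node1 missing) can be checked with a few lines of Python. This is only a minimal sketch: the directory path comes from the report, while the helper names `node_ids` and `missing_ids` are made up for illustration.

```python
import os
import re

# Path taken from the report above.
NODE_DIR = "/sys/devices/system/node"


def node_ids(entries):
    """Return the sorted NUMA node ids found in a list of directory names."""
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def missing_ids(ids):
    """Return the ids absent from the range 0..max(ids)."""
    if not ids:
        return []
    return [i for i in range(max(ids) + 1) if i not in ids]


if __name__ == "__main__":
    # On a machine without this sysfs directory the listing is simply empty.
    entries = os.listdir(NODE_DIR) if os.path.isdir(NODE_DIR) else []
    ids = node_ids(entries)
    print("nodes:", ids, "gaps:", missing_ids(ids))
```

On the machine described in the report, the node list would be expected to contain 0 and 2, with 1 reported as a gap.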
What are the kernels without this error?
AFAIK CentOS 5.3 with 2.6.18-128.2.1 is OK; 2.6.18-128.1.10 doesn't work. But I didn't check it myself! Maybe the default OpenSuSE 10.3 kernel is also OK (see the opensuse / hardware / 64-bit forum on the SuSE site, the thread with "Nehalem" in its name). Maybe an extract from dmesg on my SuSE 11.1/2.6.27.7-9 will be helpful: it shows that there was a Node 1, but then Node 2 appears (see below)! (See also http://marc.info/?l=linux-netdev&m=124967917523109&w=2)

ACPI: SRAT BF79A4B0, 0150 (r1 041409 OEMSRAT 1 INTL 1)
ACPI: SSDT BF79FAC0, 249F (r1 DpgPmm CpuPm 12 INTL 20051117)
ACPI: Local APIC address 0xfee00000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 4 -> Node 0
SRAT: PXM 0 -> APIC 6 -> Node 0
SRAT: PXM 1 -> APIC 16 -> Node 1
SRAT: PXM 1 -> APIC 18 -> Node 1
SRAT: PXM 1 -> APIC 20 -> Node 1
SRAT: PXM 1 -> APIC 22 -> Node 1
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-c0000000
SRAT: Node 0 PXM 0 100000000-1c0000000
SRAT: Node 2 PXM 257 1c0000000-340000000 (here !!)
NUMA: Allocated memnodemap from 1c000 - 22880
NUMA: Using 20 for the hash shift.
Bootmem setup node 0 0000000000000000-00000001c0000000
  NODE_DATA [0000000000022880 - 000000000003a87f]
  bootmap [000000000003b000 - 0000000000072fff] pages 38
(8 early reservations) ==> bootmem [0000000000 - 01c0000000]
  #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0000200000 - 0000bf27b8] TEXT DATA BSS ==> [0000200000 - 0000bf27b8]
  #3 [0037a3b000 - 0037fef104] RAMDISK ==> [0037a3b000 - 0037fef104]
  #4 [000009cc00 - 0000100000] BIOS reserved ==> [000009cc00 - 0000100000]
  #5 [0000010000 - 0000013000] PGTABLE ==> [0000010000 - 0000013000]
  #6 [0000013000 - 000001c000] PGTABLE ==> [0000013000 - 000001c000]
  #7 [000001c000 - 0000022880] MEMNODEMAP ==> [000001c000 - 0000022880]
Bootmem setup node 2 00000001c0000000-0000000340000000
  NODE_DATA [00000001c0000000 - 00000001c0017fff]
  bootmap [00000001c0018000 - 00000001c0047fff] pages 30
(8 early reservations) ==> bootmem [01c0000000 - 0340000000]
  #0 [0000000000 - 0000001000] BIOS data page
  #1 [0000006000 - 0000008000] TRAMPOLINE
  #2 [0000200000 - 0000bf27b8] TEXT DATA BSS
  #3 [0037a3b000 - 0037fef104] RAMDISK
  #4 [000009cc00 - 0000100000] BIOS reserved
  #5 [0000010000 - 0000013000] PGTABLE
  #6 [0000013000 - 000001c000] PGTABLE
  #7 [000001c000 - 0000022880] MEMNODEMAP
found SMP MP-table at [ffff8800000ff780] 000ff780
 [ffffe20000000000-ffffe20006ffffff] PMD -> [ffff880028200000-ffff88002e1fffff] on node 0
 [ffffe20007000000-ffffe2000cffffff] PMD -> [ffff8801c0200000-ffff8801c61fffff] on node 2
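The mismatch in the SRAT lines of the dmesg extract above (CPUs assigned to Node 1, but the second memory range assigned to Node 2 via PXM 257) can be spotted mechanically. The sketch below assumes the exact message format seen in this 2.6.27 log; other kernel versions may print these lines differently. The function names and the shortened `SAMPLE` text are illustrative only.

```python
import re

# Patterns assume the SRAT message format shown in the log above.
CPU_RE = re.compile(r"SRAT: PXM (\d+) -> APIC (\d+) -> Node (\d+)")
MEM_RE = re.compile(r"SRAT: Node (\d+) PXM (\d+) ([0-9a-f]+)-([0-9a-f]+)")

# A few lines copied from the dmesg extract (shortened).
SAMPLE = """\
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 1 -> APIC 16 -> Node 1
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000000-1c0000000
SRAT: Node 2 PXM 257 1c0000000-340000000
"""


def srat_nodes(log):
    """Return (nodes that have CPUs, nodes that have memory ranges)."""
    cpu_nodes, mem_nodes = set(), set()
    for line in log.splitlines():
        cpu = CPU_RE.search(line)
        if cpu:
            cpu_nodes.add(int(cpu.group(3)))
            continue
        mem = MEM_RE.search(line)
        if mem:
            mem_nodes.add(int(mem.group(1)))
    return cpu_nodes, mem_nodes


def node_mismatch(log):
    """Return (CPU-only nodes, memory-only nodes), each as a sorted list."""
    cpu_nodes, mem_nodes = srat_nodes(log)
    return sorted(cpu_nodes - mem_nodes), sorted(mem_nodes - cpu_nodes)


if __name__ == "__main__":
    cpu_only, mem_only = node_mismatch(SAMPLE)
    print("CPU-only nodes:", cpu_only, "memory-only nodes:", mem_only)
```

For the sample lines, Node 1 ends up with CPUs but no memory and Node 2 with memory but no CPUs, which matches the node0/node2 directories seen in /sys/devices/system/node.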
Are there any updated OpenSuSE 11.1 kernels without this error? Or maybe some OpenSuSE 11.2 kernels?
Can you try the latest 11.2 kernel to see if that still has this issue?
I'm extremely busy until Sept. 15th, so only if it will "cost" me just a few rpm commands to update (or to add a second kernel to GRUB, which may be better) and then to downgrade back. Is it enough to install only //download.opensuse.org/factory/repo/oss/suse/x86-64/kernel-default-2.6.31-2.1.x86_64.rpm, or do I also need the kernel-syms package etc. (hopefully not glibc)? I need XFS support in the kernel for /home.

Yours
Mikhail
That would make it much easier to help track down.
greg k-h