I was checking the performance of the numademo stream test on a system I came across (Iwill H8501, 8-way, 8 GB in 16 x 512 MB DIMMs). The results were pretty low (around 1600 MB/s on all tests), so I checked whether node interleaving was disabled in the BIOS. Everything was set to factory defaults (node interleave OFF, bank interleaving ON).

Checking the kernel log (SuSE with kernel 2.6.13-15.8-smp x86_64), I found this:

SRAT: Node 0 PXM 0 0-ffffffffffffffff
SRAT: Node 1 PXM 1 0-ffffffffffffffff
SRAT: Node 2 PXM 2 0-ffffffffffffffff
SRAT: Node 3 PXM 3 0-ffffffffffffffff
SRAT: Node 4 PXM 4 0-ffffffffffffffff
SRAT: Node 5 PXM 5 0-ffffffffffffffff
SRAT: Node 6 PXM 6 0-ffffffffffffffff
SRAT: Node 7 PXM 7 0-ffffffffffffffff
SRAT: pxm 0 overlap 0-9fc00 with node 1(0-ffffffffffffffff)
SRAT: SRAT not used.

I'm no expert, but I guess that means it assigns all memory, from byte 0 up to 2^64, to each node. Not surprisingly, acpi_numa_memory_affinity_init in srat.c finds the ranges overlapping and ignores the information.

The question is: is this the problem causing the low memory performance (we also experience random crashes when running a 2GB+ simulation)? How can I solve it? I can provide logs and run tests as requested.

Thanks!
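To make the overlap problem concrete, here is a minimal sketch of the kind of check that rejects such a table. It is a simplified, hypothetical model (the `mem_range` struct and the `ranges_overlap` / `srat_ranges_usable` functions are illustrative names), not the actual code of acpi_numa_memory_affinity_init in srat.c. Each range is treated as inclusive, so 0-ffffffffffffffff spans the whole 64-bit address space (2^64 bytes). Once two different nodes claim the same addresses, the table can't say which memory belongs to which node, so it gets discarded.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model of a per-node memory affinity range
 * (not the kernel's own data structure). `end` is inclusive, so
 * [0, 0xffffffffffffffff] covers the whole 64-bit address space. */
struct mem_range {
    int node;
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
    assert(a->start <= a->end && b->start <= b->end);
    return a->start <= b->end && b->start <= a->end;
}

/* Returns true if no two different nodes claim overlapping addresses.
 * On the first conflict it prints a diagnostic and returns false, so the
 * caller can treat the whole table as unusable. */
static bool srat_ranges_usable(const struct mem_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (r[i].node != r[j].node && ranges_overlap(&r[i], &r[j])) {
                printf("node %d [%" PRIx64 "-%" PRIx64 "] overlaps node %d [%"
                       PRIx64 "-%" PRIx64 "]\n",
                       r[i].node, r[i].start, r[i].end,
                       r[j].node, r[j].start, r[j].end);
                return false;
            }
        }
    }
    return true;
}
```

With ranges like the ones in the log above, the very first pair of nodes collides, which is consistent with the "SRAT not used" message; a table with disjoint per-node ranges would pass.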
On Monday 20 February 2006 19:00, Francisco Jesús Martínez Serrano wrote:
> I'm no expert, but I guess that means it assigns all memory, from byte 0 till 2^64 to each node. Not surprisingly acpi_numa_memory_affinity_init in srat.c finds it to overlap and proceeds to ignore the information.
The BIOS is broken. Complain to Iwill. That said, the 10.1 kernel will probably handle the fallback better and may discover the correct node assignment even without a working SRAT on this machine.
> The question is: is this the problem (we also experience random crashes when running a 2GB+ simulation)
Most likely a RAM hardware problem. Double-check the DIMMs (e.g. by taking some out and retesting), double-check cooling, double-check the BIOS event log, make sure you only use Iwill-approved DIMM types, run memtest86 for a long time, complain to your hardware vendor, etc. On Opteron it's also sometimes not the DIMMs but the CPUs' VRM modules that cause trouble with a lot of RAM. A quick test is also to run ftp://ftp.suse.com/pub/people/ak/tools/memeat.c overnight.

-Andi
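For a rough idea of what a user-space overnight memory check involves, here is a minimal sketch: allocate a large buffer, write a pattern derived from each word's index, read it back, and count mismatches. This is NOT the source of memeat.c (its exact behavior isn't described in this thread); `mem_pattern_test`, the `pattern` function, and the sizes are hypothetical choices for illustration. It also only covers memory the process can allocate, unlike memtest86, which runs outside the OS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical pattern: mixes the word index with the pass number so that
 * stale or misrouted data from an earlier pass shows up as a mismatch. */
static uint64_t pattern(size_t i, unsigned pass)
{
    return ((uint64_t)i * 0x9E3779B97F4A7C15ULL) ^ (((uint64_t)pass << 32) | pass);
}

/* Allocates `bytes` of memory, then for each pass writes the pattern to
 * every 64-bit word and reads it back. Returns the number of mismatched
 * words, or -1 if allocation fails. */
static long mem_pattern_test(size_t bytes, unsigned passes)
{
    assert(bytes >= sizeof(uint64_t));
    size_t words = bytes / sizeof(uint64_t);
    /* volatile keeps the compiler from optimizing away the read-back. */
    volatile uint64_t *buf = malloc(words * sizeof(uint64_t));
    if (buf == NULL)
        return -1;

    long errors = 0;
    for (unsigned pass = 0; pass < passes; pass++) {
        for (size_t i = 0; i < words; i++)
            buf[i] = pattern(i, pass);
        for (size_t i = 0; i < words; i++) {
            if (buf[i] != pattern(i, pass)) {
                errors++;
                fprintf(stderr, "pass %u: mismatch at word %zu\n", pass, i);
            }
        }
    }
    free((void *)buf);
    return errors;
}
```

To stress most of the RAM, the buffer would need to approach the installed memory (for example several processes of 1-2 GB each on an 8 GB machine), run in a loop overnight; with correct code, a non-zero count most likely points at hardware.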
Participants (2): Andi Kleen, Francisco Jesús Martínez Serrano