On Monday 20 February 2006 19:00, Francisco Jesús Martínez Serrano wrote:
> I'm no expert, but I guess that means it assigns all memory, from byte 0
> up to 2^64, to each node. Not surprisingly, acpi_numa_memory_affinity_init in
> srat.c finds the ranges overlapping and proceeds to ignore the information.
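To make the overlap argument concrete, here is a minimal sketch of the kind of sanity check that rejects such a table. It is not the code in srat.c. `MemAffinity`, `find_overlaps` and `srat_usable` are hypothetical names, and the rule "distrust the whole table if two different nodes claim the same addresses" is an assumption that matches the behavior described above.

```python
from typing import NamedTuple


class MemAffinity(NamedTuple):
    """Hypothetical stand-in for one SRAT memory affinity entry."""
    node: int
    start: int   # base physical address
    length: int  # size of the range in bytes

    @property
    def end(self) -> int:
        # Treat each entry as the half-open interval [start, end).
        return self.start + self.length


def find_overlaps(entries):
    """Return pairs of entries on *different* nodes whose ranges intersect."""
    overlaps = []
    for i, a in enumerate(entries):
        for b in entries[i + 1:]:
            # Two half-open intervals intersect iff each starts before the other ends.
            if a.node != b.node and a.start < b.end and b.start < a.end:
                overlaps.append((a, b))
    return overlaps


def srat_usable(entries):
    """Assumed policy: ignore the table entirely if any two nodes overlap."""
    return not find_overlaps(entries)
```

On a hypothetical four-node board where every node claims [0, 2^64), `srat_usable` returns False and `find_overlaps` reports all six node pairs. Disjoint ranges such as [0, 4 GiB) on node 0 and [4 GiB, 8 GiB) on node 1 pass the check.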
The BIOS is broken. Complain to Iwill.
That said, the 10.1 kernel will probably handle the fallback better and might
discover the correct node assignment even without a working SRAT on this machine.
> The question is: is this the problem? (We also experience random crashes when
> running a 2GB+ simulation.)
Most likely it's a RAM hardware problem. Double-check the DIMMs (e.g. by taking some out
and retesting), double-check cooling, double-check the BIOS event log,
make sure you only use Iwill-approved DIMM types, run memtest86 for a long time,
complain to your hardware vendor, etc. On Opteron it is also sometimes not
the DIMMs but the CPUs' VRM modules that cause trouble with a lot of RAM.
Another quick test is to run ftp://ftp.suse.com/pub/people/ak/tools/memeat.c
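The contents of memeat.c aren't shown here. Its name suggests it allocates and touches a large amount of memory. The following is only a rough sketch of that general idea, assuming a Linux machine. It is not that program, and it is far slower and less thorough than memtest86. `eat_and_check` and the pattern set are hypothetical choices.

```python
import mmap

CHUNK = 1 << 20                       # work in 1 MiB slices
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)   # hypothetical fill patterns


def eat_and_check(mib, patterns=PATTERNS):
    """Fill `mib` MiB of anonymous memory with each pattern; return bad-byte count."""
    if mib < 1:
        raise ValueError("mib must be at least 1")
    size = mib * CHUNK
    errors = 0
    # fileno -1 gives an anonymous mapping that is not backed by a file.
    with mmap.mmap(-1, size) as buf:
        for p in patterns:
            chunk = bytes([p]) * CHUNK
            # Write pass: touching every page forces the kernel to back it with RAM.
            for off in range(0, size, CHUNK):
                buf[off:off + CHUNK] = chunk
            # Read-back pass: any byte that changed points at flaky memory.
            for off in range(0, size, CHUNK):
                got = buf[off:off + CHUNK]
                if got != chunk:
                    errors += sum(1 for x in got if x != p)
    return errors
```

For example, `eat_and_check(3 * 1024)` would touch 3 GiB, above the 2GB mark where the crashes were reported. A nonzero return points at bad memory, but a crash or hang during the run is just as telling.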