On Monday 20 February 2006 19:00, Francisco Jesús Martínez Serrano wrote:
> I'm no expert, but I guess that means it assigns all memory, from byte 0 up to 2^64, to each node. Not surprisingly, acpi_numa_memory_affinity_init in srat.c finds the ranges to overlap and proceeds to ignore the information.
The BIOS is broken. Complain to Iwill. That said, the 10.1 kernel will probably handle the fallback better and might discover the correct node assignment even without a working SRAT on this machine.
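The overlap described above can be illustrated with a small C sketch. This is not the code from srat.c: the `node_range` struct and the `ranges_overlap` and `find_conflict` helpers are hypothetical names made up for the example, and the ranges use an inclusive end so that a range covering the whole 64-bit address space can be represented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one node's memory range; not the kernel's type. */
struct node_range {
    uint64_t start; /* first byte, inclusive */
    uint64_t end;   /* last byte, inclusive, so the full 64-bit space fits */
};

/* Two inclusive ranges overlap if each one starts no later than the other ends. */
static int ranges_overlap(const struct node_range *a, const struct node_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/*
 * Returns the index of the first already accepted range that the candidate
 * overlaps, or -1 if the candidate is disjoint from all of them.
 */
static int find_conflict(const struct node_range *accepted, size_t count,
                         const struct node_range *candidate)
{
    assert(candidate != NULL);
    assert(accepted != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (ranges_overlap(&accepted[i], candidate))
            return (int)i;
    }
    return -1;
}
```

If the first node has already claimed 0 through UINT64_MAX, any range offered for a second node conflicts with it, so `find_conflict` returns 0. With disjoint per-node ranges it returns -1.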
> The question is: is this the problem? (We also experience random crashes when running a 2GB+ simulation.)
Most likely some RAM hardware problem. Double check DIMMs (e.g. by taking some out and retesting), double check cooling, double check BIOS event log, make sure you only use Iwill approved DIMM types, run memtest86 for a long time, complain to your hardware vendor etc. On Opteron it's also sometimes not the DIMMs but the VRM modules of the CPUs that make trouble with a lot of RAM. A quick test is also to run ftp://ftp.suse.com/pub/people/ak/tools/memeat.c overnight.

-Andi
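
For readers who want to see the general shape of a userspace fill-and-verify memory test, a minimal C sketch follows. It is not the contents of memeat.c, which is not reproduced here: the function names, the pattern formula, and the pass structure are all assumptions made for illustration. A userspace test only touches the memory the OS hands to the process, so it complements memtest86 rather than replacing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pattern: depends on the word index so misplaced data is visible. */
static uint64_t pattern_for(size_t index, uint64_t seed)
{
    return ((uint64_t)index * 0x9E3779B97F4A7C15ULL) ^ seed;
}

/* Write the pattern into every word of the buffer. */
static void fill_pattern(volatile uint64_t *buf, size_t words, uint64_t seed)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = pattern_for(i, seed);
}

/* Read the buffer back and count the words that no longer match the pattern. */
static size_t count_mismatches(const volatile uint64_t *buf, size_t words,
                               uint64_t seed)
{
    size_t bad = 0;
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != pattern_for(i, seed))
            bad++;
    }
    return bad;
}

/*
 * Runs `passes` fill/verify rounds over a freshly allocated buffer.
 * Returns the total number of mismatching words, or SIZE_MAX if the
 * allocation failed. The caller must keep words * 8 within size_t range.
 */
static size_t run_passes(size_t words, unsigned passes)
{
    assert(words > 0);

    uint64_t *buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return SIZE_MAX;

    size_t bad = 0;
    for (unsigned p = 0; p < passes; p++) {
        fill_pattern(buf, words, p);            /* new seed each pass */
        bad += count_mismatches(buf, words, p);
    }
    free(buf);
    return bad;
}
```

On healthy hardware `run_passes` should report zero mismatches. To be useful as a stress test, the buffer would need to cover most of the installed RAM and the passes would need to run for hours, as suggested above.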