Christian Hopfensitz wrote:
I have a strange problem with my SMP system. Hardware: s2895 mobo,
2x Opteron 270 dual-core CPUs, 16GB ECC Reg. RAM.
We use this workstation for CFD analyses. When I start 4 jobs on 4
CPUs, at least 2 jobs die (segmentation fault). One job works fine. A
parallel run using MPICH as the MPI implementation aborts (the error
message says that it received a SIGABRT signal from a CPU / SIGSEGV).
The strange thing: if I remove 8GB of RAM, the system runs stably.
Neither an update from SUSE 9 to SUSE 10 nor a manually compiled kernel
with the current stable version helped. A memtest86+ run showed no
errors (running for 5 days), so I think that a bad RAM module is not
the reason. Furthermore, a CPU stress
test with cpuburn-in worked well (24 hours running without any errors).
Might it be a problem with the memory management?
Any hints on how I can resolve my problem?
You haven't had much response, so here's my 2 pennyworth ...
(1) It could be a hardware problem. Tyan can be very helpful. You could
contact them directly, or you could contact your hardware vendor and ask
them to contact Tyan if they can't help you themselves.
(2) It could be a software problem. You're running a specific
application? Try asking on a list related to that application, or on an