22 Jun 2004, 12:12
We have been experiencing intermittent but quite critical hangs on a cluster of 10 dual 244 boxes, each with 1 GB of memory, running SuSE Professional 9.0. The problem occurs on all 10 nodes, after anywhere between 4 and 10 days of running. When a system hangs, it does not respond to ping or to any input at the console, and the only way out is to hit the reset button. We have checked the CPU temperatures, and they have settled down to between 25 and 48 °C across the different nodes. We have also moved our application to run from RAM drives, which has helped to keep the systems up for 16 to 22 days. Does anyone have any idea how to analyse this problem? Thanks. Peter
Chiu, PCM (Peter)