I have a problem with a dual Opteron machine that randomly reboots depending on how much memory is in it. Does anyone know how to track this down?

The memory seems to be OK. If there's just 8 GB in the machine, it's nice and stable, but go up to 12 or 16 GB and it starts rebooting. All of the memory has been in the machine as the first 8 GB at some point or other, and I've tried different memory, so I don't think it's that.

Plus, the problem only occurs when running 2 copies of Abaqus simultaneously. Running 1 will go indefinitely (> 48 hours), but with > 8 GB of memory in the machine, running 2 has so far not lasted longer than 24 hours without a reboot. This test runs a set of 14 jobs over and over again (a set takes < 2 hours to complete), so the machine can run the jobs, but something upsets it somewhere.

I'm using SLES 9 and have tried the supplied kernel as well as getting the latest source (2.6.9) and compiling that. Both behave the same way: the machine reboots as if someone had pressed the reset button, so I can't find anything in /var/log.

Any ideas how to find out what is happening?

Thanks
Paul
On Wed, 24 Nov 2004, Paul Brown wrote:
> The memory seems to be OK. If there's just 8 GB in the machine, it's nice and stable, but go up to 12 or 16 GB and it starts rebooting. All of the memory has been in the machine as the first 8 GB at some point or other, and I've tried different memory, so I don't think it's that.
Is the power supply up to the task? Using more memory (and running more processes) increases the load on the power supply.

Regards,
Jac
---
Jac Kersing
Technical Consultant
The-Box Development
j.kersing@the-box.com
http://www.the-box.com
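Because the resets leave nothing in /var/log, one way to gather evidence for (or against) the power-supply theory is a heartbeat logger. The following is a minimal sketch, assuming a Python interpreter is available on the machine. It appends the time, load average and free memory to a file every few seconds and calls `os.fsync` after each line, so the most recent samples should still be on disk after a hard reset. The script name, log path and 10-second interval are illustrative choices, not anything from the thread.

```python
import os
import sys
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def format_record(timestamp, loadavg_text, meminfo):
    """Build one log line: local time, 1/5/15-minute load averages, free and total memory."""
    load = " ".join(loadavg_text.split()[:3])
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return "%s load=%s memfree_kb=%d memtotal_kb=%d" % (
        stamp, load, meminfo.get("MemFree", -1), meminfo.get("MemTotal", -1))


def append_durably(path, line):
    """Append one line and ask the kernel to write it to disk before returning."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # push the OS page cache to the disk


def run(path, interval=10.0):
    """Sample /proc every `interval` seconds until killed (or until the machine resets)."""
    while True:
        with open("/proc/loadavg") as f:
            loadavg_text = f.read()
        with open("/proc/meminfo") as f:
            meminfo = parse_meminfo(f.read())
        append_durably(path, format_record(time.time(), loadavg_text, meminfo))
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python heartbeat.py /var/tmp/heartbeat.log
    run(sys.argv[1])
```

For example, start it with `python heartbeat.py /var/tmp/heartbeat.log &` before launching the two Abaqus runs. After a reset, the last line shows roughly when it happened and how loaded the machine was at the time, which can be compared with the job schedule. The log cannot by itself distinguish a power problem from other hardware faults; it only narrows down the moment of failure.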