My dual Opteron 242 Thunder K8W with 4 GB of DDR400 registered ECC memory has been running SuSE 9.0 for a couple of weeks now. Everything seems to be working fine, and memtest showed no problems.
Unfortunately, every few days or so (it has happened 3 times now), the system just freezes up, and I mean completely. No logins, no escapes, no keyboard or mouse response at all. It still responds to pings, though, which is strange since no other remote operations succeed.
There are no messages in the logs (at least the ones I've checked), so I assume it's probably hardware related. Any advice or checks I can try?
It's not debilitating, but it sure is worrisome.
Eugene
"Eugene de Villiers"
Just the obvious thing: Do you run the latest BIOS? Andreas -- Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj SuSE Linux AG, Maxfeldstr. 5, 90409 Nürnberg, Germany GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Hello all. I had to pull a failed drive from an array. Unfortunately, there is no software to check the array while the machine is up (LSI MegaRAID 320-4x), so I had to restart the machine to confirm what I suspected. After swapping out the drive, I attempted to restart the machine. Rather than booting normally, it comes up to the SUSE kernel selection menu, unzips the kernel, outputs "BIOS check Successful" and locks up hard. So I tried booting with iommu=force. Then I hooked up a console so I could see what was happening, booted with "text console=ttyS0,115200 console=tty0", and still got no output. Nothing, nada. I am about to try apic=off. This is a Tyan 2880 with 4 GB of RAM (I pulled the other 2 GB after it wasn't booting, as it has had issues with this in the past). Ideas? -- Santiago Flores Sr. Systems Administrator Iqdirection.com santiago@iqdirection.com 480-560-3151
On Mon, 23 Feb 2004 19:12:54 +0000
"Eugene de Villiers"
Do you use any 32-bit programs? If so, you could be hit by erratum #94. Try a BIOS update; very old BIOS versions didn't have the correct workaround for it. -Andi
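If you are not sure whether any 32-bit programs are in use, one quick per-binary check is the ELF header: byte 4 of the identification bytes (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. A minimal Python sketch, assuming the binaries are ordinary ELF files; the function names and the script name used below are hypothetical, and the standard `file` command reports the same information.

```python
import sys

# ELF identification constants (from the ELF specification).
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4      # offset of the class byte within e_ident
ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_class(header: bytes) -> str:
    """Classify the first bytes of a file as '32-bit', '64-bit' or 'not ELF'."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return "not ELF"
    cls = header[EI_CLASS]
    if cls == ELFCLASS32:
        return "32-bit"
    if cls == ELFCLASS64:
        return "64-bit"
    return "unknown class"


def check_file(path: str) -> str:
    # Only the 16-byte e_ident block is needed to decide.
    with open(path, "rb") as f:
        return elf_class(f.read(16))


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {check_file(p)}")
```

For example, saved as a (hypothetical) elfclass.py, `python3 elfclass.py /proc/1234/exe` with a real PID and sufficient permissions would classify a running process, since /proc/PID/exe links to its executable.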
Unfortunately I don't have a solution, but I was wondering about your DDR400 with a 242 CPU. My understanding is that this combination would run the memory at DDR333. Do you have any idea whether you are getting DDR400 speed? I'm not even sure how to determine this. I have used hdparm -T for a rough estimate. I'd be curious what your results are.
Here's mine with a dual 240 DDR333 system:
hdparm -T /dev/md0
/dev/md0:
Timing buffer-cache reads: 2680 MB in 2.00 seconds = 1339.33 MB/sec
This result is fairly average for my system... sometimes it's lower, sometimes it's higher.
Mark
Hi!
What kind of disks do you have? I have SATA from Hitachi and I only get this:
/dev/md0:
Timing buffer-cache reads: 1940 MB in 2.00 seconds = 969.03 MB/sec
Timing buffered disk reads: 172 MB in 3.02 seconds = 56.93 MB/sec
Milan
Well, I have a mix: 2 SATA, 10 SCSI, and 1 IDE. I get pretty much the same result with /dev/md0, /dev/sda, or /dev/hda. My understanding is that the -T option gives a rough estimate of memory bandwidth; I don't believe -T touches the disks. I think the memory bandwidth would be most affected by the memory setup (DDR speed, dual channel, latency, etc.) and the HyperTransport link speed. In my BIOS I have the HT link speed at its highest setting. I also have ECC disabled in the BIOS.
BTW, here's the output with disk reads:
hdparm -Tt /dev/md0
/dev/md0:
Timing buffer-cache reads: 2716 MB in 2.00 seconds = 1358.00 MB/sec
Timing buffered disk reads: 900 MB in 3.00 seconds = 299.60 MB/sec
Mark
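To compare the hdparm figures posted in this thread side by side, a small parser can pull the reported rate out of each "Timing ... reads" line. As far as I know, the hdparm man page describes the -T figure as an indication of processor, cache and memory throughput rather than disk speed, which is why the buffer-cache numbers are the ones compared here. This is a sketch that assumes the output wording shown in the posts; other hdparm versions may phrase it differently.

```python
import re
from typing import Dict

# Matches lines like
#   "Timing buffer-cache reads: 2680 MB in 2.00 seconds = 1339.33 MB/sec"
# (wording as posted in this thread; assumed, not guaranteed for all versions).
TIMING_RE = re.compile(
    r"Timing (?P<kind>buffer-cache|buffered disk) reads:\s+"
    r"\d+ MB in\s+[\d.]+ seconds =\s+(?P<rate>[\d.]+) MB/sec"
)


def parse_hdparm(text: str) -> Dict[str, float]:
    """Map 'buffer-cache' and/or 'buffered disk' to the reported MB/sec."""
    return {m.group("kind"): float(m.group("rate"))
            for m in TIMING_RE.finditer(text)}


# Figures as posted in this thread.
POSTED = {
    "Mark (2 x 240, DDR333)":
        "Timing buffer-cache reads: 2680 MB in 2.00 seconds = 1339.33 MB/sec",
    "Milan (Hitachi SATA)":
        "Timing buffer-cache reads: 1940 MB in 2.00 seconds = 969.03 MB/sec "
        "Timing buffered disk reads: 172 MB in 3.02 seconds = 56.93 MB/sec",
    "Frank (2 x 240, DDR400)":
        "Timing buffer-cache reads: 3056 MB in 2.00 seconds = 1527.47 MB/sec "
        "Timing buffered disk reads: 176 MB in 3.03 seconds = 58.08 MB/sec",
}

if __name__ == "__main__":
    baseline = parse_hdparm(POSTED["Mark (2 x 240, DDR333)"])["buffer-cache"]
    for who, text in POSTED.items():
        cache = parse_hdparm(text)["buffer-cache"]
        print(f"{who}: {cache:.2f} MB/sec ({cache / baseline:.0%} of Mark's)")
```

Since Mark notes his own result varies from run to run, a single reading per machine is noisy, and differences of a few percent probably mean little.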
On Monday 23 February 2004 20:57, Mark Horton wrote:
My Thunder K8W with 2 x 240 and DDR400; Hitachi SATA:
Timing buffer-cache reads: 3056 MB in 2.00 seconds = 1527.47 MB/sec
Timing buffered disk reads: 176 MB in 3.03 seconds = 58.08 MB/sec
The BIOS displays the RAM speed on the first boot screen.
In AMD's PDF about the Opteron CPU, 23932.pdf, you will find a table on page 14, "Table 2. DRAM Interface Speed vs. CPU Core Clock Multiplier". You will see that a 240 runs DDR333 (166 MHz) memory at only 155 MHz, so I cannot imagine that pairing was intended. But all CPUs, independent of core frequency, can clock the memory at 200 MHz, so I assume DDR400 was part of the design from the beginning. And it runs perfectly. The HT bus is also a multiple of 200 MHz.
Regards Frank
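Frank's 155 MHz figure can be reproduced with simple arithmetic, assuming (as I understand AMD's table) that the K8 memory controller divides the core clock by the smallest whole-number divisor that keeps the memory clock at or below the DIMM's rated clock. The sketch below also assumes nominal core clocks of 1400 MHz for the 240 and 1600 MHz for the 242; please check the results against Table 2 of 23932.pdf rather than treating this as the definitive rule.

```python
import math
from fractions import Fraction
from typing import Tuple

# Nominal DIMM clocks in MHz; DDR333 runs at 166.67 MHz, i.e. 500/3.
DIMM_CLOCK_MHZ = {
    "DDR333": Fraction(500, 3),
    "DDR400": Fraction(200),
}

# Assumed nominal core clocks in MHz: Opteron 240 = 1.4 GHz, 242 = 1.6 GHz.
CORE_CLOCK_MHZ = {"240": 1400, "242": 1600}


def memory_clock(core_mhz: int, dimm_mhz: Fraction) -> Tuple[int, Fraction]:
    """Return (divisor, memory clock), assuming an integer divisor rounded up
    so the memory never runs faster than the DIMM rating."""
    divisor = math.ceil(Fraction(core_mhz) / dimm_mhz)
    return divisor, Fraction(core_mhz) / divisor


if __name__ == "__main__":
    for cpu, core in CORE_CLOCK_MHZ.items():
        for dimm, rated in DIMM_CLOCK_MHZ.items():
            div, clk = memory_clock(core, rated)
            print(f"Opteron {cpu} ({core} MHz) with {dimm}: "
                  f"divisor {div}, memory clock {float(clk):.1f} MHz")
```

Under this assumption a 240 with DDR333 needs a divisor of 9 (1400 / 9 is about 155.6 MHz) and a 242 with DDR333 lands at 160 MHz, while both reach the full 200 MHz with DDR400 because their core clocks are whole multiples of 200 MHz. Exact fractions are used so that 500/3 MHz does not introduce rounding errors into the ceiling.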
Ah, OK. That explains it. After doing a little research on those PDF docs, I believe my CPU is revision B3. I'll have to take the heatsink off to make sure.
Does anyone happen to know if /proc/cpuinfo reveals anything about the CPU revision? It looks like it is either B3 or C0.
Mark
Mark Horton writes:
Yes, model and stepping encode the CPU revision, but I always forget which way ;-) Andreas
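On Linux, /proc/cpuinfo lists "cpu family", "model" and "stepping" for each processor, so the fields Andreas mentions can be read without removing the heatsink. The sketch below only extracts those fields; mapping a family/model/stepping triple to a revision letter such as B3 or C0 still requires AMD's revision guide, which is not reproduced here. The function names are hypothetical.

```python
from typing import Dict, List


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo text into one dict per processor block."""
    cpus: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def signature(cpu: Dict[str, str]) -> str:
    """Format the family/model/stepping fields that encode the revision."""
    return (f"family {cpu.get('cpu family', '?')}, "
            f"model {cpu.get('model', '?')}, "
            f"stepping {cpu.get('stepping', '?')}")


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is not mounted
    for cpu in parse_cpuinfo(text):
        print(f"processor {cpu.get('processor', '?')}: {signature(cpu)}")
```

Note that "model name" is a separate key from "model", so the numeric model field is not overwritten by the marketing name.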
participants (7)
- Andi Kleen
- Andreas Jaeger
- Eugene de Villiers
- Frank Pieczynski
- Mark Horton
- Milan Gabor
- Santiago Flores