System crash on ASUS K8N-DL w/ SuSE 10.0 AMD64
Hello,

Is anyone successfully running SuSE 10.0 on the ASUS K8N-DL motherboard? I have read input from other sites and have followed suggestions on upgrading the BIOS and disabling the NV RAID controller.

Here are some details of my system:

motherboard - ASUS K8N-DL (BIOS level 1007)
cpus - 2x AMD 246
memory - Corsair 2x 512 MB (1 GB) CAS 2.5 RAM
drives - 3x Western Digital SATA drives
       - 1x Maxtor PATA
       - 1x ASUS 1608P2 DVD +/- RW / CD -/+ RW (Firmware level )

The system crashes without any errors in the logs. The system just freezes so I have to power reset the system.

Any suggestions or ideas are welcome.

Thanks
Stuart
On Thursday 19 January 2006 22:28, Stuart wrote:
motherboard - ASUS K8N-DL (BIOS level 1007)
cpus - 2x AMD 246
memory - Corsair 2x 512 MB (1 GB) CAS 2.5 RAM
drives - 3x Western Digital SATA drives
       - 1x Maxtor PATA
       - 1x ASUS 1608P2 DVD +/- RW / CD -/+ RW (Firmware level )
The system crashes without any errors in the logs. The system just freezes so I have to power reset the system. Any suggestions or ideas are welcome.
Hi Stuart,

These are difficult symptoms, but I would definitely lean towards hardware. Some thoughts:

Have you completely exercised the memory for at least a day using memtest86? And, are you *certain* it is the correct rating for that board? Can you rule out any questions in this regard by swapping with known good modules or falling back to one at a time and then the other?

Does it freeze on its own, seemingly randomly (no one at keyboard) or does any user activity (or even a specific activity) trigger it? If the freezes are seemingly random, is there at least some kind of discernible common timeframe?... i.e. minimum uptime of 20 minutes; after 3 hours, etc. ... or can the machine run for days before suddenly freezing?

This is a fair amount of hardware for a single power supply so you want to ensure it is sized correctly... you don't want a convergence of simultaneous subsystem activities to momentarily outdraw its capacity. This is especially true if you've got a high end graphics card with lots of memory to power. (You didn't state your graphics setup.)

Have you tried any of the special boot parameters, i.e. failsafe, disabling acpi, etc.?

regards,

- Carl
Folks, thanks for the replies, To answer a few questions:
Have you completely exercised the memory for at least a day using memtest86? And, are you *certain* it is the correct rating for that board? Can you rule out any questions in this regard by swapping with known good modules or falling back to one at a time and then the other?
I have not run the memtest program but I installed DDR400 ECC Registered memory as specified in the manual, but that is something I can check once I get some additional memory.
Does it freeze on its own, seemingly randomly (no one at keyboard) or does any user activity (or even a specific activity) trigger it? If the freezes are seemingly random, is there at least some kind of discernible common timeframe?... i.e. minimum uptime of 20 minutes; after 3 hours, etc. ... or can the machine run for days before suddenly freezing?
It seems to freeze on its own as well as when I am using it. I have cron jobs that run to download info. It usually does not freeze when they run, but I have known it to do so at least once. I initially thought it was the mouse, so I switched mice - no change. Sometimes I can use the computer for days without hanging. The only significant telling activity is disk activity. Recently I have tried to copy large amounts of data from the SATA array to the PATA drive. After a few seconds or sometimes minutes, the system hangs.
What power supply do you have and how big?
This is a fair amount of hardware for a single power supply so you want to ensure it is sized correctly... you don't want a convergence of simultaneous subsystem activities to momentarily outdraw its capacity. This is especially true if you've got a high end graphics card with lots of memory to power. (You didn't state your graphics setup.)
I am using a 460W power supply, but I will consider an increase. I have seen failing power supplies cause strange/random activity in a computer before - so I may in fact work on this issue first.
Have you tried any of the special boot parameters, i.e. failsafe, disabling acpi, etc.?
I have disabled ACPI on the board and use the noacpi parameter on boot, but the system still tries to start ACPI anyway. I have also disabled the NV RAID, but Linux detects it and loads the sata_nv driver too. I have rmmod'ed it but it does not seem to make a difference since there are no drives attached to the controller.
Does it get hot !!?? How about airflow !!
The CPUs run at around 114F and 120F respectively (not sure why there is such a big difference) and system temp is only 113F according to the BIOS.

None of the components seem unusually hot. The drives are actually cool to the touch so I am not inclined to think there is a problem with heat.

Thanks for the feedback. I will try replacing the power supply first and report back when that is done.

Stuart

On Friday 20 January 2006 03:34, Carl Hartung wrote:
<snip>
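The BIOS readings quoted above are in Fahrenheit, while lm_sensors and most processor data sheets use Celsius. As a small worked conversion (the three values are the ones reported in the message; the labels are only for illustration):

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Readings quoted from the BIOS in the message above; the names are illustrative.
readings_f = {"CPU 1": 114, "CPU 2": 120, "system": 113}

for name, deg_f in readings_f.items():
    print(f"{name}: {deg_f} F = {fahrenheit_to_celsius(deg_f):.1f} C")
```

That puts the two CPUs at roughly 45.6 C and 48.9 C and the system sensor at 45.0 C, which makes the figures easier to compare with Celsius-based readings from other tools.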
Hi again, Stuart. On Friday 20 January 2006 18:13, Stuart wrote:
I have not run the memtest program but I installed DDR400 ECC Registered memory as specified in the manual, but that is something I can check once I get some additional memory. <snip>
If I were in your situation, I'd swap out the memory, first, with known good modules... name brand chips, reputable manufacturer (Corsair is usually excellent), lifetime and/or 24 hour advance replacement warranty, etc. You want pieces that have been tested and precisely matched to each other at the factory. In fact, contact Corsair, explain the problem, tell them you believe the symptoms point to the memory and they may be willing to advance replace the set you have with a pair hand tested in advance for your specific application.

I'd have expected the system to fall over occasionally during cold boots, too, if it were the power supply.

good luck & regards,

- Carl
Carl,

I am using Corsair modules now. I would not suspect them to be bad at this point.

On Friday 20 January 2006 19:10, Carl Hartung wrote:
Hi again, Stuart.
If I were in your situation, I'd swap out the memory, first, with known good modules... name brand chips, reputable manufacturer (Corsair is usually excellent), lifetime and/or 24 hour advance replacement warranty, etc. You want pieces that have been tested and precisely matched to each other at the factory.
I'd have expected the system to fall over occasionally during cold boots, too, if it were the power supply.
That would make sense too, but I'm stumped at this point. I will try the memory modules in different slots. I am seeing something in the manual about placing the modules in different slots when using only 2 on this board, but I would not have expected them to work in a different configuration. I will try that first.

Thanks
Stuart
On Friday 20 January 2006 23:36, Stuart wrote:
I am using Corsair modules now. I would not suspect them to be bad at this point.
Why not? In case you've missed my past posts on SLE discussing memory, I was in that business in Silicon Valley for a few years (in manufacturing, before it went offshore.) From the symptoms you've described, those modules would be my /first/ guess.

Testing each individual module at the factory, beyond pass/fail for basic PCB trace continuity, isn't cost-effective. Instead, they pull random samples from each batch to test a bit more stringently with a memory tester. But, still, what they're trying to detect are problematic /batches/ -- not individual "duds."

What this means for you, as a consumer, is you will sometimes experience a DOA or marginal module straight from the factory that is brand spanking new. This is just a fact of life. The only way to avoid this is to pay the (usually very steep) premium for top-of-the-line pre-tested, pre-matched pairs of name brand modules that come with 24 hour advance replacement warranties. If you factor in the value of your time spent running around in circles chasing down an intermittent freeze in a high end system, you'll see it is actually worth the extra money. Note this isn't something you'd do for a midrange or slightly older system, but it makes sense when the system you're building is right at the cutting edge, performance wise.

The key symptoms I'm using to arrive at my diagnosis are:

1. The system does not stumble or fall completely over on cold boot. That is the most vulnerable time for a system with a failing, undersized or marginal power supply.

2. The system freezes are truly random and are leaving no clues in the logs. This is certainly a suddenly fatal, hardware based error. These are the hallmarks of a memory problem.

3. The system can be encouraged to freeze by loading it down with an I/O intensive task, like a (buffered) disk to disk transfer... probably one of the best in-situ memory stress tests available, next to something graphics intensive.

BTW, did you take basic ESD precautions when installing the modules?
- Carl
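Carl's third point, a buffered disk-to-disk transfer used as an in-situ stress test, can be made repeatable. The following is only a minimal sketch, not something anyone in the thread ran: it writes random data, copies it, and compares SHA-256 digests. The function names, file name, sizes and round count are all assumptions chosen for illustration. It can reveal silent corruption, but a hard freeze will simply stop it.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir, size_bytes=8 << 20, rounds=3):
    """Write random data under src_dir, copy it to dst_dir, compare digests.

    Returns the number of rounds in which source and copy differed.
    """
    src = os.path.join(src_dir, "stress.bin")  # hypothetical file name
    dst = os.path.join(dst_dir, "stress.bin")
    mismatches = 0
    for _ in range(rounds):
        with open(src, "wb") as handle:
            handle.write(os.urandom(size_bytes))
        shutil.copyfile(src, dst)
        if sha256_of(src) != sha256_of(dst):
            mismatches += 1
    os.remove(src)
    os.remove(dst)
    return mismatches


if __name__ == "__main__":
    # Temporary directories stand in for mount points on the two disks.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        print("mismatching rounds:", copy_and_verify(a, b))
```

For a real run, src_dir and dst_dir would point at directories on the SATA and PATA disks, with a much larger size_bytes. Because the read-back is likely served from the page cache, a mismatch here points more towards memory than towards the disks, which fits the buffered-transfer reasoning above.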
Carl, OK, you may have me on this point. At the time I built this machine I was in a new home without the proper setup, but I took what I thought were necessary precautions. I put the system together on a glass surface with the board placed on the anti-static pad that comes with it and tried to discharge any static from myself by touching the metal legs of the table before handling the parts. At the time I thought this would have been enough, but given these components you may be right. I also ran memtest since your suggestion, and it did not show any errors. The only issue I had with the test is that it seemed to not test ECC, even after I selected the options to do so. Any suggestions? Thanks again Stuart
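On the question of whether ECC is doing anything: one possible cross-check from within Linux is the EDAC subsystem, which exposes corrected and uncorrected error counters in sysfs. This is a sketch under assumptions. It assumes a kernel with an EDAC driver loaded for this chipset, which may not be the case for the kernel shipped with SuSE 10.0, and the function name is made up for illustration.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            with open(os.path.join(mc_dir, name)) as handle:
                values.append(int(handle.read().strip()))
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        # An empty result means no EDAC driver is exposing counters here.
        print("no EDAC memory controllers found")
    for controller, (corrected, uncorrected) in counts.items():
        print(f"{controller}: corrected={corrected} uncorrected={uncorrected}")
```

A corrected-error count that climbs during the disk-to-disk copies would support the memory theory. An empty result says nothing either way.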
Folks,

The freeze consistently occurs when transferring large amounts of data from my SATA drives formatted as LVM to the PATA drive. I just switched the memory modules to the slots recommended in the manual but the system still freezes!

Stuart
----- Original Message ---- From: Stuart
To: suse-linux-e@suse.com Sent: Thursday, January 19, 2006 10:28:52 PM Subject: [SLE] System crash on ASUS K8N-DL w/ SuSE 10.0 AMD64 Hello,
Is anyone successfully running SuSE 10.0 on ASUS k8n-dl motherboard?
I am :)
I have read input from other sites and have followed suggestions on upgrading the bios and disabling the nv raid controller.
After BIOS level 1004 the NV RAID controller works fine under Linux (I would say even better than Silicon Image). I do not use it though - I configured the system way before they had fixed bugs in their BIOS.
Here are some details of my system:
motherboard - ASUS K8N-DL (BIOS level 1007)
cpus - 2x AMD 246
memory - Corsair 2x 512 MB (1 GB) CAS 2.5 RAM
drives - 3x Western Digital SATA drives
       - 1x Maxtor PATA
       - 1x ASUS 1608P2 DVD +/- RW / CD -/+ RW (Firmware level )
Mine is somewhat similar. You did not mention the video card. It may make a difference. Do you use any UPS monitoring daemons? (apcupsd for example) Are you running a 32 bit version or 64 bit?
The system crashes without any errors in the logs. The system just freezes so I have to power reset the system.
So does it crash or freeze?
Any suggestions or ideas are welcome.
My guess would be it's either a power supply (depending on your video card you would probably need at least 450W stable) or just overheating. Did you try lm_sensors?
Thanks
Stuart
Dmitry -- Check the headers for your unsubscription address For additional commands send e-mail to suse-linux-e-help@suse.com Also check the archives at http://lists.suse.com Please read the FAQs: suse-linux-e-faq@suse.com
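As a companion to the lm_sensors suggestion, temperatures can also be read straight from the kernel's hwmon sysfs files, where temp*_input values are in millidegrees Celsius. This is a rough sketch that assumes the sensor driver for this board is loaded and that the files sit directly under each hwmon directory. The layout differs between kernel versions, so the default path is an assumption, and the function name is illustrative.

```python
import glob
import os


def read_hwmon_temps(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_celsius} for every temp*_input file."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            # The kernel reports millidegrees Celsius.
            temps[path] = int(handle.read().strip()) / 1000.0
    return temps


if __name__ == "__main__":
    for path, deg_c in read_hwmon_temps().items():
        print(f"{path}: {deg_c:.1f} C")
```

Logging these values periodically while a large copy is running would show whether temperatures climb just before a freeze, which the BIOS screen cannot show.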
Dimych,
After BIOS level 1004 the NV RAID controller works fine under Linux (I would say even better than Silicon Image). I do not use it though - I configured the system way before they had fixed bugs in their BIOS.
This is interesting, I upgraded to BIOS level 1007, but still got erratic NV raid behavior. Stuart
Dimych,

Do you, by any chance, have sound working for this board? If so, can you tell me the secret? SUSE 10.0 x86_64 loads the snd_intel8x0 module, but I have not gotten any sound from it. I also tried to compile and load the nvsound module but I did not get that to work either.

Thanks,

On Friday 20 January 2006 08:46, Dimych wrote:
<snip>
If you can, try turning off the acpid daemon. The only way I have been successful is to reinstall the OS and, when it boots up after the installation, go directly into YaST and disable acpid. You can also try booting with noapcid.

Stuart wrote:
<snip>
-- Joseph Loo jloo@acm.org
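Disabling the acpid daemon and disabling ACPI in the kernel are separate things: the first is a userspace service, the second is controlled by boot parameters. To see which ACPI or APIC related options a boot line actually contains, the line can be split and filtered. This is a sketch: the option spellings it looks for (acpi=..., pci=noacpi, noapic, nolapic) are the ones I know from the kernel's parameter documentation, the sample boot line is hypothetical, and the function name is made up.

```python
def acpi_related_options(cmdline):
    """Return the ACPI/APIC related tokens found on a kernel command line."""
    exact = {"noapic", "nolapic"}
    prefixes = ("acpi=", "pci=noacpi")
    return [
        token
        for token in cmdline.split()
        if token in exact or token.startswith(prefixes)
    ]


# Hypothetical boot line for illustration; on a running system the real one
# can be read from /proc/cmdline.
sample = "root=/dev/sda2 acpi=off noapic splash=silent"
print(acpi_related_options(sample))
```

A bare noacpi, or the noapcid spelling mentioned above, would not be matched by this filter. If the real boot line contains only such a token, that may explain why ACPI still starts.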
Joseph, Thanks, I didn't reinstall, but disabled acpid in the runlevel options. Stuart On Friday 20 January 2006 21:20, Joseph Loo wrote:
<snip>
participants (4): Carl Hartung, Dimych, Joseph Loo, Stuart