[opensuse] Server Freezes Randomly

For some reason, my server running openSUSE 10.2 has been freezing randomly. It locks up completely, and the only way I can get it working again is a hard reset. I have no idea what is causing this, because it had been working perfectly before. One change I did notice recently, however, is that one of my cron jobs, which is supposed to connect to an FTP server and download all the files in a directory, stopped working. The only output of the cron job is this: "ftp: Name or service not known". Also, I don't know if this has anything to do with the problem, but I recently set up sendmail correctly so that my cron jobs could email me the results. I am thoroughly confused. -- Brandon Carl Spleeyah@spleeyah.com -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
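"Name or service not known" is the text the glibc resolver reports when `getaddrinfo` cannot resolve a host name or a service name, so name resolution on the server is one thing worth ruling out for the failing cron job. A minimal Python sketch of such a check follows. The host name `ftp.example.com` is a placeholder, not the server from the cron job, and the `resolver` parameter is a hypothetical hook that exists only so the function can be exercised without a network.

```python
import socket


def check_resolution(host, service="ftp", resolver=socket.getaddrinfo):
    """Try to resolve host and service the way a client would.

    Returns (True, sorted list of addresses) on success, or
    (False, error text) when the resolver raises socket.gaierror.
    """
    try:
        results = resolver(host, service, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the numeric address.
    addresses = sorted({info[4][0] for info in results})
    return True, addresses


# Example call (placeholder host name):
# print(check_resolution("ftp.example.com"))
```

If the check fails when run by hand as well as from cron, the resolver configuration (for example /etc/resolv.conf) is a reasonable next place to look; if it fails only from cron, the job's environment may differ from the login shell's.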

On Friday 17 August 2007 16:10, Brandon Carl wrote:
Don't be. When I've had problems like this, it almost always turned out to be hardware. I'd check memory first. Let memtest86 run for a while on it. If that turns out OK, my next suggestion would be the power supply. Unless you have a way to test it, putting another one in is about the easiest way to see if the lockups go away. It could also be a drive going bad. Just random thoughts. Mike -- Powered by SuSE 10.0 Kernel 2.6.13 X86_64 KDE 3.4 Kmail 1.8 4:42pm up 1 day 20:15, 4 users, load average: 2.12, 2.11, 2.14

Don't be. When I've had problems like this, it almost always turned out to be hardware. I'd check memory first. Let memtest86 run for a while
I'm not familiar with memtest86, how would I run it?
on it. If that turns out OK, my next suggestion would be the power supply. Unless you have a way to test it, putting another one in is
I am pretty sure it is not the power supply, because it has been running worry free for a while now.
about the easiest solution to see if the lockups go away. It could also be a drive going bad.
I have a software RAID 1 array if this is the case, which I sure hope it isn't. When I hard reset the machine after a lockup, it "replays transactions," which I think, from what I've heard, means that the hard drive was being written to, or something along those lines. It usually averages about 700 transactions "replayed" after a hard reset.
My system already boots with the noapic option normally. The only other thing I could think of was that it might be overheating, so I clocked the processor down a little bit. It is a Sempron 2800+ machine running at 2.0 GHz (stock 1.7 GHz). The airflow in the machine isn't the best, but it hasn't choked before. Could this be a valid reason for it to lock up? Also, are there any logs or anything I can look at to tell me the source of the problem? It locks up at random, with no particular pattern. The only thing I can think of that has been consistent is that the HDD light on the case is completely off whenever it freezes. Thanks for your help! -- Brandon

Just to add a little here. I had a system that was running with no problems for a long time. Then I started getting odd failures, like drives dropping off: strange behaviour I couldn't pinpoint. It turned out to be a power supply problem. I replaced the PSU and things are back to normal. While your problem may not be PSU related, don't discount a PSU issue simply because it has been working fine for a long time. PSUs do fail, and not always catastrophically. Sometimes they just begin to fade away, no longer delivering the full rated power.
Definitely. Heat is an issue. Something as simple as a collection of dust bunnies on your heat sinks and fan blades can dramatically reduce the cooling efficiency of your computer, especially if your cooling situation is already less than optimal. Make sure that it is all clean inside. Start up sensors and monitor things for a while. You can either run it from the command line or, if you have a GUI, use GKrellM or SuperKaramba to put a realtime monitor on the desktop. C
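As a rough sketch of watching temperatures from a script, not a replacement for `sensors`: on kernels that expose the hwmon sysfs interface, sensor drivers publish readings in files named `temp*_input`, in thousandths of a degree Celsius. The directory layout (files directly under `hwmon*` or under `hwmon*/device`) varies by kernel version and is an assumption here, as is the 70 °C warning threshold. A sensor driver for the board must already be loaded for any files to appear.

```python
import glob
import os


def read_temperatures(base="/sys/class/hwmon"):
    """Return {sysfs path: temperature in degrees Celsius} for every readable sensor."""
    readings = {}
    patterns = [
        os.path.join(base, "hwmon*", "temp*_input"),
        os.path.join(base, "hwmon*", "device", "temp*_input"),  # older layout (assumption)
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as fh:
                    # Values are reported in millidegrees Celsius.
                    readings[path] = int(fh.read().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # unreadable or malformed sensor file
    return readings


def too_hot(readings, limit_c=70.0):
    """Return the readings at or above limit_c (a hypothetical threshold)."""
    return {path: temp for path, temp in readings.items() if temp >= limit_c}


# Example: print(too_hot(read_temperatures()))
```

Run from cron every few minutes and appended to a file, output like this would leave a record of the last temperatures seen before a freeze.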

[...]
Is there any way I would be able to tell that this is the problem without having to just test out a new PSU to see if it works?
I'll clean it out and see if it helps.
I have System Monitor and xosview running, but they don't really tell me much. Is there a monitor for temperature? -- Brandon

Brandon Carl wrote:
Had a machine with similar problems. It turned out in the end to be due in part to a hard drive that was experiencing some data failure, and possibly (the jury's out on this) a failed tape unit, as it is possible the data corruption issues on the drive were causing problems with the tape unit. BTW, I still have the drive in the system, and it is still mounted as a data drive for data recovery purposes, but now that it is no longer under load I have experienced no problems. I would load S.M.A.R.T. and use it to monitor the status of the drives. A problem may not show up immediately, but you have a good chance of getting something meaningful reported eventually (it took two months in my case). (There are other tools which monitor the status of processors etc., but what is available varies according to chipset.) I would hope that your machine is suitably protected against power oddities. Faulty PSUs are rather difficult to diagnose, especially if the problem is intermittent. Heat-related problems tend to go away only when the device has cooled down, and typically it may be some time before the device restarts sanely. (You can also get a lot of very strange behaviour just before a heat-related shutdown.) -- I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone. Bjarne Stroustrup
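For the S.M.A.R.T. suggestion, the smartmontools package provides `smartctl`: `smartctl -H` prints an overall health verdict and `-A` prints the attribute table. The sketch below assumes smartmontools is installed, that the script runs as root, and that the drive is `/dev/sda` (a placeholder). It pulls out the verdict and a few attributes commonly watched on a failing disk; the choice of attributes is illustrative and not taken from the thread.

```python
import subprocess

# Attribute names to report (an illustrative selection).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def run_smartctl(device="/dev/sda"):
    """Return smartctl's text output for the device.

    smartctl's exit status is a bit mask of conditions, so a non-zero
    status is not treated as a failure to run.
    """
    result = subprocess.run(
        ["smartctl", "-H", "-A", device], capture_output=True, text=True
    )
    return result.stdout


def parse_health(output):
    """Return the overall health verdict (e.g. 'PASSED'), or None if absent."""
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.split(":", 1)[1].strip()
    return None


def parse_attributes(output, names=WATCHED):
    """Return {attribute name: raw value} for the watched attributes."""
    found = {}
    for line in output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            found[fields[1]] = fields[9]
    return found


# Example: out = run_smartctl(); print(parse_health(out), parse_attributes(out))
```

Raw values that grow over time, rather than any single reading, are usually the signal to look for, which fits the point above that a problem may take weeks to show up.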

Brandon Carl wrote:
Download the Ultimate Boot CD, burn it, boot from it, and select Memtest. You can find it here: http://www.ultimatebootcd.com
Doesn't matter. Everything dies eventually:-) Start by cleaning the case, though. Get one of those cans of compressed air and clean it out nice. It really made a difference for me.
Absolutely. Try turning off the overclocking to see what happens.
Also, are there any logs or anything I can look at to tell me the source of the problem?
Probably not. If it is a hardware problem (heat, power supply, air flow), there won't be a nice log entry to help. That is one reason why it sure sounds to me like a hardware problem. -- Jonathan Arnold (mailto:jdarnold@buddydog.org) LinuxBrainDump, Linux HowTo's and Tutorials: http://www.linuxbrainddump.org Daemon Dancing in the Dark, an Open OS weblog: http://freebsd.amazingdev.com/blog/

On Saturday 18 August 2007 05:30, Jonathan Arnold wrote:
It's been included in all the SuSE Linux and openSUSE boot / install discs for at least three releases. UBCD is good, but unnecessary for this purpose.
Randall Schulz

Luckily I already have a copy. I'll run the test if I can schedule some downtime for the server. [...]
Absolutely. Try turning off the overclocking to see what happens.
Surprisingly, I clocked it down from 2.2 GHz to 2.0 GHz and it has been running fine for the past 12 hours or so. -- Brandon


On Saturday 18 August 2007 13:13, Brandon Carl wrote:
If this is a 'server' and it is important, I would think that overclocking would cause instability, since you are pushing the limits of the hardware, and would most likely cause it to freeze or crash. Kind of counterproductive, IMHO. Mike

Sat, 18 Aug 2007, by jdarnold@buddydog.org:
UBCD has many useful programs, so it's definitely worth having, but if the OP just wants to use memtest86 then he can get it directly from http://www.memtest86.com/memtest86-3.2.tar.gz Unpack this file and follow the directions in the README file to make a bootable memtest86 floppy (make; make install). Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 10.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.20 + See headers for PGP/GPG info. Claimer: any email I receive will become my property. Disclaimers do not apply.

On Saturday 18 August 2007 13:51, Theo v. Werkhoven wrote:
I'm tellin' you, it's on the SuSE installation disc.
RRS

Randall R Schulz wrote:
And it can be installed using YaST and then added as an "image" to GRUB. -- Use OpenOffice.org <http://www.openoffice.org>

On Saturday 18 August 2007 15:09, James Knott wrote:
That, I didn't know. I've always just booted the latest installer I have when I want to run a memory test. RRS

Randall R Schulz wrote:
I add it to the GRUB menu on all my SUSE systems. -- Use OpenOffice.org <http://www.openoffice.org>

participants (10)
- Brandon Carl
- Clayton
- David C. Rankin
- G T Smith
- James Knott
- Jonathan Arnold
- ka1ifq
- Mike
- Randall R Schulz
- Theo v. Werkhoven