Please help: SuSE 9.1 lockups during high CPU/disk activity
Hello all,
Lately I've experienced frequent lockups of my SuSE 9.1 installation during high CPU/disk activity (ripping a DVD to AVI or compiling a large program will do the trick). The system freezes, output to the monitor is disabled, and the HD and CD-ROM LEDs stay lit constantly. I've already replaced the power supply (I'm now using a 480W Tagan) and added an extra fan, but that didn't help much. I've also already run a memory check: all clean.
The CPU temperature immediately after rebooting from a lockup is 52C, and the system temperature is measured at around 34C. Could an overheating HD or CD-ROM drive cause such lockups? Is there another explanation, or a way for me to locate the problem?
Regards,
Pieter Hulshoff
Pieter Hulshoff wrote:
Hello all,
Lately I've experienced frequent lockups of my SuSE 9.1 installation during high CPU/disk activity (ripping a DVD to AVI or compiling a large program will do the trick). The system freezes, output to the monitor is disabled, and the HD and CD-ROM LEDs stay lit constantly. I've already replaced the power supply (I'm now using a 480W Tagan) and added an extra fan, but that didn't help much. I've also already run a memory check: all clean.
The CPU temperature immediately after rebooting from a lockup is 52C, and the system temperature is measured at around 34C. Could an overheating HD or CD-ROM drive cause such lockups? Is there another explanation, or a way for me to locate the problem?
Regards,
Pieter Hulshoff
Well, I guess you have a cooling problem. I would say your CPU fan is most likely not working properly anymore. Or perhaps another fan is damaged, so that the heat can "pile up" inside your box? Martin
On Friday 22 April 2005 17:44, Martin Deppe wrote:
Pieter Hulshoff wrote:
Hello all,
Lately I've experienced frequent lockups of my SuSE 9.1 installation during high CPU/disk activity (ripping a DVD to AVI or compiling a large program will do the trick). The system freezes, output to the monitor is disabled, and the HD and CD-ROM LEDs stay lit constantly. I've already replaced the power supply (I'm now using a 480W Tagan) and added an extra fan, but that didn't help much. I've also already run a memory check: all clean.
The CPU temperature immediately after rebooting from a lockup is 52C, and the system temperature is measured at around 34C. Could an overheating HD or CD-ROM drive cause such lockups? Is there another explanation, or a way for me to locate the problem?
Regards,
Pieter Hulshoff
Well, I guess you have a cooling problem. I would say your CPU fan is most likely not working properly anymore. Or perhaps another fan is damaged, so that the heat can "pile up" inside your box?
I checked my CPU fan; it's clean and appears to be working. All the other fans appear to be working fine as well. Also, the CPU temperature immediately after rebooting from a lock-up is about 52C. The HDs could be suffering from a heat problem, since I have 5 of them in a medium chassis (the cables may be blocking the airflow), but I don't know why that would cause the motherboard to turn itself off (the motherboard's status LED goes off during the lock-up). Any info you could give me on this? Regards, Pieter
Pieter Hulshoff wrote:
Lately I've experienced frequent lockups of my SuSE 9.1 installation during high CPU/disk activity (ripping a DVD to AVI or compiling a large program will do the trick). The system freezes, output to the monitor is disabled, and the HD and CD-ROM LEDs stay lit constantly. I've already replaced the power supply (I'm now using a 480W Tagan) and added an extra fan, but that didn't help much. I've also already run a memory check: all clean.
I would take a good look at the IDE driver and the mainboard. If it's a VIA KT133 (or something like that), I remember it has a bug that causes a crash when the system is under high load and the IDE drives are used heavily. If possible, change the mainboard or use a separate IDE controller. Sandy
On Friday 22 April 2005 20:00, Sandy Drobic wrote:
Pieter Hulshoff wrote:
Lately I've experienced frequent lockups of my SuSE 9.1 installation during high CPU/disk activity (ripping a DVD to AVI or compiling a large program will do the trick). The system freezes, output to the monitor is disabled, and the HD and CD-ROM LEDs stay lit constantly. I've already replaced the power supply (I'm now using a 480W Tagan) and added an extra fan, but that didn't help much. I've also already run a memory check: all clean.
I would take a good look at the IDE driver and the mainboard. If it's a VIA KT133 (or something like that), I remember it has a bug that causes a crash when the system is under high load and the IDE drives are used heavily. If possible, change the mainboard or use a separate IDE controller.
It's an EPoX 9K9A mainboard with a KT400 chipset, and it hasn't given me any trouble until recently. Do you have any link to the problem you described, though? Any chance it could have been triggered by a SuSE update? Regards, Pieter
On Friday 22 April 2005 18:31, Pieter Hulshoff wrote:
I checked my CPU fan; it's clean and appears to be working. All the other fans appear to be working fine as well. Also, the CPU temperature immediately after rebooting from a lock-up is about 52C. The HDs could be suffering from a heat problem, since I have 5 of them in a medium chassis (the cables may be blocking the airflow), but I don't know why that would cause the motherboard to turn itself off (the motherboard's status LED goes off during the lock-up). Any info you could give me on this?
Might the values in /etc/sensors.conf be of any use? If so, what should I be looking for? Regards, Pieter
Pieter Hulshoff wrote:
On Friday 22 April 2005 17:44, Martin Deppe wrote:
Well, I guess you have a cooling problem. I would say your CPU fan is most likely not working properly anymore. Or perhaps another fan is damaged, so that the heat can "pile up" inside your box?
I checked my CPU fan; it's clean and appears to be working. All the other fans appear to be working fine as well. Also, the CPU temperature immediately after rebooting from a lock-up is about 52C. The HDs could be suffering from a heat problem, since I have 5 of them in a medium chassis (the cables may be blocking the airflow), but I don't know why that would cause the motherboard to turn itself off (the motherboard's status LED goes off during the lock-up). Any info you could give me on this?
Regards,
Pieter
I'm afraid it might be your power supply. When did you add the last drive(s)? Did these incidents start afterwards? I had that problem when I put a third and a fourth disk into my system. It caused the strangest things, and I wouldn't be surprised if it were the same in your case! Good luck, Martin
On Friday 22 April 2005 23:54, Martin Deppe wrote:
I'm afraid it might be your power supply. When did you add the last drive(s)? Did these incidents start afterwards?
I'm not so sure about that. I added drives in December, and the problems started in March. Last night I replaced my 350W PSU with a new Tagan 480W PSU, replaced my Northbridge fan, and added another chassis fan. It's just a mess around the drive area, since I have 5 HDs, a CD burner, and a DVD player in this medium chassis. Regards, Pieter
Pieter Hulshoff wrote:
On Friday 22 April 2005 23:54, Martin Deppe wrote:
I'm afraid it might be your power supply. When did you add the last drive(s)? Did these incidents start afterwards?
I'm not so sure about that. I added drives in December, and the problems started in March. Last night I replaced my 350W PSU with a new Tagan 480W PSU, replaced my Northbridge fan, and added another chassis fan. It's just a mess around the drive area, since I have 5 HDs, a CD burner, and a DVD player in this medium chassis.
Regards,
Pieter
Maybe it is a combination of heat and low voltage: the lockup only happens once the temperature rises above a certain point, which pulls the voltage down further until it finally causes the lock. I don't actually know whether you can use two or more PSUs in the same box (I still have to check that for myself), but if it is possible, it might be part of the solution, alongside getting the air to circulate better inside your box for better cooling. If you try putting an additional PSU inside your box and it works out, please let me know, because I would definitely try that too (much sooner than without your experience), since I have another two disk drives that I would like to use in my machine. Well, good luck anyway. Martin
On Saturday 23 April 2005 00:16, Martin Deppe wrote:
Maybe it is a combination of heat and low voltage: the lockup only happens once the temperature rises above a certain point, which pulls the voltage down further until it finally causes the lock.
I think it's mostly heat. I currently have the door open, it's freezing in here, and my computer is rock-stable ripping a DVD to AVI. The CPU temperature is at 48C right now. If a Tagan 480W PSU (one with very good reviews, I might add) can't handle a 2-year-old system with an Athlon XP 2000+ and a Radeon 9000 AIW card, then I think there's a serious problem. :)
I don't actually know whether you can use two or more PSUs in the same box (I still have to check that for myself), but if it is possible, it might be part of the solution, alongside getting the air to circulate better inside your box for better cooling.
Using two or more PSUs is not a good idea. The voltages of the two PSUs will never be exactly the same, so you get internal power leakage (I'm afraid I don't know the exact English term for it). I wouldn't recommend it. As I said, though: a good 480W PSU should be more than enough for a system like mine. Regards, Pieter
Pieter Hulshoff wrote:
On Saturday 23 April 2005 00:16, Martin Deppe wrote:
Maybe it is a combination of heat and low voltage: the lockup only happens once the temperature rises above a certain point, which pulls the voltage down further until it finally causes the lock.
I think it's mostly heat. I currently have the door open, it's freezing in here, and my computer is rock-stable ripping a DVD to AVI. The CPU temperature is at 48C right now. If a Tagan 480W PSU (one with very good reviews, I might add) can't handle a 2-year-old system with an Athlon XP 2000+ and a Radeon 9000 AIW card, then I think there's a serious problem. :)
I don't actually know whether you can use two or more PSUs in the same box (I still have to check that for myself), but if it is possible, it might be part of the solution, alongside getting the air to circulate better inside your box for better cooling.
Using two or more PSUs is not a good idea. The voltages of the two PSUs will never be exactly the same, so you get internal power leakage (I'm afraid I don't know the exact English term for it). I wouldn't recommend it. As I said, though: a good 480W PSU should be more than enough for a system like mine.
Regards,
Pieter
I meant using another PSU just for the additional drives. Since they are connected only through the data cables, it doesn't really matter that the voltages aren't exactly the same (I think). I think I will try it (or at least research it) myself some time. I have a system with two CPUs that are always 76°C to 77°C "warm", if I can trust gkrellm and the BIOS. So 48°C shouldn't be a problem on your side (I guess). However, I need to sleep now! Good luck, Martin
I meant using another PSU just for the additional drives. Since they are connected only through the data cables, it doesn't really matter that the voltages aren't exactly the same (I think). I think I will try it (or at least research it) myself some time.
I'd think you may still get current flowing through the IDE cable if the 5V on the drive and the 5V on the cable are not the same...
I have a system with two CPUs that are always 76°C to 77°C "warm", if I can trust gkrellm and the BIOS. So 48°C shouldn't be a problem on your side (I guess).
Yeah, that's what's really bothering me. Even my BIOS tells me the temperature isn't above 55C when I reboot right after a lockup, so I think I can trust gkrellm. It's possible that the rising temperatures outside have pushed things over a limit I didn't hit during the winter, but even so, 55C is ridiculously low. Perhaps it really is the temperature of the drives, which unfortunately I cannot measure. Regards, Pieter
Pieter: On Fri 22 Apr 2005 10:13, Pieter Hulshoff wrote:
Lately I've experienced frequent lockups of my SuSE 9.1 installation during high CPU/disk activity (ripping a DVD to AVI or compiling a large program will do the trick). The system freezes, output to the monitor is disabled, and the HD and CD-ROM LEDs stay lit constantly. I've already replaced the power supply (I'm now using a 480W Tagan) and added an extra fan, but that didn't help much. I've also already run a memory check: all clean.
I have been experiencing exactly the same behaviour on a SuSE 8.2 box, after applying the latest kernel update (2.4.20-131, installed 2005-03-26). Have you by any chance updated your kernel lately? From a post on this list I learned to turn off memory overcommitting: 'echo "2" > /proc/sys/vm/overcommit_memory' (for a running system), and put 'vm.overcommit_memory = 2' in /etc/sysctl.conf to apply the same configuration at the next reboot. After near-daily system hangs I applied this modification last week, and my current uptime is 8 days and counting. It seems this did the trick, so you might give it a try. Regards, -- Andreas Philipp Noema Ltda. Bogotá, D.C. - Colombia http://www.noemasol.com
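Andreas's workaround can also be checked from a script. The Python sketch below is an illustration, not something from the thread: it reads the current value from /proc/sys/vm/overcommit_memory and builds the matching /etc/sysctl.conf line. The mode descriptions follow the Linux kernel's overcommit-accounting documentation; whether a particular 2.4 vendor kernel (such as SuSE's 2.4.20) honours mode 2 in the same way is an assumption worth verifying.

```python
from pathlib import Path

# Modes of vm.overcommit_memory, per the Linux kernel's
# overcommit-accounting documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit, no checks",
    2: "strict accounting, do not overcommit",
}


def read_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Return the running kernel's current overcommit mode as an int."""
    return int(Path(path).read_text().strip())


def sysctl_line(mode):
    """Return the /etc/sysctl.conf line that sets `mode` at boot."""
    if mode not in OVERCOMMIT_MODES:
        raise ValueError(f"unknown overcommit mode: {mode}")
    return f"vm.overcommit_memory = {mode}"
```

For example, `OVERCOMMIT_MODES[read_overcommit()]` describes the current setting, and `sysctl_line(2)` returns the line Andreas adds to /etc/sysctl.conf. Changing the value on a running system (the echo command above) still has to be done as root; the sketch only reads it.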
On Friday 22 April 2005 20:29, Andreas Philipp wrote:
From a post on this list I learned to turn off memory overcommitting: 'echo "2" > /proc/sys/vm/overcommit_memory' (for a running system), and put 'vm.overcommit_memory = 2' in /etc/sysctl.conf to apply the same configuration at the next reboot. After near-daily system hangs I applied this modification last week, and my current uptime is 8 days and counting. It seems this did the trick, so you might give it a try.
Thanx for the tip, but I'm afraid I already tried that. :( I saw this particular post earlier today, and already gave it a try. I'm really suspecting a temperature issue. Earlier, after a crash, I immediately rebooted, and it crashed on me during startup. Is there any way to monitor the temperature in my system while it's running? I'm sure there are sensors somewhere... Regards, Pieter
I'm really suspecting a temperature issue. Earlier, after a crash, I immediately rebooted, and it crashed on me during startup. Is there any way to monitor the temperature in my system while it's running? I'm sure there are sensors somewhere...
Regards,
Pieter
Load and run sensors-detect, follow its instructions, and then load and run gkrellm. Also get smartmontools loaded to see if your hard drives are reporting anything. Make sure SMART monitoring is enabled in the BIOS if necessary. These will tell you a lot IF your mainboard and hard drives support these reporting tools. Can you rearrange your hard drives to put a little more space between them? Can you do anything with the case to allow more airflow around the drives? Stan
On Friday 22 April 2005 22:27, Stan Glasoe wrote:
I'm really suspecting a temperature issue. Earlier, after a crash, I immediately rebooted, and it crashed on me during startup. Is there any way to monitor the temperature in my system while it's running? I'm sure there are sensors somewhere...
Load and run sensors-detect, follow its instructions and then load and run gkrellm. Also get the smartmontools loaded to see if your hard drives are reporting anything. Make sure the BIOS has SMART monitoring enabled if necessary.
Cool! Even if it doesn't give me the answers I'm looking for, this kind of tool is nice to have in any case. I managed to get it working without too much hassle. :) OK, back to my problem. It's night here now, and with the door open my system's pretty stable. If I close everything up, the temperature rises, and at about 55C for the CPU and 34C for the system it hangs. I'm not sure those temperatures have anything to do with the reason it hangs, but those were the last few readings I got. I've checked with smartctl a couple of times, but there don't appear to be any problems.
Can you rearrange your hard drives to put a little more space between them? Can you do anything with the case to allow more airflow around the drives?
I'm afraid not. I'm currently using a medium chassis with 5 HDs, a CD burner, and a DVD player. The Tagan 480W power supply also takes up a lot of space, especially with all the cables it comes with. I'm seriously considering buying a Lian Li v2100b case to take care of these problems once and for all. Regards, Pieter Hulshoff
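Since a hard lockup leaves only the last readings someone happened to be looking at, it can help to log them continuously and force each line to disk. The Python sketch below is one way to do that. It assumes the lm_sensors `sensors` command is installed and prints temperature lines in the `label: +NN.N°C` form shown in Kastus's output later in the thread; the log path and interval are hypothetical choices.

```python
import os
import re
import subprocess
import time

# Matches temperature lines in `sensors` output, e.g.
#   CPU Temp:  +38.5°C  (high = +50°C, hyst = +45°C)
# Voltage ("+1.58 V") and fan ("1654 RPM") lines do not match.
TEMP_RE = re.compile(r"^\s*([^:]+?):\s*([+-]?\d+(?:\.\d+)?)\s*°?C\b")


def parse_temps(sensors_text):
    """Return {label: degrees_C} for every temperature line."""
    temps = {}
    for line in sensors_text.splitlines():
        m = TEMP_RE.match(line)
        if m:
            temps[m.group(1).strip()] = float(m.group(2))
    return temps


def log_temps(logfile="/var/log/temps.log", interval=10):
    """Append one timestamped line of readings per interval, forever.

    flush() plus os.fsync() push every line to disk immediately, so the
    last readings before a hard lockup are not lost in a buffer.
    """
    with open(logfile, "a") as log:
        while True:
            out = subprocess.run(["sensors"], capture_output=True,
                                 text=True).stdout
            fields = " ".join(
                f"{label.replace(' ', '_')}={value:.1f}"
                for label, value in sorted(parse_temps(out).items()))
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Run `log_temps()` as root in the background while ripping or compiling; after a lockup, the tail of the log shows the readings just before the hang. Syncing every line costs one disk write per interval, which is the point here: buffered data would be lost in the freeze.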
While we're on the topic; I noticed something. In my /etc/sensors.conf file, the temp2_over and temp2_hyst (CPU temps) are not given any value. What does it do with these values anyway, and if they're not filled in: what are they set to? Regards, Pieter
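In lm_sensors, `tempN_over` and `tempN_hyst` are typically an alarm limit and its hysteresis: the alarm goes on when the temperature rises above the `over` value and goes off only after it has fallen back below the `hyst` value, and `sensors -s` writes the values to the chip. The exact comparison rules vary by chip, and what an unset value defaults to is not answered in the thread. The Python function below only models that general over/hysteresis behaviour, using the 50C/45C limits from Kastus's configuration as example inputs.

```python
def alarm_states(readings, over, hyst):
    """Return the alarm state after each reading (sketch of over/hyst logic).

    The alarm turns on when a reading exceeds `over` and only turns off
    again once a reading drops below `hyst`, so a temperature hovering
    around one limit does not make the alarm flap on and off.
    """
    if hyst > over:
        raise ValueError("hyst should not be above over")
    active = False
    states = []
    for t in readings:
        if not active and t > over:
            active = True
        elif active and t < hyst:
            active = False
        states.append(active)
    return states
```

With `over=50, hyst=45`, a CPU that climbs to 51C keeps the alarm on at 48C and 46C and only clears it at 44C.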
On Friday 22 April 2005 4:55 pm, Pieter Hulshoff wrote:
Ok, back to my problem. It's night here now, and with the door open my system's pretty stable. If I close everything up, the temperature rises, and at about 55C for the CPU and 34C for the system it hangs. I'm not sure those temperatures have anything to do with the reason it hangs, but those were the last few readings I got.
What door? The side of the case? The door that covers the CD/DVD/floppy area? If leaving it open gives you a reliable system, then it is for sure a heat and insufficient-airflow issue.
Does your BIOS give you temperature readouts? Compare those with what gkrellm is showing you. Close the door, let the internal temps reach 53C and 33C, then reboot and check the BIOS readouts. If they are higher than gkrellm's, you can adjust gkrellm to better match the BIOS. It could be hotter than you think. Double-check your hard drives. Do they report temperatures? Run 'smartctl -a /dev/hda' (and so on) for all your drives. Do they report temps?
Can you rearrange your hard drives to put a little more space between them? Can you do anything with the case to allow more airflow around the drives?
I'm afraid not. I'm currently using a medium chassis with 5 HDs, a CD burner, and a DVD player. The Tagan 480W power supply also takes up a lot of space, especially with all the cables it comes with. I'm seriously considering buying a Lian Li v2100b case to take care of these problems once and for all.
Regards,
Pieter Hulshoff
What about the external slot covers for unused CD/DVD/floppy bays? If there are any, pull them out for now and see if more air gets pulled through when the door is closed. Then go buy a bigger case for all these space heaters. Or buy fewer, larger internal drives, or go external (FireWire or USB 2) with the drives you have. Stan
On Saturday 23 April 2005 00:45, Stan Glasoe wrote:
What door? The side of the case? The door that covers the CD/DVD/floppy area? If leaving it open gives you a reliable system, then it is for sure a heat and insufficient-airflow issue.
No, the outside door. :) It's cold outside, which brings down the room temperature, and since I've already opened the side of the case (it's a tad more stable that way, but not much), it helps keep the temperature down quite well. No lockup so far...
Does your BIOS give you temperature readouts? Compare those with what gkrellm is showing you. Close the door, let the internal temps reach 53C and 33C, then reboot and check the BIOS readouts. If they are higher than gkrellm's, you can adjust gkrellm to better match the BIOS. It could be hotter than you think.
Yes, it does. I rebooted right after checking the temperature values with gkrellm, and the BIOS gave me about the same values (1 degree lower I believe, but it did take some time to reboot as well).
Double-check your hard drives. Do they report temperatures? Run 'smartctl -a /dev/hda' (and so on) for all your drives. Do they report temps?
I'm afraid not. Perhaps I should consider buying a brand other than Maxtor? Which drives do report temperatures?
Then go buy a bigger case for all these space heaters. Or buy fewer, larger internal drives, or go external (FireWire or USB 2) with the drives you have.
The Lian Li I mentioned has 12 HD bays and 7 bays for CD/DVD drives. It's a big tower with separate compartments and several large fans installed. The HDs go in sideways (easy to install, and it keeps the cables out of the way) and have a 120mm fan in front of them, with the PSU in the compartment behind it and space for 2 more 80mm fans. I think that should help cool things down. :) Regards, Pieter
Pieter:
Which drives do report temperatures?
My Seagate ATA drives do:
    Apr 22 15:51:06 orchid smartd[536]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 46 to 47
    Apr 22 16:51:46 orchid smartd[536]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 47 to 46
    (...)
Regards, -- Andreas Philipp Noema Ltda. Bogotá, D.C. - Colombia http://www.noemasol.com
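Because smartd logs each change of attribute 194 through syslog, the log itself becomes a drive-temperature history. Below is a minimal Python sketch for pulling those changes out. It assumes the message format shown in Andreas's lines above (other smartd versions may word it differently). On SuSE such messages typically end up in /var/log/messages, but that path is also an assumption.

```python
import re

# Matches smartd syslog messages such as
# "Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 46 to 47"
SMARTD_TEMP_RE = re.compile(
    r"Device: ([^,\s]+), SMART Usage Attribute: 194 Temperature_Celsius "
    r"changed from (\d+) to (\d+)")


def temp_changes(log_text):
    """Return (device, old_temp, new_temp) for every logged change."""
    return [(dev, int(old), int(new))
            for dev, old, new in SMARTD_TEMP_RE.findall(log_text)]
```

Feeding it the contents of the syslog file gives one tuple per change, which makes it easy to see how hot each drive got shortly before a lockup.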
On Sat, Apr 23, 2005 at 01:04:06AM +0200, Pieter Hulshoff wrote:
Double-check your hard drives. Do they report temperatures? Run 'smartctl -a /dev/hda' (and so on) for all your drives. Do they report temps?
I'm afraid not. Perhaps I should consider buying a brand other than Maxtor? Which drives do report temperatures?
Maxtor 6Y160P0 and 6Y080L0 report temperature here with smartctl just fine:
    === START OF INFORMATION SECTION ===
    Device Model: Maxtor 6Y160P0
    ...
    194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 27
    === START OF INFORMATION SECTION ===
    Device Model: Maxtor 6Y080L0
    ...
    194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 35
-Kastus
On Saturday 23 April 2005 06:11, Kastus wrote:
On Sat, Apr 23, 2005 at 01:04:06AM +0200, Pieter Hulshoff wrote:
Double-check your hard drives. Do they report temperatures? Run 'smartctl -a /dev/hda' (and so on) for all your drives. Do they report temps?
I'm afraid not. Perhaps I should consider buying a brand other than Maxtor? Which drives do report temperatures?
Maxtor 6Y160P0 and 6Y080L0 report temperature here with smartctl just fine:
I stand corrected. I wonder how asleep I must have been when I wrote that...
    /dev/hda: 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 51
    /dev/hdb: 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 45
    /dev/hdc: 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 52
    /dev/hdd: 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 47
    /dev/hde: 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 42
    CPU temp: 51C
    System temp: 34C
Regards, Pieter
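To check all five drives in one go, the attribute-194 rows can be parsed out of `smartctl -A` output. The sketch below assumes the column layout shown above (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value) and takes the leading digits of the raw value as degrees Celsius, since some drives append extra text to the raw value. It needs root and smartmontools installed.

```python
import re
import subprocess


def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(fields) >= 10 and fields[0] == "194":
            digits = re.match(r"\d+", fields[9])
            return int(digits.group()) if digits else None
    return None


def drive_temperatures(devices):
    """Run `smartctl -A` on each device and collect the temperatures.

    smartctl's exit status is a bit mask that can be non-zero even when
    the attribute table was printed, so it is not treated as a failure.
    """
    temps = {}
    for dev in devices:
        out = subprocess.run(["smartctl", "-A", dev],
                             capture_output=True, text=True).stdout
        temps[dev] = parse_temperature(out)
    return temps
```

For Pieter's setup that would be `drive_temperatures(["/dev/hda", "/dev/hdb", "/dev/hdc", "/dev/hdd", "/dev/hde"])`; combined with a periodic log it would show whether the hottest drives (51C and 52C above) keep climbing before a hang.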
Hmm, how well does GKrellm report voltages? I get the following:
    Vcor1 1.67
    +3.3V 1.54
    +12V 12.4
    -12V 2.22
    -5V 0.29
    V5SB 5.57
    VBat 3.20
Now some of those values don't look right to me, but perhaps I'm not interpreting them correctly? That PSU is brand new (I installed it last Thursday evening), and I had the same crashes with the previous PSU as well (though I don't have any readings from that one), so I'm hoping I'm reading something incorrectly. Regards, Pieter
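A quick way to see which readings are implausible is to compare each one with its nominal rail voltage. The Python sketch below uses ±5% for the positive rails and the standby line, the same band Kastus sets with `3.3 * 0.95` / `3.3 * 1.05` in his sensors.conf further down, and ±10% for the negative rails, which is an assumption based on common ATX practice. Vcore and VBat are skipped because their nominal values depend on the CPU and the battery. As the rest of the thread suggests, a flagged value more likely means a missing or wrong scaling line in /etc/sensors.conf than a failing PSU.

```python
# Nominal voltage and allowed relative deviation per gkrellm label.
# +/-5% follows Kastus's sensors.conf limits; +/-10% for the negative
# rails is an assumption, not something stated in the thread.
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "-5V": (-5.0, 0.10),
    "V5SB": (5.0, 0.05),
}


def out_of_range(readings):
    """Return the sorted labels whose reading lies outside nominal +/- tolerance."""
    bad = []
    for label, value in readings.items():
        if label not in RAILS:
            continue  # e.g. Vcor1/VBat: nominal depends on CPU and battery
        nominal, tol = RAILS[label]
        # sorted() keeps low <= high for the negative rails as well
        low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
        if not low <= value <= high:
            bad.append(label)
    return sorted(bad)
```

With Pieter's readings above, only +12V (12.4) passes; +3.3V, -12V, -5V and V5SB are flagged, which fits a scaling problem rather than a real 1.54V on the 3.3V rail.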
Pieter Hulshoff wrote:
Hmm, how well does GKrellm report voltages? I get the following:
    Vcor1 1.67
    +3.3V 1.54
    +12V 12.4
    -12V 2.22
    -5V 0.29
    V5SB 5.57
    VBat 3.20
Now some of those values do not sound good to me, but perhaps I'm not interpreting them correctly? That PSU is brand new (installed it last Thursday evening), and I had the same crash problems with the previous PSU as well (though I don't have any values from that one), so I'm hoping I'm reading something incorrectly.
Regards,
Pieter
Yes, that doesn't sound good. I have some similar values:
    +5V 6.85 (is that ok?)
    +12V 15.5 (and that?)
    -5V 5.10 (what about the missing minus sign?)
    V5SB 6.85 (is this ok?)
    Vcor1 1.74 / 1.70 (toggling; I don't know what it should be)
    Vcor2 1.73 / 1.63 (toggling; same as above)
    +3.3V 3.26 (that sounds good to me!)
    +5V 4.76 (this should also be ok!)
    +12V 9.79 (does this mean something?)
    -12V -12.9
    -5V -0.17 / -0.22 (toggling)
    V5SB 5.56
As you can see, gkrellm displays some strange values, but since I usually don't have any problems, I don't worry too much about them. Regards, Martin
On Saturday 23 April 2005 11:25, Martin Deppe wrote:
As you can see, gkrellm displays some strange values, but since I usually don't have any problems, I don't worry too much about them.
If I'm not mistaken, the values come from the setup created in /etc/sensors.conf. I guess if things aren't set up properly there, the values you get don't make much sense. I for one can't imagine my system doing anything useful if the 3.3V rail really were at 1.54V. I'm sure something's not set up correctly there. Regards, Pieter
On Sat, Apr 23, 2005 at 11:37:53AM +0200, Pieter Hulshoff wrote:
If I'm not mistaken, the values come from the setup created in /etc/sensors.conf. I guess if things aren't set up properly there, the values you get don't make much sense. I for one can't imagine my system doing anything useful if the 3.3V rail really were at 1.54V. I'm sure something's not set up correctly there.
You are right, you have to adjust the multipliers in /etc/sensors.conf. For example, for my motherboard, which uses the via686 chip, I had to make the following changes in the section chip "via686a-*":
    compute "2.0V" 1.02*@ , @/1.02
    compute "3.3V" 1.02*@ , @/1.02
    compute "5.0V" 1.02*@ , @/1.02
    compute "12V" 1.02*@ , @/1.02
    set fan1_div 8
    ignore fan2
To calculate the coefficients, compare the values reported by sensors to the ones reported by the BIOS. You can also rename some sensors and/or exclude them from the report. So here is my complete chip "via686a-*" section for the Gigabyte GA-6VTXE-A motherboard:
    chip "via686a-*"
    label "2.0V" "CPU core"
    label "2.5V" "+2.5V"
    ignore "2.5V"
    label "3.3V" "I/O"
    label "5.0V" "+5V"
    label "12V" "+12V"
    label fan1 "CPU Fan"
    label fan2 "P/S Fan"
    label temp2 "SYS Temp"
    label temp1 "CPU Temp"
    label temp3 "SBr Temp"
    ignore temp3
    set in0_min 1.4
    set in0_max 1.6
    set in2_min 3.3 * 0.95
    set in2_max 3.3 * 1.05
    set in3_min 5.0 * 0.95
    set in3_max 5.0 * 1.05
    set in4_min 12 * 0.95
    set in4_max 12 * 1.05
    set temp1_hyst 45
    set temp1_over 50
    set temp2_hyst 45
    set temp2_over 50
    set temp3_hyst 60
    set temp3_over 65
    compute "2.0V" 1.02*@ , @/1.02
    compute "3.3V" 1.02*@ , @/1.02
    compute "5.0V" 1.02*@ , @/1.02
    compute "12V" 1.02*@ , @/1.02
    set fan1_div 8
    ignore fan2
To make sensors pick up the new settings, run "sensors -s" as root. After that, sensors reports for me:
    via686a-isa-0c00
    Adapter: ISA adapter
    CPU core: +1.58 V (min = +1.40 V, max = +1.60 V)
    I/O: +3.38 V (min = +3.13 V, max = +3.47 V)
    +5V: +5.08 V (min = +4.75 V, max = +5.26 V)
    +12V: +12.18 V (min = +11.39 V, max = +12.61 V)
    CPU Fan: 1654 RPM (min = 0 RPM, div = 8)
    CPU Temp: +38.5°C (high = +50°C, hyst = +45°C)
    SYS Temp: +33.5°C (high = +50°C, hyst = +45°C)
-Kastus
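The arithmetic behind those `compute` lines is a simple ratio: divide the reference value (from the BIOS, or from a multimeter as Joe suggests below) by what `sensors` reports, and use that factor in the first expression and its inverse in the second, which lm_sensors applies when converting limits back to chip values. The Python sketch below shows that calculation; the 5.0V/4.90V readings in the usage example are hypothetical.

```python
def correction_factor(reference, reported):
    """Factor k such that k * reported == reference."""
    if reported == 0:
        raise ValueError("reported reading must be non-zero")
    return reference / reported


def compute_line(feature, k, digits=3):
    """Build a sensors.conf compute line for `feature`.

    The first expression scales the chip reading for display; the
    second is its inverse, used when writing limits back to the chip.
    """
    k = round(k, digits)
    return f'compute "{feature}" {k}*@ , @/{k}'
```

For example, if a multimeter shows 5.0V where `sensors` shows 4.90V, `compute_line("5.0V", correction_factor(5.0, 4.90))` yields `compute "5.0V" 1.02*@ , @/1.02`, the same form Kastus uses.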
Kastus wrote:
To calculate the coefficients compare the values reported by sensors to the ones reported by BIOS.
or, if you are comfortable and have it, use a voltmeter to measure them on the power connector going to the motherboard, and then calculate. -- Joe Morris New Tribes Mission Email Address: Joe_Morris@ntm.org Registered Linux user 231871
Joe, On Saturday 23 April 2005 08:38, Joe Morris (NTM) wrote:
Kastus wrote:
To calculate the coefficients compare the values reported by sensors to the ones reported by BIOS.
or, if you are comfortable and have it, use a voltmeter to measure them on the power connector going to the motherboard, and then calculate.
Agreed. The voltage and temperature sensor chips are not precision measuring devices. If you have a multimeter (I have one that includes a temperature monitoring function), you can use it to calibrate correction factors accurately enough that you can have confidence in the values reported by user-level software from then on. Be sure to do it again when you replace the motherboard and even periodically (a couple of times a year is probably sufficient) to accommodate potential aging of the analog components used in the sensor chips.
Randall Schulz
So, if all my temperature sensors indicate that the temperature within my system is 55C or below, is it still reasonable to conclude that heat is the problem? Should I consider replacing parts, or should I just try to reduce the temperature instead? How can I find out which part is failing because of high temperatures? Regards, Pieter
Pieter Hulshoff wrote:
So, if all my temperature sensors indicate that the temperature within my system is 55C or below, is it still reasonable to conclude that heat is the problem? Should I consider replacing parts, or should I just try to reduce the temperature instead? How can I find out which part is failing because of high temperatures?
Regards,
Pieter
This is always difficult, and can be expensive, to solve. I've had a CPU heatsink that wasn't properly seated, and the CPU died shortly after. I had lockups on another box: I changed the memory, then the HD, which got corrupted, and finally found it was the IDE port on the motherboard that had died, when the HD and CD-ROM on IDE0 wouldn't work but were OK on IDE1. On a third box, with solid lockups, I changed the CPU, and later it wouldn't boot at all: bad motherboard. Running memtest will give you a good idea of whether it's bad RAM. Regards Sid. -- Sid Boyce ... Hamradio License G3VBV, Keen licensed Private Pilot Retired IBM Mainframes and Sun Servers Tech Support Specialist Microsoft Windows Free Zone - Linux for all Computing Tasks
On Sunday 24 April 2005 15:20, Sid Boyce wrote:
Pieter Hulshoff wrote:
So, if all my temperature sensors indicate that the temperature within my system is 55C or below, is it still reasonable to conclude that heat is the problem? Should I consider replacing parts, or should I just try to reduce the temperature instead? How can I find out which part is failing because of high temperatures?
This is always difficult, and can be expensive, to solve. I've had a CPU heatsink that wasn't properly seated, and the CPU died shortly after. I had lockups on another box: I changed the memory, then the HD, which got corrupted, and finally found it was the IDE port on the motherboard that had died, when the HD and CD-ROM on IDE0 wouldn't work but were OK on IDE1. On a third box, with solid lockups, I changed the CPU, and later it wouldn't boot at all: bad motherboard. Running memtest will give you a good idea of whether it's bad RAM.
Well, it's been completely stable now that the temperature in the room has dropped (nice weather outside, so the door's open :). Below 53C on my CPU/37C on my system I don't experience any crashes; above that, it's likely to crash. Memtest didn't show any problems, but the temperature isn't very high during such tests either. Changing my RAM timing to 266 MHz didn't help, though, so I'm pretty sure it's not a RAM problem. Any thoughts? Regards, Pieter
Pieter Hulshoff wrote:
Well, it's been completely stable now that the temperature in the room has dropped (nice weather outside, so the door's open :). Below 53C on my CPU/37C on my system I don't experience any crashes; above that, it's likely to crash. Memtest didn't show any problems, but the temperature isn't very high during such tests either. Changing my RAM timing to 266 MHz didn't help, though, so I'm pretty sure it's not a RAM problem. Any thoughts?
If you have any reason at all to suspect excess heat inside the case, then why don't you just get one or two high-throughput fans, about 50 cubic feet (roughly 1.4 cubic meters) per minute each, mount them on the case, and be done with it? That Lian Li you mentioned sounds like total overkill, and it will probably cost you several hundred (euros, dollars, or thousand yen :-) ). Any decent computer case these days should come with two case fan mounts already, and will cost you a lot less. One of those, plus a couple of fans, will save you a bundle. I am not talking about those silly little muffin fans they give you in a power supply, which you can buy anywhere for 5 bucks. Those things can't move more than a tiny bit of air (maybe 0.5 cu.m/min if you're lucky). If you can't find the fan's throughput printed or stamped on the fan mount (it should be right near the arrow that indicates airflow direction), don't buy it. The fans should be mounted to blow air into the case, but make sure you mount a decent air filter in front of each one; otherwise, your case will become a rather large, expensive (and highly inefficient) air filter. No need to get fancy here: you can probably find most of the materials for a decent filter lying around your home. If necessary, sacrifice a filter from a *good* home air-filtration unit, cut it into pieces just large enough to cover the intake holes, and tape them to the outside of the case with some electrician's tape. While you're at it, buy or make another filter for the power supply fan intake, particularly if you are a smoker: fine dust kills electronics nearly as fast as heat, and a power supply makes a much better air filter than a computer case. In a particularly dusty environment, change all the filters every month, otherwise every 3 or 4 months, but certainly before the dust trapped in the filter becomes too noticeable; this is expensive stuff you are protecting here.
You will of course still need to ensure adequate cooling directly at the CPU. Check, or simply replace, the thermal compound (heat sink paste) between the CPU and the heat sink. You can get tiny tubes of it (maybe a cubic centimeter each) at any good electronics supply store, 3 or 4 tubes for a dollar or two, which should last almost forever; half of one should be enough for any CPU. If you have any reason at all to suspect that the CPU fan is even slightly faulty, replace it, preferably with one that moves a higher volume of air. If all this still isn't enough, you need to do something about the room environment. With all the money you just saved by not buying the Lian Li case, you can afford a small window-mount air conditioner to keep the temperature and humidity under control :-)
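Why the compound matters can be sketched with a simple series thermal-resistance model: the die temperature equals the air temperature around the heat sink plus the CPU's power times the sum of the junction-to-case, case-to-sink (the compound layer) and sink-to-air resistances, T_die = T_air + P · (θ_jc + θ_cs + θ_sa). All power and resistance values below are hypothetical placeholders, picked so the fresh-paste case lands near the 34C/50C readings reported in this thread; real figures would come from the CPU and cooler datasheets.

```python
def die_temperature_c(air_c: float, power_w: float,
                      theta_jc: float, theta_cs: float, theta_sa: float) -> float:
    """Steady-state die temperature for a series thermal path (theta in K/W)."""
    return air_c + power_w * (theta_jc + theta_cs + theta_sa)


# Hypothetical example values, not measured on this system:
POWER_W = 60.0      # CPU dissipation under full load
THETA_JC = 0.10     # junction to case
THETA_SA = 0.15     # heat sink (with fan) to air
GOOD_PASTE = 0.02   # fresh, thin layer of compound
DRIED_PASTE = 0.15  # old, dried-out or missing compound

fresh = die_temperature_c(34.0, POWER_W, THETA_JC, GOOD_PASTE, THETA_SA)  # about 50.2
dried = die_temperature_c(34.0, POWER_W, THETA_JC, DRIED_PASTE, THETA_SA)  # about 58.0
```

In this model a poor compound layer adds several degrees on its own, and the die temperature moves one-for-one with the air temperature: with everything else fixed, 3K warmer air means a 3K hotter CPU, which fits the observation that the crashes stopped once the room cooled down.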
On Monday 25 April 2005 00:47, Darryl Gregorash wrote:
If you have any reason at all to suspect a problem with excess heating inside the case, then why don't you just get one or two high-throughput fans -- about 50 cuf (1.5 cubic meters) per minute -- and mount it/them on the case, and be done with it?
Well, mostly because I can't imagine why a good CPU, or any other system component, should have any trouble at 55C. The second, and also important, reason is that with 7 devices in a midi tower, the cables block much of the airflow. Currently, my system temperature is about 34C and my CPU temperature is 50C.
That Lian Li you mentioned sounds like total overkill -- and it probably will cost you several hundred (euros, dollars, or thousand yen :-) ).
The price is about right, but I'm also considering the Lian Li because of the number of disks I have in that system. I've got 5 HDs, a DVD player, and a CD burner in there right now, and since my HDs are almost full, I'm considering adding another 4. With 9 HDs, I don't think that Lian Li is really overkill anymore. :)
While you're at it, buy/make another filter for the power supply fan intake, particularly if you are a smoker -- fine dust kills electronics nearly as fast as heat, and a power supply makes a much better air filter than a computer case.
Good advice. I'm no smoker, but my wife is. I currently clean out the computer regularly to remove dust, so proper filtering may well be worthwhile.
You will of course still need to ensure adequate cooling directly at the CPU. Check, or simply replace, the heat sink material between the CPU chip and the cooling fan.
If a CPU is used at 100% for a long period of time, what temperature should it be allowed to reach with reasonable cooling? As I wrote, I don't find 55C all that bad, but perhaps my perspective is wrong. I guess investing in a good CPU fan isn't a problem, but I do want it to be a sensible investment. Please, educate me. :)
Regards,
Pieter
PS: At least I got a reasonable education in ESD from my regular job. ;)
So, with my CPU giving up at around 55C, should I consider getting a better CPU cooler, or replacing my CPU instead? If I go for a CPU cooler, what would be a good brand/type to get? I don't overclock my CPUs, so there's no need for a water cooling system. :)
Regards,
Pieter
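When comparing coolers, manufacturers often quote a thermal resistance in °C/W (equivalently K/W). Rearranging the series model T_die = T_air + P · (θ_jc + θ_cs + θ_sa) gives the largest heat-sink-to-air resistance θ_sa that keeps the CPU at or below a chosen target. The 50C target, 35C case air, 60 W load and the θ_jc/θ_cs values used below are assumptions for illustration only; the real limits and resistances belong to the specific CPU's datasheet.

```python
def max_sink_resistance(target_c: float, air_c: float, power_w: float,
                        theta_jc: float, theta_cs: float) -> float:
    """Largest sink-to-air thermal resistance (K/W) that keeps the die at or
    below target_c. A result <= 0 means no heat sink can reach the target at
    this air temperature; the air around the CPU has to get cooler instead."""
    return (target_c - air_c) / power_w - theta_jc - theta_cs


# Hypothetical inputs, not measured on this system:
needed = max_sink_resistance(target_c=50.0, air_c=35.0, power_w=60.0,
                             theta_jc=0.10, theta_cs=0.02)  # about 0.13 K/W
```

Under these assumptions, any cooler rated at or below about 0.13 K/W should hold the target. A non-positive result would mean the case air itself is too warm, which points back at case ventilation rather than at the CPU or its cooler.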
participants (10)
- Andreas Philipp
- Darryl Gregorash
- Joe Morris (NTM)
- Kastus
- Martin Deppe
- Pieter Hulshoff
- Randall R Schulz
- Sandy Drobic
- Sid Boyce
- Stan Glasoe