[opensuse] [OT] unstable system - still trying to identify the culprit.
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up:

1. Gigabyte motherboard, quite new model.
2. AMD Phenom quad-core CPU, 2.2GHz.
3. 4GB memory.
4. Software RAID1 on SATA disks.
5. ATI Radeon X1650.
6. 850W power supply.

When I subject the system to stress testing (mprime), it will automatically reboot after a fairly short time. After having made certain it wasn't the memory and not the power supply, I began to suspect it was cooling/temperature related - I had one temperature reading showing 80-86C. However, Gigabyte said that sensor wasn't in fact connected, so there. I checked with AMD and they confirmed the supplied heatsink and fan were entirely adequate for running the CPU under full load. CPU-temp would hit 60-62C under full load.

In the end I reported the board as faulty and sent it back for repair/replacement. I got a new one two days ago, which I've now installed etc. In the meantime memtest86 was upgraded to support the AMD K10 CPU/chipset, so I've done a couple of passes with that - no probs. I've also upgraded the BIOS to the latest. I very carefully cleaned CPU and heatsink and applied a minimal amount of new thermal paste before re-seating everything.

And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot. I've kept an eye on the CPU-temp (assuming the readout from lm-sensors is correct), and it's fairly stable in the 53-55C range.

So now what? I'm running openSUSE 11.0A2 - maybe I should try 10.3 instead, just in case. Still, I can't help feeling that the motherboard/BIOS is basically buggy, but how do I prove that?
/Per Jessen, Zürich
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
Per Jessen wrote:
So now what?
I've recently seen a thread about RAM voltage. Raising the RAM supply voltage (by 0.1 or 0.2 V) on 4GB systems could improve stability. Personally, I would be reluctant to do so, so I pass this on with all possible warnings...
jdd
--
http://www.dodin.net http://clairedodin.voices.com/ http://www.clairedodin.com/ http://claire.dodin.net/
jdd wrote:
Per Jessen wrote:
So now what?
I've recently seen a thread about RAM voltage. Raising the RAM supply voltage (by 0.1 or 0.2 V) on 4GB systems could improve stability.
Personally, I would be reluctant to do so, so I give you this under all possible warnings...
Yes, I saw the same thread. It didn't seem quite right to me, but I've just tried it anyway:
Increase 0.05V - no change, system fell over.
Increase 0.1V - no real change - instead of rebooting, the system just hung after a while.
/Per Jessen, Zürich
Hello,
In the Message;
Subject : [opensuse] [OT] unstable system - still trying to identify the culprit.
In the meantime memtest86 was upgraded to support the AMD K10 CPU/chipset, so I've done a couple of passes with that - no probs.
A couple of passes? What do you mean? There are 11 tests in memtest86. Did you run all the tests twice?
---
Masaru Nomiya mail-to: nomiya @ galaxy.dti.ne.jp
"Bill! You married with Computers. Not with Me!" "No..., with money."
Masaru Nomiya wrote:
Per Jessen has written:
In the meantime memtest86 was upgraded to support the AMD K10 CPU/chipset, so I've done a couple of passes with that - no probs.
A couple of passes? What do you mean?
memtest86+ counts the number of complete passes, i.e. all tests successfully completed. I've done two or three yesterday, and now I'm doing another one or two.
There exist 11 tests in memtest86. Did you run all the test twice?
Three times so far.
/Per Jessen, Zürich
On Saturday 01 March 2008 07:47, Per Jessen wrote:
Masaru Nomiya wrote:
...
There exist 11 tests in memtest86. Did you run all the test twice?
Three times sofar.
Of course, memtest86+ (I assume you're running the "plus" variant) is single-threaded, so it produces neither concurrency nor thermal stress (since that's not what it's for).
/Per Jessen, Zürich
Randall Schulz
Randall R Schulz wrote:
On Saturday 01 March 2008 07:47, Per Jessen wrote:
Masaru Nomiya wrote:
...
There exist 11 tests in memtest86. Did you run all the test twice?
Three times sofar.
Of course, memtest86+ (I assume you're running the "plus" variant) is single-threaded, so it produces neither concurrency nor thermal stress (since that's not what it's for).
Yes, I'm running memtest86+ (dated 21/02/08, with K10 support). The previous version fell over when the test reached 25% :-(
/Per Jessen, Zürich
Hello,
In the Message;
Subject : Re: [opensuse] [OT] unstable system - still trying to identify the culprit.
In the meantime memtest86 was upgraded to support the AMD K10 CPU/chipset, so I've done a couple of passes with that - no probs.
A couple of passes? What do you mean?
memtest86+ counts the number of complete passes, i.e. all tests successfully completed. I've done two or three yesterday, and now I'm doing another one or two.
There exist 11 tests in memtest86. Did you run all the test twice?
Three times sofar.
OK. As is well known, there exists a bug in the AMD Phenom. Is your M/B's BIOS the bug-fix version?
---
Masaru Nomiya mail-to: nomiya @ galaxy.dti.ne.jp
"Bill! You married with Computers. Not with Me!" "No..., with money."
Masaru Nomiya wrote:
OK. As is well known, there exists a bug in the AMD Phenom. Is your M/B's BIOS the bug-fix version?
Yes, the BIOS has the bugfix for erratum 298.
/Per Jessen, Zürich
On Saturday 01 March 2008 06:42, Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
You've posted here before??
...
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot. I've kept an eye on the CPU-temp (assuming the readout from lm-sensors is correct), and it's fairly stable in the 53-55C range.
So now what? I'm running openSUSE 11.0A2 - maybe I should try 10.3 instead, just in case. Still, I can't help feeling that the motherboard/BIOS is basically buggy, but how do I prove that?
Given all this, my next hunch is that there is some kind of concurrency bug, either in the Linux kernel or in the processor itself. I know both are in the realm of "grasping at straws," but it begins to seem more likely this is a design defect, be it in the software or some portion of the hardware (or its microcode).

There's one thermal test (at least) you could try, which is to artificially / externally impose some extra cooling while the test runs. I don't know if a sustained stream from a cooling spray is practical over the duration usually required to experience the failure. Perhaps you could run the system in a walk-in freezer with the case open and extra fans directed at the CPU and chipset portion of the mainboard?

Perhaps if you could get Gigabyte to replicate your results they'd take it up as an engineering issue on their end?
/Per Jessen, Zürich
Randall Schulz
Randall R Schulz wrote:
On Saturday 01 March 2008 06:42, Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
You've posted here before??
Only once or twice - you could easily have missed it :-)
Given all this, my next hunch is that there is some kind of concurrency bug, either in the Linux kernel or in the processor itself. I know both are in the realm of "grasping at straws," but it begins to seem more likely this is a design defect, be it in the software or some portion of the hardware (or its microcode).
I'm grasping at the very same straws. I did run the system with a serial console hooked up to see if I could catch anything that way, but I saw no output.
There's one thermal test (at least) you could try, which is to artificially / externally impose some extra cooling while the test runs. I don't know if a sustained stream from a cooling spray is practical over the duration usually required to experience the failure. Perhaps you could run the system in a walk-in freezer with the case open and extra fans directed at the CPU and chipset portion of the mainboard?
I did in fact run the system outdoors two weeks ago when it was about 0C - it wasn't a very conclusive test. The system did hold up for longer, but eventually still fell over. I also tried your second suggestion with a large fan directed at the CPU and chipset - didn't change a thing.
Perhaps if you could get Gigabyte to replicate your results they'd take it up as an engineering issue on their end?
That's what I'm hoping for. I've got an open report with Gigabyte, but as it's a weekend, I probably won't see a response until Monday or Tuesday.
/Per Jessen, Zürich
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
Vividly, Per. I have solved the Gigabyte GA-n700sl instability. I routinely run memtest86+ on the memory when I do a new install. I was running 2GB of OCZ Platinum memory that I tested 6 months ago, and it was clean. After replacing everything in the system to try to resolve the issue (the hard drive had to be replaced - unrecoverable disk error), I ran memtest again, just to make sure. Both sticks were reporting errors. The memory was probably 2 years old, but I called OCZ, they issued an RMA, and the RAM is being replaced at no cost. In the interim, I purchased 2GB more of the exact same OCZ memory and memtested it before booting. The new RAM was clean, confirming the old had indeed failed. That was the first time in 17 years that I had ever had RAM just "go bad". So double-check the RAM with memtest86+.
--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
CPU-temp would hit 60-62C under full load. [pruned]
I very carefully cleaned CPU and heatsink and applied a minimal amount of new thermal paste before re-seating everything.
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot. I've kept an eye on the CPU-temp (assuming the readout from lm-sensors is correct), and it's fairly stable in the 53-55C range.
Well, you certainly achieved a substantial drop in CPU temperature! A question: do you always run the same software in the test? mprime is used to stress-test a computer, so why run other apps like firefox and such? The reason for asking is that you may be running out of RAM because some app other than mprime has a problem with handing back RAM - firefox prior to v3 has this problem, I believe. I don't know how these things work, but to me it seems that the only thing not spoken about to date is the software you are using to run these tests, so perhaps the problem lies there (when the "goodies" and the "baddies" are all running together)? :-)
Ciao.
--
I was very heavily into pornography. Then my pornograph broke.
Basil Chupin wrote:
A question: do you always run the same software in the test? mprime is used to stress-test a computer so why run the other apps. like firefox and such.
I might just be browsing something or other, or setting something up whatever. No particular purpose.
Reason for asking is that you may be running out of RAM because some app other than mprime has a problem with handing back RAM - firefox prior to v3 has this problem I believe.
I would expect to see the kernel complain or hang or something, not just reboot. Or it might kill off whatever is using up the RAM.
/Per Jessen, Zürich
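The out-of-memory theory can be checked after the next reboot by searching the kernel log from before the crash for OOM-killer messages. The following is only a minimal sketch: the helper name `count_oom_events` is hypothetical, and it assumes kernel messages are written to a plain-text syslog file such as /var/log/messages (an assumption about this particular setup).

```shell
#!/bin/sh
# count_oom_events LOGFILE
# Hypothetical helper: print how many lines in LOGFILE look like
# kernel OOM-killer messages ("invoked oom-killer" / "Out of memory").
count_oom_events() {
    logfile=$1
    # -c counts matching lines, -i ignores case, -E enables alternation
    grep -c -i -E 'oom-killer|out of memory' "$logfile"
}

# Example (path is an assumption about where syslog writes):
#   count_oom_events /var/log/messages
```

A count of 0 would suggest the kernel never started killing processes for lack of memory, which fits the expectation that running out of RAM would leave some trace rather than a silent reboot.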
On Sunday 02 March 2008 02:52:58 am Per Jessen wrote:
Basil Chupin wrote:
A question: do you always run the same software in the test? mprime is used to stress-test a computer so why run the other apps. like firefox and such.
I might just be browsing something or other, or setting something up whatever. No particular purpose.
Reason for asking is that you may be running out of RAM because some app other than mprime has a problem with handing back RAM - firefox prior to v3 has this problem I believe.
I would expect to see the kernel complain or hang or something, not just reboot. Or it might kill off whatever is using up the RAM.
Have you tried running the computer without mprime? If you want to stress test hardware, use something else, compile the whole of KDE for instance. Simply exclude mprime as a suspect.
--
Regards, Rajko.
See http://en.opensuse.org/Portal
Rajko M. wrote:
Have you tried before to run computer without mprime? If you want to stress test hardware use something else, compile whole KDE for instance.
Simply exclude mprime as a suspect.
But, mprime is not a suspect at all. mprime is a well known stress test. I can certainly try something else too, but what will it tell me?
/Per Jessen, Zürich
On Sunday 02 March 2008 04:44:35 am Per Jessen wrote:
Rajko M. wrote:
Have you tried before to run computer without mprime? If you want to stress test hardware use something else, compile whole KDE for instance.
Simply exclude mprime as a suspect.
But, mprime is not a suspect at all. mprime is a well known stress test. I can certainly try something else too, but what will it tell me?
Reading about all the attempts to test the board with mprime, I can't understand what you are trying to achieve. Are you looking for a motherboard that will survive 100% usage of basic computer components for long periods of time (hours, days, more), or do you just want a dependable board for normal usage that will not break in the middle of a compilation?

What you have by now:
- well known program,
- compiled for Factory,
- on new motherboard,
- you are using this combination for a week or two,
- test in cold air tells that problem has something to do with overheating,
- increasing memory voltage to 2.6 V has some success.

You have to take some factors out of the equation to find the solution. To concentrate on hardware alone you have to exclude Factory and mprime as contributing factors to the failure. An installation takes longer than simply running something else, like a kernel compilation.

You may want to go higher with the memory voltage, if possible on that MB, but check with the RAM manufacturer whether that is OK. Modern boards have their own ideas about CPU voltage and the BIOS doesn't give a chance to change that, and that could be another hardware culprit.
--
Regards, Rajko.
See http://en.opensuse.org/Portal
Rajko M. wrote:
Reading about all the attempts to test the board with mprime, I can't understand what you are trying to achieve.
It's a perfectly normal thing to do - in industry it's often also referred to as a burn-in test. I am simply trying to verify that the CPU+board+memory+cooling+power supply constellation is sound, and that nothing breaks when it is stressed. It is the only way to ensure the entire system will also produce correct and dependable results during normal operation. So far I've determined that the setup is _not_ dependable under load, which implies it cannot be depended on 100% even whilst under less load.
Are you looking for motherboard that will survive 100% usage of basic computer components for long periods of time (hours, days, more), or you just want to have dependable board for normal usage that will not break in the middle of the compilation.
I'd prefer both really, but I can accept that the hardware components on this board may not last as long as those on a board made for use in a server.
What you have by now: - well known program, - compiled for Factory, - on new motherboard. - you are using this combination for a week or two. - test in cold air tells that problem has something to do with overheating - increasing memory voltage to 2.6 V has some success
You have to take out some factors from equation to find solution.
Yes, and that is exactly what I've done. I replaced the power supply. I checked each of the memory sticks individually and in pairs. I replaced the motherboard. I've taken extra special care when I seated the CPU+heatsink+fan. I have avoided replacing the CPU as I don't really think the problem is there, plus it's somewhat costly.
To concentrate on hardware alone you have to exclude Factory and mprime as contributing factors to failure. Installation lasts longer than simply run something else, like kernel compilation.
I still don't understand what it will do, Rajko - running mprime is a well established stress test. Doing a compilation run of <anything> is also a stress test, but it won't add to or subtract from the problem. That's why I don't understand your suggestion. Not using mprime is like saying "hmm, mprime breaks it, so if I don't use it, it won't break".
You may want to go higher with memory voltage, if possible on that MB, but check with RAM manufacturer is that OK.
Yep, I did that and I checked. I tried +0.05V and +0.1V, which is just within the manufacturer's spec of 1.8V ±0.1V. Didn't change much though.
/Per Jessen, Zürich
Per Jessen wrote:
To concentrate on hardware alone you have to exclude Factory and mprime as contributing factors to failure. Installation lasts longer than simply run something else, like kernel compilation.
I still don't understand what it will do, Rajko - running mprime is a well established stresstest. Doing a compilation run of <anything> is also a stresstest, but it won't add to or subtract from the problem. That's why I don't understand your suggestion.
Well, I'm just now running a parallel kernel compile. So far it's doing fine, and has in fact driven the CPU temp higher than mprime was able (before crashing). I think that means I can conclude it's _not_ a thermal problem.
/Per Jessen, Zürich
On Sunday 02 March 2008 11:05:05 am Per Jessen wrote:
Per Jessen wrote:
To concentrate on hardware alone you have to exclude Factory and mprime as contributing factors to failure. Installation lasts longer than simply run something else, like kernel compilation.
I still don't understand what it will do, Rajko - running mprime is a well established stresstest. Doing a compilation run of <anything> is also a stresstest, but it won't add to or subtract from the problem. That's why I don't understand your suggestion.
Well, I'm just now running a parallel kernel compile. So far it's doing fine, and has in fact driven the CPU temp higher than mprime was able (before crashing). I think that means I can conclude it's _not_ a thermal problem.
Thanks :-) The idea was a simple troubleshooting technique: replace every part of a non-working system whose workings you don't completely understand. In your case you replaced or checked everything except the CPU, Factory and mprime, so it was time to do that. mprime is the easiest to replace, just run something else, and that is why it was first on the list.

Now you can run the kernel compilation from a script that will compile, then 'make mrproper', and then compile again for hours to see how it works in the long run, i.e. how it handles real thermal stress, but at least you know that mprime was at least part of the problem, if not the only problem. I agree with you that the thermal problem should be put to rest for now.

Jim mentioned interaction between the GUI and mprime as a possible reason. It can be, as mprime is a console application. It can be that you found a bug in either gcc, kernel, glibc, mprime or Phenom.

I'll try to compile mprime and test it here, and post the result on:
X-Mailinglist: opensuse-testing
List-Post: mailto:opensuse-testing@opensuse.org
List-Help: mailto:opensuse-testing+help@opensuse.org
List-Subscribe: mailto:opensuse-testing+subscribe@opensuse.org
List-Unsubscribe: mailto:opensuse-testing+unsubscribe@opensuse.org
List-Owner: mailto:opensuse-testing+owner@opensuse.org
Note that it is a different list than opensuse-test.
--
Regards, Rajko.
See http://en.opensuse.org/Portal
Rajko M. wrote:
The idea was simple troubleshooting technique to replace every part of non working system that you don't understand completely how it works. In your case you replaced or checked all except CPU, Factory and mprime, so it was time to do that.
And I had in fact also just today replaced my initial 11.0A2 with a 10.3 install, so now we're left with the CPU.
Now you can run kernel compilation from script, that will compile, than 'make mrproper' and than again compile for hours to see how it works in long run, ie. how it handles real thermal stress, but at least you know that mprime was at least part of the problem, if not the only problem. I agree with you that thermal problem should be put on rest for now.
I'll set up the machine to run kernel compiles all night, it's already running in fact. I disagree about mprime being _part_ of the problem though. mprime is what _provokes_ the problem, but it's not part of it. What you're saying is the same as saying that a car is part of the problem if a bridge collapses.
Jim metioned interaction between GUI and mprime as possible reason. It can be as mprime is console application. It can be that you found bug in either gcc, kernel, glibc, mprime
OK, I agree, that is marginally possible. But without _any_ error message of any kind at all?
or Phenom.
I've queried AMD about that one. I've asked them for a diagnostic tool.
I'll see to compile mprime and test here and post result on:
X-Mailinglist: opensuse-testing List-Post: mailto:opensuse-testing@opensuse.org List-Help: mailto:opensuse-testing+help@opensuse.org List-Subscribe: mailto:opensuse-testing+subscribe@opensuse.org List-Unsubscribe: mailto:opensuse-testing+unsubscribe@opensuse.org List-Owner: mailto:opensuse-testing+owner@opensuse.org
Note that it is different list than opensuse-test.
I didn't know either of them, but I do now. Thanks Rajko.
/Per Jessen, Zürich
Per Jessen wrote:
Now you can run kernel compilation from script, that will compile, than 'make mrproper' and than again compile for hours to see how it works in long run, ie. how it handles real thermal stress, but at least you know that mprime was at least part of the problem, if not the only problem. I agree with you that thermal problem should be put on rest for now.
I'll set up the machine to run kernel compiles all night, it's already running in fact.
Well. That didn't take long. About 35 minutes. Crash. I was running the following:

    cd /usr/src/linux
    while true
    do
        make mrproper
        zcat /proc/config.gz >.config
        make -j6
    done

plus a little homegrown memory-exerciser that really just exercises memory in a straightforward way.
/Per Jessen, Zürich
On Sunday 02 March 2008 01:58:00 pm Per Jessen wrote:
Per Jessen wrote: ... Well. That didn't take long. About 35minutes. Crash.
I was running the following:
    cd /usr/src/linux
    while true
    do
        make mrproper
        zcat /proc/config.gz >.config
        make -j6
    done
plus a little homegrown memory-exerciser that really just exercises memory in a straight forward way.
Blind troubleshooting is what you are doing. Change one thing at a time and minimize the number of components, then when one test passes add more. What happens with swap? Is it used when the thing crashes? Memtest doesn't use swap, it keeps the CPU busy, but it never crashes. Make a script that will send or save the status of used resources to another computer every few seconds. Phenom? Have you any other processor that you can put in that board? It has one bug mentioned in this thread, how many more is not known; kernel support in 10.3 is what: good, bad, existent, nonexistent?
--
Regards, Rajko.
See http://en.opensuse.org/Portal
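The resource-status script suggested here could look roughly like the sketch below. It is an illustration only: the function names `snapshot` and `log_status` are made up for this example, and the fields recorded (load average, free memory, free swap from /proc) are an assumption about what is worth watching.

```shell
#!/bin/sh
# snapshot: print one status line with time, load average,
# free memory and free swap (values in kB, read from /proc).
snapshot() {
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    load=$(awk '{print $1","$2","$3}' /proc/loadavg 2>/dev/null)
    memfree=$(awk '/^MemFree:/ {print $2}' /proc/meminfo 2>/dev/null)
    swapfree=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo 2>/dev/null)
    echo "$ts load=${load:-n/a} memfree_kB=${memfree:-n/a} swapfree_kB=${swapfree:-n/a}"
}

# log_status LOGFILE INTERVAL COUNT
# Append COUNT snapshots to LOGFILE, INTERVAL seconds apart.
log_status() {
    logfile=$1
    interval=$2
    remaining=$3
    while [ "$remaining" -gt 0 ]; do
        snapshot >> "$logfile"
        sync    # flush to disk so the last line survives a sudden reboot
        sleep "$interval"
        remaining=$((remaining - 1))
    done
}
```

To get the data onto another computer, the snapshots can be piped over ssh, for example `while snapshot; do sleep 5; done | ssh loghost 'cat >> status.log'`, where `loghost` is a placeholder host name. The last line received before the reboot then shows whether swap was in use at the time.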
Rajko M. wrote:
What happens with swap? Is it used when thing crashes?
Nope, a parallel compile doesn't use that much memory at all. Maybe a large C++ compile.
Phenom? Have you any other processor that you can put in that board?
Nope.
It has one bug mentioned in this thread, how many is not known; kernel support in 10.3 is what: good, bad, existant, nonexistant ?
Good AFAIK. It's just an AMD64 CPU times 4. That's all.
/Per Jessen, Zürich
On Sunday 02 March 2008 20:58:00 Per Jessen wrote:
Per Jessen wrote:
Now you can run kernel compilation from script, that will compile, than 'make mrproper' and than again compile for hours to see how it works in long run, ie. how it handles real thermal stress, but at least you know that mprime was at least part of the problem, if not the only problem. I agree with you that thermal problem should be put on rest for now.
I'll set up the machine to run kernel compiles all night, it's already running in fact.
Well. That didn't take long. About 35minutes. Crash.
Out of curiosity, do you ever get any log messages from the crashes? Have you considered configuring the system to give you a crashdump?
Anders
--
Madness takes its toll
Anders Johansson wrote:
Out of curiosity, do you ever get any log messages from the crashes?
No, nothing. And I have the system running with a serial console to be able to capture any kernel output.
Have you considered configuring the system to give you a crashdump?
When would it be triggered? I haven't done it, no.
/Per Jessen, Zürich
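On the question of when a dump would be triggered: a crash dump is normally written when the kernel panics, so a reset the kernel never gets to see would presumably leave nothing behind. As a first step, the current panic-related settings can be inspected. The sketch below only reads values from /proc/sys/kernel; the function name `show_panic_settings` is hypothetical, and setting up the dump mechanism itself is not shown.

```shell
#!/bin/sh
# show_panic_settings: print the current values of a few
# panic-related kernel settings, or "unavailable" if unreadable.
show_panic_settings() {
    for name in panic panic_on_oops sysrq; do
        file=/proc/sys/kernel/$name
        if [ -r "$file" ]; then
            echo "kernel.$name = $(cat "$file")"
        else
            echo "kernel.$name = unavailable"
        fi
    done
}

show_panic_settings
```

If `kernel.panic_on_oops` is 0, running `sysctl -w kernel.panic_on_oops=1` as root makes an oops escalate to a panic, which is the event a crash-dump setup would react to. If the machine still reboots without a panic message on the serial console, that would point towards a reset happening below the kernel.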
On Sunday 02 March 2008 09:58:00 am Per Jessen wrote:
Per Jessen wrote:
Now you can run kernel compilation from script, that will compile, than 'make mrproper' and than again compile for hours to see how it works in long run, ie. how it handles real thermal stress, but at least you know that mprime was at least part of the problem, if not the only problem. I agree with you that thermal problem should be put on rest for now.
I'll set up the machine to run kernel compiles all night, it's already running in fact.
Well. That didn't take long. About 35minutes. Crash.
I was running the following:
    cd /usr/src/linux
    while true
    do
        make mrproper
        zcat /proc/config.gz >.config
        make -j6
    done
plus a little homegrown memory-exerciser that really just exercises memory in a straight forward way.
/Per Jessen, Zürich
So far you have changed the power supply, mobo and perhaps the CPU fan. Most probably the replacement components are not at fault. The major components that have still not been physically replaced are: CPU, RAM, disk, disk cables, OS. You need to try an alternate for each item.

If you have a spare hard drive, wipe it out, get a new cable and install windoze without any RAID software. If you meet with success, try 10.3 without the RAID, then with the RAID. If you still have crashes with a new disk, cables and OS, try a new memory stick. If that fails as well, you will have enough documentation to argue for a free CPU replacement. If you still have a problem, call an exorcist. And please keep us posted.
Good luck,
d.
Per Jessen wrote:
Well. That didn't take long. About 35minutes. Crash.
I was running the following:
    cd /usr/src/linux
    while true
    do
        make mrproper
        zcat /proc/config.gz >.config
        make -j6
    done
OTOH, I left it running all night, and it kept going until now, which is about 14 hours. No probs.
/Per Jessen, Zürich
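Since the same loop crashed after 35 minutes on one run and survived 14 hours on another, it may help to record when each pass starts and ends, so the log shows how far a run got before a reboot. A minimal sketch, with a hypothetical wrapper named `run_logged` (the name, log format and pass limit are choices made for this example):

```shell
#!/bin/sh
# run_logged CMD LOGFILE PASSES
# Run CMD (a shell command string) PASSES times, appending a
# timestamped start line and result line for each pass to LOGFILE.
run_logged() {
    cmd=$1
    logfile=$2
    passes=$3
    n=1
    while [ "$n" -le "$passes" ]; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') pass $n start" >> "$logfile"
        sync    # make sure the start line is on disk before the load begins
        if sh -c "$cmd"; then
            status=ok
        else
            status=FAILED
        fi
        echo "$(date '+%Y-%m-%d %H:%M:%S') pass $n $status" >> "$logfile"
        sync
        n=$((n + 1))
    done
}
```

With the compile loop from this thread it could be used as `cd /usr/src/linux && run_logged 'make mrproper && zcat /proc/config.gz >.config && make -j6' /root/compile-passes.log 1000`, where the log path and pass count are arbitrary. A "start" line without a matching result line marks the pass during which the machine went down.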
On Monday 03 March 2008 04:09:45 am Per Jessen wrote:
Per Jessen wrote:
Well. That didn't take long. About 35minutes. Crash.
I was running the following:
    cd /usr/src/linux
    while true
    do
        make mrproper
        zcat /proc/config.gz >.config
        make -j6
    done
OTOH, I left it running all night, and it kept going until now, which is about 14hours. No probs.
You mean that it first crashed after 35 minutes, and then ran all night. What did you change? BTW, I left the mprime binary running in KDE konsole for 2 hours and it warmed the CPU up to 65 C, but it didn't find any problems. I also tried to compile it from sources, but the compilation failed. Some rule in the makefile was not defined. I was actually looking to compile it first as is, and later to change the machine to AMD 64 and see how it works. The default mach is i486.
--
Regards, Rajko.
See http://en.opensuse.org/Portal
On Saturday 01 March 2008 15:42:46 Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
1. Gigabyte motherboard, quite new model. 2. AMD Phenom quad-core CPU. 2.2GHz 3. 4Gb memory 4. Software RAID1 on SATA disks. 5. ATI Radeon X1650. 6. 850W powersupply.
When I subject the system to stress testing (mprime), it will automatically reboot after a fairly short time.
Does this system have some sort of watchdog, that monitors for kernel hangs?
Anders
--
Madness takes its toll
Anders Johansson wrote:
On Saturday 01 March 2008 15:42:46 Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
1. Gigabyte motherboard, quite new model. 2. AMD Phenom quad-core CPU. 2.2GHz 3. 4Gb memory 4. Software RAID1 on SATA disks. 5. ATI Radeon X1650. 6. 850W powersupply.
When I subject the system to stress testing (mprime), it will automatically reboot after a fairly short time.
Does this system have some sort of watchdog, that monitors for kernel hangs?
You mean like a plugin card or something like that? Nope, it's a fairly bog standard system.
/Per Jessen, Zürich
On Sunday 02 March 2008 12:25:54 Per Jessen wrote:
Anders Johansson wrote:
On Saturday 01 March 2008 15:42:46 Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
1. Gigabyte motherboard, quite new model. 2. AMD Phenom quad-core CPU. 2.2GHz 3. 4Gb memory 4. Software RAID1 on SATA disks. 5. ATI Radeon X1650. 6. 850W powersupply.
When I subject the system to stress testing (mprime), it will automatically reboot after a fairly short time.
Does this system have some sort of watchdog, that monitors for kernel hangs?
You mean like a plugin card or something like that?
Well, not necessarily. A watchdog can be integrated on the motherboard
Anders -- Madness takes its toll
Anders Johansson wrote:
Does this system have some sort of watchdog, that monitors for kernel hangs?
You mean like a plugin card or something like that?
Well, not necessarily. A watchdog can be integrated on the motherboard
Ah, I see - no it doesn't have anything like that.
/Per Jessen, Zürich
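Anders's question can also be checked from the running system: when a watchdog driver is loaded, Linux exposes it as a /dev/watchdog device node. The following is only a minimal sketch. The function name and its directory argument are made up for illustration, and an empty result shows only that no watchdog driver is loaded, not that the board has no watchdog hardware.

```shell
#!/bin/sh
# list_watchdogs [DIR]: print any watchdog device nodes found in DIR
# (default /dev). Hypothetical helper; only the /dev/watchdog* naming
# comes from the Linux kernel. Returns 0 if at least one node was found.
list_watchdogs() {
    dir=${1:-/dev}
    found=1
    for node in "$dir"/watchdog*; do
        # An unmatched glob stays literal, so check the path really exists.
        if [ -e "$node" ]; then
            echo "$node"
            found=0
        fi
    done
    return $found
}

if list_watchdogs; then
    echo "a watchdog driver is loaded"
else
    echo "no watchdog device node found"
fi
```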
Per Jessen wrote:
Have you monitored memory, swap, CPU and disk load while the system is stressed? Does the system start "thrashing"? Are you using 4K or 8K stacks (or is this a 32-bit system)? What file system are you using?

I had a problem on a 32-bit system where I was using 4K kernel stacks. Something had changed in some driver somewhere, over time, and was apparently causing occasional corruption when I was doing heavy file I/O: backups to a hard disk from other computers, running at the same time as local maintenance tasks. The symptom was that the kernel would just "hang" (no messages, no hints). Switching to a kernel with 8K stacks resolved the problem.

How about trying a non-SMP kernel? If you load down one core, can it still crash? Maybe limit maxcpus to 1 and test (same kernel, only 1 core), but also try a UP-compiled kernel if the same kernel + 1 core fails.

You could also try limiting the max memory to the first 1GB and see if that changes how "fast" or how "often" it crashes. Have you tried it with half the memory (or can you)? Not that I'm suspecting the memory, but sometimes problems happen with 4GB that won't happen with 2GB (MS disabled the top 1GB of address space in XP to prevent faulty driver problems; I know you aren't running WinXP, but the same idea "could" hold).

At this point, you are starting with a fresh system that hasn't been "proven", so it _could_ be virtually anything. How about losing the "RAID"? Can you try a PATA disk? Do you have a spare you could do a test install on and boot from? On the above-mentioned 32-bit system, another confounding factor was the addition of a SATA controller and drive. Going back to a pure PATA system changed the "frequency" of my crashes so that they were limited to the early AM, when all the backup and maintenance jobs ran. I still haven't added back the SATA (am a bit "afraid"... it's working, and I don't want to break it again, but I know that's a partly "lame" excuse :-)).

Have you disabled all "extra" hardware possible? I don't know AMDs too well -- do they have multi-threading? In debugging my crash, I also removed an add-in USB controller and a separate firewire disk. Does your CPU use any frequency scaling? Can that be disabled? What kernels have you tried?

-linda...
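Since the machine reboots without warning, any monitoring of memory, swap and load has to reach the disk before the crash. A minimal sketch of that idea, assuming a Linux /proc filesystem; the function names, the log path in the usage note and the 5-second default interval are made up for illustration.

```shell
#!/bin/sh
# sample_line [PROCDIR]: print one line with a timestamp, the load averages
# and free memory/swap in kB. PROCDIR defaults to /proc; the parameter only
# exists so the function can be pointed at a copy of the files.
sample_line() {
    proc=${1:-/proc}
    load=$(cut -d' ' -f1-3 "$proc/loadavg")
    memfree=$(awk '/^MemFree:/ {print $2}' "$proc/meminfo")
    swapfree=$(awk '/^SwapFree:/ {print $2}' "$proc/meminfo")
    echo "$(date '+%F %T') load=$load memfree_kb=$memfree swapfree_kb=$swapfree"
}

# log_until_crash LOGFILE [INTERVAL]: append a sample every INTERVAL seconds
# and flush it to disk, so the last line written survives a sudden reboot.
log_until_crash() {
    logfile=$1
    interval=${2:-5}
    while true; do
        sample_line >> "$logfile"
        sync
        sleep "$interval"
    done
}
```

Something like `log_until_crash /var/tmp/stress.log 5 &` would be started before the stress test; after the reboot, the tail of the log shows the last state that was recorded.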
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
1. Gigabyte motherboard, quite new model. 2. AMD Phenom quad-core CPU. 2.2GHz 3. 4Gb memory 4. Software RAID1 on SATA disks. 5. ATI Radeon X1650. 6. 850W powersupply.
I don't remember the full thread from before, but have you tried running your tests in runlevel 3? That would eliminate any X issues as a possibility.
Jim F
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
Ciao.
-- I was very heavily into pornography. Then my pornograph broke.
On Monday 03 March 2008 17:48, Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
Presumably, one wants to run one instance per core or per processor (or per core per multi-core processor) to fully saturate the processing capacity of your system.
Ciao.
Randall Schulz
Randall R Schulz wrote:
On Monday 03 March 2008 17:48, Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
Presumably, one wants to run one instance per core or per processor (or per core per multi-core processor) to fully saturate the processing capacity of your system.
In which case shouldn't Per be running 4 copies of mprime (as he has a quad-core CPU)?
Ciao.
-- I was very heavily into pornography. Then my pornograph broke.
Basil Chupin wrote:
Randall R Schulz wrote:
Presumably, one wants to run one instance per core or per processor (or per core per multi-core processor) to fully saturate the processing capacity of your system.
In which case shouldn't Per be running 4 copies of mprime (as he has a quad-core CPU)?
I should and I've tried it, but the machine just dies right away. Just tried it now, and it took a full 22 seconds before it died.
/Per Jessen, Zürich
Per Jessen a écrit :
I should and I've tried it, but the machine just dies right away. Just tried it now, and it took a full 22 seconds before it died.
Maybe one of the cores is dead? Is there a way to force the use of only one (selected) core?
jdd -- http://www.dodin.net http://clairedodin.voices.com/ http://www.clairedodin.com/ http://claire.dodin.net/
On Tuesday 04 March 2008 01:05, jdd wrote:
Per Jessen a écrit :
I should and I've tried it, but the machine just dies right away. Just tried it now, and it took a full 22 seconds before it died.
may be one of the cores is dead? is there a way to enforce the use of one only core (selected)?
% man taskset

NAME
    taskset - retrieve or set a processes's CPU affinity

SYNOPSIS
    taskset [options] [mask | list ] [pid | command [arg]...]

DESCRIPTION
    taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity.
    ...
jdd
Randall Schulz
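To test jdd's dead-core theory, the same test can be pinned to each core in turn with taskset. A rough sketch under stated assumptions: the helper name is made up, ./mprime is assumed to sit in the current directory, and the helper only prints the command lines rather than starting anything.

```shell
#!/bin/sh
# pinned_commands NCORES CMD...: print one taskset command line per core so
# the same test can be run on each core in turn. 'taskset -c N' restricts
# the command to CPU N (CPUs are numbered from 0).
pinned_commands() {
    ncores=$1
    shift
    core=0
    while [ "$core" -lt "$ncores" ]; do
        echo "taskset -c $core $*"
        core=$((core + 1))
    done
}

# Example for the quad-core Phenom; ./mprime is an assumed path.
pinned_commands 4 ./mprime
```

If the crash only shows up when the test is pinned to one particular core, that core becomes the suspect. If every core passes alone but the machine dies when all are loaded together, the cause is more likely something shared between them.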
On Monday 03 March 2008 07:48:23 pm Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
From readme.txt (mprime2414.tar.gz):

To fully utilize a dual-CPU machine, you must run two copies of mprime. Run one copy of mprime as described above. Run the second copy of mprime with the -A1 switch. Make sure your startup scripts start both executables.

-An  This is used to run two or more copies of mprime from the same directory. Using this command line argument causes mprime to use a different set of filenames for the INI files, the results file, the log file, and the spool file. Just use a different value of n for each extra copy of mprime you start.

-- Regards, Rajko. See http://en.opensuse.org/Portal
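Extending the readme's two-copy recipe to one copy per core could look like the sketch below. The helper name and the ./mprime path are assumptions. The helper only prints the command lines, because each copy still has to be started separately (for example in its own terminal).

```shell
#!/bin/sh
# mprime_commands N: print the N command lines for one copy of mprime per
# core, following the readme: the first copy takes no switch, each extra
# copy gets -A1, -A2, ... so it uses its own INI, results, log and spool files.
mprime_commands() {
    copies=$1
    i=0
    while [ "$i" -lt "$copies" ]; do
        if [ "$i" -eq 0 ]; then
            echo "./mprime"
        else
            echo "./mprime -A$i"
        fi
        i=$((i + 1))
    done
}

# Four copies for a quad-core CPU.
mprime_commands 4
```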
On Monday 03 March 2008 07:48:23 pm Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
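I read your summary. All I can suggest is to look for 'lost ticks' in /var/log/messages and the boot log. There is a kernel boot parameter you must nominate to report these, there is also a mcelog parameter that might give some info. Can't recall the exact parameters off the top of my head, but if you haven't already tried these it might be worth a shot?
On Tue, Mar 4, 2008 at 1:05 PM, Rajko M. wrote: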
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
From readme.txt (mprime2414.tar.gz) :
To fully utilize a dual-CPU machine, you must run two copies of mprime. Run one copy of mprime as described above. Run the second copy of mprime with the -A1 switch. Make sure your startup scripts start both executables.
-An This is used to run two or more copies of mprime from the same directory. Using this command line argument causes mprime to use a different set of filenames for the INI files, the results file, the log file, and the spool file. Just use a different value of n for each extra copy of mprime you start.
-- Regards, Rajko. See http://en.opensuse.org/Portal
Mark Van De Vyver wrote:
I read your summary. All I can suggest is to look for 'lost ticks' in /var/log/messages and the boot log. There is a kernel boot parameter you must nominate to report these, there is also a mcelog parameter that might give some info. Can't recall the exact parameters off the top of my head, but if you haven't already tried these it might be worth a shot?
I'll give it a go - thanks.
/Per Jessen, Zürich
Mark Van De Vyver wrote:
I read your summary. All I can suggest is to look for 'lost ticks' in /var/log/messages and the boot log. There is a kernel boot parameter you must nominate to report these, there is also a mcelog parameter that might give some info. Can't recall the exact parameters off the top of my head, but if you haven't already tried these it might be worth a shot?
I did some googling for 'lost ticks', but it looks like they're reported by default? The mcelog was only being emptied hourly, so I changed that to once every minute instead.
/Per Jessen, Zürich
On Tue, Mar 4, 2008 at 8:10 PM, Per Jessen wrote:
Mark Van De Vyver wrote:
I read your summary. All I can suggest is to look for 'lost ticks' in /var/log/messages and the boot log. There is a kernel boot parameter you must nominate to report these, there is also a mcelog parameter that might give some info. Can't recall the exact parameters off the top of my head, but if you haven't already tried these it might be worth a shot?
I did some googling for 'lost ticks', but it looks like they're reported by default? The mcelog was only being emptied hourly, so I change that to once every minute instead.
Lucky... I had my server at hand just now. This is what I'd used before to try and sort out a flaky server (I really doubt it is the same problem). Working through the combinations might tell you something:

report_lost_ticks (this wasn't the default when I used it on openSUSE 10.2)
mce=bootlog
apic=debug (then off)
apm=off
maxcpus=1 (then 2,3,4)

HTH?
/Per Jessen, Zürich
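Working through the maxcpus steps from Mark's list could be organised as in this sketch. The helper name is made up, and the parameter names are simply the ones from his message. Whether a given kernel honours each of them depends on its version and configuration.

```shell
#!/bin/sh
# boot_param_sets MAXCORES: print one candidate set of kernel boot
# parameters per line, stepping maxcpus from 1 up to MAXCORES. Each line is
# meant to be appended to the kernel line in the boot loader for one test run.
boot_param_sets() {
    max=$1
    n=1
    while [ "$n" -le "$max" ]; do
        echo "report_lost_ticks mce=bootlog apic=debug apm=off maxcpus=$n"
        n=$((n + 1))
    done
}

boot_param_sets 4
```

After each reboot, `cat /proc/cmdline` shows which parameters the running kernel was actually started with.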
Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
To keep both cores occupied.
Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
It's a quad-core machine, so it should be possible to run four copies of mprime concurrently without it falling over. Latest update - it was suggested I install 10.3, then upgrade to the latest kernel. Which has very surprisingly had the effect that I can now run one copy of mprime a lot longer than before. I had one copy running almost eight hours last night. I asked AMD for a CPU diagnostic tool, but they can only supply something for Windows.
/Per Jessen, Zürich
Per Jessen wrote:
Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
It's a quad-core machine, so it should be possible to run four copies of mprime concurrently without it falling over.
My earlier question to Randall came from wanting to understand what he stated. So, it seems that the "logical" answer is to run a copy of mprime for each core available in the CPU. However, while this seems to be a "logical" conclusion, is it really a practical or practicable thing to do? What exactly is the purpose of running all these copies of mprime? What exactly will it prove if the CPU can run 2 or 3 or 4 copies of mprime? If the CPU can handle 1 copy of mprime - and it seems that it can from what you say below - isn't that enough to show that the CPU and the hardware it is connected with is capable of working without falling over under normal usage? I mean, what sort of stress do you expect to put your server under in its lifetime? Or is this stress test that you are subjecting your new mobo etc to just a matter of finding out WHY the damn thing HAS fallen over when it had to run at least a couple of copies of mprime plus whatever it was you ran at the same time? Will your machine ever have to be put under such a stress?
Latest update - it was suggested I install 10.3, then upgrade to the latest kernel. Which has very surprisingly had the effect that I can now run one copy of mprime a lot longer than before. I had one copy running almost eight hours last night.
Well, there you are. Didn't fall over for ~8 hours - a lot better than ~20 minutes :-) . Use your new system to do something productive rather than keep running mprime :-D .
I asked AMD for a CPU diagnostic tool, but they can only supply something for Windows.
<sigh> That'll be right :-( .
Ciao.
-- A fanatic is one who can't change his mind and won't change the subject. Sir Winston Churchill
On Tuesday 04 March 2008 02:21, Basil Chupin wrote:
...
What exactly is the purpose of running all these copies of mprime? What exactly will it prove if the CPU can run 2 or 3 or 4 copies of mprime?
It's a stress tester. You want to exercise all the hardware and force contention at the OS and the CPU microcode level. Not all hardware is replicated fully on multicore chips, so only through contention for, say, address resolution logic, will you really fully test the machine's ability to sustain full loads without incurring errors.
If the CPU can handle 1 copy of mprime - and it seems that it can from what you say below - isn't that enough to show that the CPU and the hardware it is connected with is capable of working without falling over under normal usage?
Not at all. Many contingencies that the OS and software and microcode must accommodate will never occur when only one core is in operation.
I mean, what sort of stress do you expect to put your server in its life time? Or is this stress test that you are subjecting your new mobo etc just a matter of finding out WHY the damn thing HAS fallen over when it had to run at least a couple of copies of mprime plus whatever it was you ran at the same time? Will your machine ever have to be put under such a stress?
The question is what failure rate does the machine exhibit. One failure per 10^12 instructions sounds infinitesimal, but that's only about a thousand seconds of operation! We need to know that our machines are _extremely_ unlikely to fail, and the only way to do that is to push them to their rated limits under the expectation that those limits are not overstated.
Latest update - it was suggested I install 10.3, then upgrade to the latest kernel. Which has very surprisingly had the effect that I can now run one copy of mprime a lot longer than before. I had one copy running almost eight hours last night.
Well, there you are. Didn't fall over for ~8 hours - a lot better than ~20 minutes :-) .
May I suggest a nice Windows installation for you? They mostly work.
...
Ciao.
Randall Schulz
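Randall's figure of about a thousand seconds per 10^12 instructions implies a rate of roughly 10^9 instructions per second. The arithmetic can be checked as below. The helper name is made up, and the second call rests on a loudly hypothetical assumption (four 2.2 GHz cores each retiring one instruction per cycle), not on a measurement.

```shell
#!/bin/sh
# seconds_to_failure INSTR_PER_FAILURE INSTR_PER_SECOND: expected seconds
# between failures, using integer division.
seconds_to_failure() {
    echo $(( $1 / $2 ))
}

# 10^12 instructions per failure at an assumed 10^9 instructions per second.
seconds_to_failure 1000000000000 1000000000    # prints 1000

# Illustration only: 4 cores x 2.2 * 10^9 instructions per second each.
seconds_to_failure 1000000000000 8800000000    # prints 113
```

Under that second assumption the same failure rate would show up after roughly two minutes of full load, which is why a long run on all cores tells far more than a short run on one.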
Randall R Schulz wrote:
On Tuesday 04 March 2008 02:21, Basil Chupin wrote:
...
What exactly is the purpose of running all these copies of mprime? What exactly will it prove if the CPU can run 2 or 3 or 4 copies of mprime?
[pruned]
I mean, what sort of stress do you expect to put your server in its life time? Or is this stress test that you are subjecting your new mobo etc just a matter of finding out WHY the damn thing HAS fallen over when it had to run at least a couple of copies of mprime plus whatever it was you ran at the same time? Will your machine ever have to be put under such a stress?
The question is what failure rate does the machine exhibit. One failure per 10^12 instructions sounds infinitesimal, but that's only about a thousand seconds of operation!
We need to know that our machines are _extremely_ unlikely to fail, and the only way to do that is to push them to their rated limits under the expectation that those limits are not overstated.
Thanks for the explanation, Randall. When I eventually get a quad-core I'll keep all this in mind.
Latest update - it was suggested I install 10.3, then upgrade to the latest kernel. Which has very surprisingly had the effect that I can now run one copy of mprime a lot longer than before. I had one copy running almost eight hours last night.
Well, there you are. Didn't fall over for ~8 hours - a lot better than ~20 minutes :-) .
May I suggest a nice Windows installation for you? They mostly work.
For me? No thanks, I am sticking with SUSE.
Ciao.
-- A fanatic is one who can't change his mind and won't change the subject. Sir Winston Churchill
Basil Chupin wrote:
We need to know that our machines are _extremely_ unlikely to fail, and the only way to do that is to push them to their rated limits under the expectation that those limits are not overstated.
Thanks for the explanation, Randall. When I eventually get a quad-core I'll keep all this in mind.
I think Randall put it very well and to the point - but be aware that this is in no way related to the CPU being quad-core. I have put every new system I have ever built (or acquired) through a thorough stress test. Wrt mprime, the stress test facility is there to ascertain the reliability of a machine, as the regular outcome of mprime feeds into a bigger project where reliability and accuracy are essential for the end result.
/Per Jessen, Zürich
Hello all,

I'm trying to mount a directory from a windows share by using this command:

# mount -t cifs -o username=someuser,password=somepass,dom=domname //winserver/sharename /mnt

The mount works very well. But the umount takes maybe 10 seconds and after that I can see the following in the messages file:

Mar 6 13:25:31 prinz kernel: CIFS VFS: RFC1001 size 35 bigger than SMB for Mid=9
Mar 6 13:25:31 prinz kernel: Bad SMB: : dump of 48 bytes of data at 0xea3f7a80
Mar 6 13:25:31 prinz kernel: 00000023 424d53ff 00000074 00018800 # . . . S M B t . . . . . . .
Mar 6 13:25:31 prinz kernel: 00000000 00000000 00000000 15670000 . . . . . . . . . . . . . . g .
Mar 6 13:25:31 prinz kernel: 00090000 0000ff00 00000000 00000000 . . . . . . . . . . . . . . .
Mar 6 13:25:51 prinz kernel: CIFS VFS: server not responding
Mar 6 13:25:51 prinz kernel: CIFS VFS: No response to cmd 116 mid 9

What is that 'size 35 bigger than SMB for Mid=9'? Any ideas what to do with this?

Thanks
Wolfgang
Per Jessen wrote:
Basil Chupin wrote:
Per Jessen wrote:
You will undoubtedly remember the discussion about the stability problems on my new workstation from a couple of weeks ago. I'll quickly sum up -
[pruned]
And the machine still crashes under load - even just a little bit. I run two copies of mprime, plus firefox and such, and after 15-20 minutes, I get the automatic reboot.
[pruned]
Out of (personal) curiosity, why do you run 2 copies of mprime for these tests? Wouldn't the one copy do?
It's a quad-core machine, so it should be possible to run four copies of mprime concurrently without it falling over.
Latest update - it was suggested I install 10.3, then upgrade to the latest kernel. Which has very surprisingly had the effect that I can now run one copy of mprime a lot longer than before. I had one copy running almost eight hours last night.
I asked AMD for a CPU diagnostic tool, but they can only supply something for Windows.
There does exist a version of windows xp that runs off a boot cd or dvd. It's not made by MS, but rather some guy. I think he goes by Bart? If you had that you could boot off that optical disk to run AMD's tests.
Jim F
On Tuesday 04 March 2008 00:33, Per Jessen wrote:
...
It's a quad-core machine, so it should be possible to run four copies of mprime concurrently without it falling over.
Correction: It should be possible to run as many copies as available RAM + swap can accommodate!
Latest update - it was suggested I install 10.3, then upgrade to the latest kernel. Which has very surprisingly had the effect that I can now run one copy of mprime a lot longer than before. I had one copy running almost eight hours last night.
We're entering the era of concurrency bugs because of the widespread availability of multi-core processors. But most people thought it was going to be application code that exhibited those bugs and that the very high standards of kernel programming would ensure that deadlocks or other concurrency errors would be rare and quickly fixed in operating system kernels and nuclei.
I asked AMD for a CPU diagnostic tool, but they can only supply something for Windows.
Dastards!
/Per Jessen, Zürich
Randall Schulz
participants (14)
- Aaron Kulkis
- Anders Johansson
- Basil Chupin
- David C. Rankin
- jdd
- Jim Flanagan
- kanenas@hawaii.rr.com
- Linda Walsh
- Mair Wolfgang-awm013
- Mark Van De Vyver
- Masaru Nomiya
- Per Jessen
- Rajko M.
- Randall R Schulz