Mailinglist Archive: opensuse (2459 mails)

Re: [opensuse] [OT] unstable system - still trying to identify the culprit.
  • From: Randall R Schulz <rschulz@xxxxxxxxx>
  • Date: Tue, 4 Mar 2008 07:21:17 -0800
  • Message-id: <200803040721.18083.rschulz@xxxxxxxxx>
On Tuesday 04 March 2008 02:21, Basil Chupin wrote:
...

What exactly is the purpose of running all these copies of mprime?
What exactly will it prove if the CPU can run 2 or 3 or 4 copies of
mprime?

It's a stress tester. You want to exercise all the hardware and force
contention at the OS and the CPU microcode level. Not all hardware is
replicated fully on multicore chips, so only through contention for,
say, address resolution logic will you really fully test the machine's
ability to sustain full loads without incurring errors.


If the CPU can handle 1 copy of mprime - and it seems that it can
from what you say below - isn't that enough to show that the CPU and
the hardware it is connected with is capable of working without
falling over under normal usage?

Not at all. Many contingencies that the OS and software and microcode
must accommodate will never occur when only one core is in operation.


I mean, what sort of stress do you expect to put your server in its
life time? Or is this stress test that you are subjecting your new
mobo etc just a matter of finding out WHY the damn thing HAS fallen
over when it had to run at least a couple of copies of mprime plus
whatever it was you ran at the same time? Will your machine ever have
to be put under such a stress?

The question is what failure rate the machine exhibits. One failure
per 10^12 instructions sounds infinitesimal, but at roughly 10^9
instructions per second that's only about a thousand seconds of
operation!
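
To make the arithmetic concrete, here is a minimal Python sketch. The
rate of 10^9 instructions per second is an assumption (it is the rate
that the thousand-second figure implies), and the function name and the
four-core case are purely illustrative, not measurements of any real
machine.

```python
def seconds_between_failures(instructions_per_failure, instructions_per_second):
    """Mean time between failures, given an error rate per instruction."""
    return instructions_per_failure / instructions_per_second


# Assumed figure: about one billion instructions per second per core.
ASSUMED_RATE = 10**9

# One failure per 10^12 instructions on a single core.
single_core = seconds_between_failures(10**12, ASSUMED_RATE)

# Hypothetical case: four cores at the same rate retire instructions
# four times as fast, so the same per-instruction error rate bites sooner.
four_cores = seconds_between_failures(10**12, 4 * ASSUMED_RATE)

print(single_core)  # 1000.0 seconds, i.e. under 17 minutes
print(four_cores)   # 250.0 seconds
```

The point of the sketch is only that a per-instruction failure rate
which looks tiny turns into failures within minutes once it is
multiplied by how fast the machine runs.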

We need to know that our machines are _extremely_ unlikely to fail, and
the only way to do that is to push them to their rated limits under the
expectation that those limits are not overstated.


Latest update - it was suggested I install 10.3, then upgrade to
the latest kernel. Which has very surprisingly had the effect that
I can now run one copy of mprime a lot longer than before. I had
one copy running almost eight hours last night.

Well, there you are. Didn't fall over for ~8 hours - a lot better
than ~20 minutes :-) .

May I suggest a nice Windows installation for you? They mostly work.


...


Ciao.


Randall Schulz
--
To unsubscribe, e-mail: opensuse+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: opensuse+help@xxxxxxxxxxxx
