Re: [opensuse] [OT] unstable system - still trying to identify the culprit.
  • From: Linda Walsh <suse@xxxxxxxxx>
  • Date: Sun, 02 Mar 2008 13:28:27 -0800
  • Message-id: <47CB1BFB.6080203@xxxxxxxxx>
Per Jessen wrote:

Have you monitored memory, swap, cpu and disk load while the system is
stressed? Does the system start "thrashing"?
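If nothing is logging those numbers yet, something along these lines could be left running in a terminal while the box is loaded. It's only a rough sketch: it assumes a Linux /proc with the usual MemFree/SwapFree fields in /proc/meminfo and pswpin/pswpout counters in /proc/vmstat, the function names are made up for illustration, and it doesn't look at disk load at all.

```python
import time

def parse_counters(text):
    """Parse 'Name: 123 kB' or 'name 123' lines into {name: int}."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def swap_delta(before, after):
    """Pages swapped in plus out between two /proc/vmstat snapshots."""
    return sum(after.get(k, 0) - before.get(k, 0) for k in ("pswpin", "pswpout"))

def snapshot():
    """Read memory, swap counters and 1-minute load (assumes Linux /proc)."""
    with open("/proc/meminfo") as f:
        mem = parse_counters(f.read())
    with open("/proc/vmstat") as f:
        vm = parse_counters(f.read())
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    return mem, vm, load1

def watch(interval=5, samples=12):
    """Print one line per interval; steadily rising swap traffic hints at thrashing."""
    _, prev_vm, _ = snapshot()
    for _ in range(samples):
        time.sleep(interval)
        mem, vm, load1 = snapshot()
        print("load1=%.2f MemFree=%dkB SwapFree=%dkB swapped_pages=%d" % (
            load1, mem.get("MemFree", 0), mem.get("SwapFree", 0),
            swap_delta(prev_vm, vm)))
        prev_vm = vm
```

Calling watch() with a short interval and redirecting the output to a file on another machine (e.g. over ssh) means the last few lines survive a hard hang.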

Are you using 4K or 8K kernel stacks (or isn't this a 32-bit system?). What
file system are you using?

I had a problem on a 32-bit system where I was using 4K kernel stacks.
Something had changed in some driver somewhere, over time, and apparently
was causing occasional corruption when I was doing heavy file I/O --
backups to a hard disk from other computers, running at the same time as
local maintenance tasks. The symptom was that the kernel would just "hang"
(no messages, no hints). Switching to a kernel with 8K stacks resolved
the problem...
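To see which stack size a given kernel was built with, the build config can be scanned. A rough sketch, assuming the option is called CONFIG_4KSTACKS (as on the 2.6 i386 kernels I was using) and that the config is available either as /proc/config.gz or as /boot/config-<release>; both locations depend on the distribution, and the function names are just for illustration:

```python
import gzip
import os
import platform

def stack_size_from_config(config_text):
    """Return '4K' if CONFIG_4KSTACKS=y is set, otherwise 'not 4K'."""
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return "4K"
    # Covers both '# CONFIG_4KSTACKS is not set' and kernels lacking the option.
    return "not 4K"

def read_kernel_config():
    """Return the running kernel's config text, or None if not found.

    The two candidate paths are assumptions; distributions differ.
    """
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as f:
                return f.read()
    return None
```

Something like stack_size_from_config(read_kernel_config() or "") then gives a quick answer without digging through the config by hand.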

How about trying a non-SMP kernel? If you load down just one core, can it
still crash? Maybe limit maxcpus to 1 and try testing (same kernel, only
1 core), but also try a UP-compiled kernel if the same kernel + 1 core
fails.

You could also try limiting the max memory to the 1st 1GB and see if
that changes how "fast" or how "often" it crashes. Have you tried
it with half the memory (or can you?) -- not that I'm suspecting the
memory, but sometimes problems happen with 4GB that won't happen with 2GB (MS
disabled the top 1GB of address space in XP to prevent faulty driver problems;
I know you aren't running WinXP, but...the same idea "could" hold...).
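Both limits are ordinary kernel boot parameters (maxcpus=1 and, for example, mem=1G, appended to the kernel line in the boot loader). When running several test boots it's easy to lose track of which limits were actually in effect, so here is a small sketch that pulls them out of the kernel command line. It assumes /proc/cmdline is readable; the function name is made up for illustration.

```python
def boot_limits(cmdline):
    """Return the maxcpus= and mem= values found on a kernel command line."""
    limits = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("maxcpus", "mem"):
            limits[key] = value
    return limits

def current_boot_limits():
    """Same, for the running kernel (assumes Linux /proc/cmdline)."""
    with open("/proc/cmdline") as f:
        return boot_limits(f.read())
```

An empty result from current_boot_limits() means that boot ran with all cores and all memory, so a crash there says nothing about the limited configuration.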

At this point, you are starting with a fresh system that hasn't been
"proven", so it _could_ be virtually anything. How about losing the
"RAID"...can you try a PATA disk? Do you have a spare you could
do a test install on and boot from?

On the above mentioned 32-bit system, another confounding factor was
the addition of a SATA controller & drive. Going back to a pure PATA
system changed the "frequency" of my crashes to be limited to the
early AM when all the backup jobs and maintenance jobs ran. Still haven't
added back the SATA (am a bit "afraid"...it's working, and don't want to
break it again...but I know that's a partly "lame" excuse...:-))...

Have you disabled all the "extra" hardware you can? I don't know AMDs too
well -- do they have multi-threading? In debugging my crash, I also
removed an add-in USB controller & a separate firewire disk.

Does your CPU use any frequency scaling? Can that be disabled? What
kernels have you tried?
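One way to check whether frequency scaling is active is to look at the cpufreq governor for each core. A minimal sketch, assuming the kernel exposes the usual sysfs files under /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor (the function name is made up for illustration):

```python
import glob
import os

def scaling_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its cpufreq governor."""
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu_name = path.split(os.sep)[-3]
        with open(path) as f:
            governors[cpu_name] = f.read().strip()
    return governors
```

An empty result would suggest no cpufreq driver is loaded at all; a governor such as "ondemand" means the clock is being changed under load, which is the thing to rule out.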

-linda...
