Mailinglist Archive: opensuse (3232 mails)

Re: [SLE] lockups with gcc 4.1.0?
  • From: stephan beal <stephan@xxxxxxxx>
  • Date: Fri, 11 Aug 2006 03:45:18 +0000 (UTC)
  • Message-id: <200608110545.12523.stephan@xxxxxxxx>
On Friday 11 August 2006 04:26, Carl Hartung wrote:
> On Thursday 10 August 2006 22:01, stephan beal wrote:
> > And now i can't even 'cd' to some directories. Ah, Christ, i'm
> > screwed...
>
> Hi Stephan,
>
> You probably know how to tackle the consequences of the lockups using
> the rescue system but, if not, feel free to ask (please describe your
> hardware, too.)

i just got back from doing the dreaded:

fsck.reiserfs --rebuild-tree /dev/hda3

It mostly worked. i had to run with --fix-fixable a couple more times to
correct my problems.
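
For anyone finding this in the archive, the sequence can be sketched roughly as below. This is only a dry-run sketch: it prints the commands rather than running them. The initial --check step and the repair_plan helper name are assumptions added for illustration; the other two steps are the ones described above. The partition should be unmounted (e.g. from the rescue system), and --rebuild-tree should be treated as a last resort after backing up whatever can still be read.

```shell
#!/bin/sh
# Dry-run sketch: prints the repair steps, does not touch any disk.
# DEVICE defaults to the partition named above; adjust for your system.
DEVICE="${DEVICE:-/dev/hda3}"

# repair_plan is a hypothetical helper name used only for this sketch.
repair_plan() {
    # 1. Read-only check to see what is wrong (assumed first step).
    echo "fsck.reiserfs --check $1"
    # 2. Rebuild the whole tree from whatever leaf nodes can be found.
    echo "fsck.reiserfs --rebuild-tree $1"
    # 3. Follow-up pass to repair the remaining fixable corruptions.
    echo "fsck.reiserfs --fix-fixable $1"
}

repair_plan "$DEVICE"
```

Run the printed commands by hand, one at a time, and read the output of each before moving on to the next.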

It would *appear* that my lockups were related to corruption on my /home
partition (reiserfs). i'm considering switching it to XFS (because
reiser has done this to me once before), but in retrospect, reiserfs's
utilities have always recovered fairly gracefully from filesystem
corruption, and i don't yet have enough experience with XFS to know if
i would be so lucky with it.

> I'm curious to know if it's experienced any similar lockups while
> running other memory and/or disk IO intensive applications?

So far, no. Last week it was crashing when i was running gcc (about
5-10% of the time it would crash), but always immediately before it
crashed i got a cryptic NVIDIA error message in the syslog with the
pointer addresses. i replaced the NVidia X driver with the standard
(non-accelerated) driver and assumed the problem was gone. (This was
actually better for me, anyway, as it keeps me from wasting all my time
playing games like Tremulous.) Tonight, though, it was crash after
crash.

> production or CAD? Have you otherwise stress-tested and ruled out
> marginal hardware?

i have considered it, but haven't done it. In my experience, gcc is
about as good a stress test as any. In one case i discovered i had a
bad RAM chip because gcc kept failing with odd assembly-level errors,
even when memcheck didn't pick up the problem.

> When you installed 10.1, did you do so on one or more *clean*
> partitions (i.e. all contents pre-erased with the installer allowed
> to format it/them before installing?) Was this a 'fresh' installation
> or did you upgrade a previous version?

i *always* do a fresh install, because i don't trust any OS upgrade
process (not biased against Suse, just against upgrades in general).
However, my /home partition was of course *not* reformatted, and if it
was corrupted before, then of course the 10.1 install would have
inherited that corruption.

For now i'm going to assume that the reiserfs corruption was the problem
and hope/pray that it doesn't happen again. i was, luckily enough, able
to tar up my /home directories, rescuing my evening's worth of C++
code. The --rebuild-tree process only deleted 8 files, none of which i
use, and most of which were old HTML and PDF files generated by Lyx
more than 2 years ago.

Thanks a lot for your feedback!

--
----- stephan@xxxxxxxx http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts