lockups with gcc 4.1.0?
Hiya! A quick question to the software developers out there:
Since i've been using Suse 10.1, i haven't been able to keep an uptime of more than 24 hours. i continually experience lockups while compiling my software (gcc 4.1.0, which comes with Suse). i just rebooted for the 3rd time in the past 4 hours and it's *getting on my nerves*.
Have any other developers been experiencing lockups or mysterious compiler segfaults/internal compiler errors when using gcc under 10.1?
*aaaarrrrgggghhhh*
stephan@owl:~/cvs/s11n.net/SpiderApe/src/ape> uname -a
Linux owl 2.6.16.13-4-default #1 Wed May 3 04:53:23 UTC 2006 i686 athlon i386 GNU/Linux
The lockups have caused corruption on one of my drives: i've got files which i can neither see nor delete, and i can't delete their containing directories:
root@owl:/home/stephan/cvs/s11n.net/SpiderApe/src # rm -fr js
rm: cannot lstat `js/jsxdrapi.c': Permission denied
*AAAARRRRGGGGHHHH*
And now i can't even 'cd' to some directories. Ah, Christ, i'm screwed...
-- 
-----
stephan@s11n.net http://s11n.net
"...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
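The `cannot lstat` symptom described above can be surveyed before any repair is attempted. The following is only a minimal sketch, assuming Python 3 is available on the affected machine; the function name `find_unreadable_entries` is hypothetical and not part of any tool mentioned in this thread. It walks a directory tree and records every entry that cannot be listed or inspected, which may help decide what to back up first.

```python
import os


def find_unreadable_entries(root):
    """Return (path, reason) pairs for entries under root that cannot be inspected.

    Hypothetical helper for illustration only; it reads metadata and
    never modifies the filesystem.
    """
    problems = []

    def on_walk_error(err):
        # os.walk calls this when a directory cannot be listed.
        problems.append((err.filename, err.strerror))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, matching what rm reports.
                os.lstat(path)
            except OSError as err:
                problems.append((path, err.strerror))
    return problems
```

Calling `find_unreadable_entries("/home/stephan/cvs")` as root would return an empty list on a healthy tree and one entry per damaged path otherwise. It only reports symptoms; it is no substitute for a filesystem check.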
On Thursday 10 August 2006 22:01, stephan beal wrote:
And now i can't even 'cd' to some directories. Ah, Christ, i'm screwed...
Hi Stephan, You probably know how to tackle the consequences of the lockups using the rescue system but, if not, feel free to ask (please describe your hardware, too.) I'm curious to know if it's experienced any similar lockups while running other memory and/or disk IO intensive applications? Examples that come to mind might be AV recording/editing or multimedia production or CAD? Have you otherwise stress-tested and ruled out marginal hardware? When you installed 10.1, did you do so on one or more *clean* partitions (i.e. all contents pre-erased with the installer allowed to format it/them before installing?) Was this a 'fresh' installation or did you upgrade a previous version? regards, Carl
On Friday 11 August 2006 04:26, Carl Hartung wrote:
On Thursday 10 August 2006 22:01, stephan beal wrote:
And now i can't even 'cd' to some directories. Ah, Christ, i'm screwed...
Hi Stephan,
You probably know how to tackle the consequences of the lockups using the rescue system but, if not, feel free to ask (please describe your hardware, too.)
i just got back from doing the dreaded:
fsck.reiserfs --rebuild-tree /dev/hda3
It mostly worked. i had to run with --fix-fixable a couple more times to correct my problems. It would *appear* that my lockups were related to corruption on my /home partition (reiserfs). i'm considering switching it to XFS (because reiser has done this to me once before), but in retrospect, reiserfs's utilities have always recovered fairly gracefully from filesystem corruption, and i don't yet have enough experience with XFS to know if i would be so lucky with it.
I'm curious to know if it's experienced any similar lockups while running other memory and/or disk IO intensive applications?
So far, no. Last week it was crashing when i was running gcc (about 5-10% of the time it would crash), but always immediately before it crashed i got a cryptic NVIDIA error message in the syslog with the pointer addresses. i replaced the NVidia X driver with the standard (non-accelerated) driver and assumed the problem was gone. (This was actually better for me, anyway, as it keeps me from wasting all my time playing games like Tremulous.) Tonight, though, it was crash after crash.
Have you otherwise stress-tested and ruled out marginal hardware?
i have considered it, but haven't done it. In my experience, gcc is about as good a stress test as any. In one case i discovered i had a bad RAM chip because gcc kept failing with odd assembly-level errors, even when memcheck didn't pick up the problem.
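The idea behind using a compiler as a stress test is that a deterministic job should produce identical results every time, so any variation between runs points at the hardware. Here is a rough sketch of that principle, not of gcc itself; the function names `workload_digest` and `find_inconsistent_runs` and the default sizes are assumptions chosen for illustration.

```python
import hashlib


def workload_digest(size_bytes=8 * 1024 * 1024, seed=b"stress"):
    """Build a deterministic buffer of roughly size_bytes and return its SHA-256 hex digest."""
    block = hashlib.sha256(seed).digest()
    buf = bytearray()
    while len(buf) < size_bytes:
        # Each block depends on the previous one, so a single flipped
        # bit changes everything that follows.
        block = hashlib.sha256(block).digest()
        buf += block
    return hashlib.sha256(buf).hexdigest()


def find_inconsistent_runs(runs=5, size_bytes=8 * 1024 * 1024):
    """Repeat the workload and return the indices of runs that disagree with the first."""
    reference = workload_digest(size_bytes)
    return [
        index
        for index in range(1, runs)
        if workload_digest(size_bytes) != reference
    ]
```

On sound hardware `find_inconsistent_runs()` should return an empty list. A non-empty result suggests something is corrupting data, but a clean result proves little, because the script touches only a small part of physical memory.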
When you installed 10.1, did you do so on one or more *clean* partitions (i.e. all contents pre-erased with the installer allowed to format it/them before installing?) Was this a 'fresh' installation or did you upgrade a previous version?
i *always* do a fresh install, because i don't trust any OS upgrade process (not biased against Suse, just against upgrades in general). However, my /home partition was of course *not* reformatted, and if it was corrupted before, then of course the 10.1 install would have inherited that corruption. For now i'm going to assume that the reiserfs corruption was the problem and hope/pray that it doesn't happen again. i was, luckily enough, able to tar up my /home directories, rescuing my evening's worth of C++ code. The --rebuild-tree process only deleted 8 files, none of which i use, and most of which were old HTML and PDF files generated by Lyx more than 2 years ago. Thanks a lot for your feedback! -- ----- stephan@s11n.net http://s11n.net "...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
On Thursday 10 August 2006 23:45, stephan beal wrote:
i *always* do a fresh install, because i don't trust any OS upgrade process (not biased against Suse, just against upgrades in general).
That's called 'experienced user' syndrome ;-)
However, my /home partition was of course *not* reformatted, and if it was corrupted before, then of course the 10.1 install would have inherited that corruption.
This would be my guess, too.
For now i'm going to assume that the reiserfs corruption was the problem and hope/pray that it doesn't happen again. i was, luckily enough, able to tar up my /home directories, rescuing my evening's worth of C++ code. The --rebuild-tree process only deleted 8 files, none of which i use, and most of which were old HTML and PDF files generated by Lyx more than 2 years ago.
Thanks a lot for your feedback!
Glad you were able to 'land' this one gracefully. Carl
On Thursday 10 August 2006 22:45, stephan beal wrote:
It would *appear* that my lockups were related to corruption on my /home partition (reiserfs). i'm considering switching it to XFS (because reiser has done this to me once before), but in retrospect, reiserfs's utilities have always recovered fairly gracefully from filesystem corruption, and i don't yet have enough experience with XFS to know if i would be so lucky with it.
Stick with ReiserFS. Your own recent recovery is the reason why.
I do think you have the diagnosis backwards, however. I believe your lockups are causing the drive corruption and the lockups are hardware related--- very slim chance this would be a gcc problem... in fact, I would say no way. Set your machine up for a memory check... let it cycle several times (couple of hours) and see what turns up. -- Kind regards, M Harris <><
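For readers unfamiliar with what a memory check does, the core loop is to write known bit patterns, read them back, and repeat for several passes. The sketch below shows that loop in user space under loud assumptions: `pattern_test`, its defaults, and the chosen patterns are illustrative only, and a script like this sees only the memory the operating system hands it, so it cannot replace a boot-time tester such as memtest86.

```python
def pattern_test(size_bytes=16 * 1024 * 1024, passes=2):
    """Write and verify simple bit patterns in one buffer.

    Returns a list of (pass_number, pattern, offset) tuples, one per
    mismatching byte; an empty list means no mismatch was observed.
    """
    # All zeros, all ones, and the two alternating-bit patterns.
    patterns = (0x00, 0xFF, 0x55, 0xAA)
    errors = []
    buf = bytearray(size_bytes)
    for pass_number in range(passes):
        for pattern in patterns:
            # Write phase: fill the whole buffer with the pattern.
            buf[:] = bytes([pattern]) * size_bytes
            # Verify phase: every byte should still hold the pattern.
            if buf.count(pattern) != size_bytes:
                errors.extend(
                    (pass_number, pattern, offset)
                    for offset, value in enumerate(buf)
                    if value != pattern
                )
    return errors
```

Running `pattern_test()` in a loop for a few hours mirrors the "let it cycle several times" advice, though only for the slice of RAM the process happens to receive.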
On Friday 11 August 2006 06:26, M Harris wrote:
I do think you have the diagnosis backwards, however. I believe your lockups are causing the drive corruption and the lockups are hardware related--- very slim chance this would be a gcc problem... in fact, I would say no way.
i tend to agree - i think gcc was somehow triggering/demonstrating the problem. The filesystem on that partition is well over 2 years old, without a reformat in that time, so maybe it's just got a lot of cruft in the filesystem internals (i spend most of my time programming/compiling, which generates tons of files). The rest of the partitions on that drive aren't demonstrating any problems (so far).
Set your machine up for a memory check... let it cycle several times (couple of hours) and see what turns up.
That's the next course of action. i've had this box 2.5 years, so i don't have any reason to believe my RAM is bad. In my experience, if HW is going to die, it does so when it's very young or very old. -- ----- stephan@s11n.net http://s11n.net "...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
On Thursday 10 August 2006 23:53, stephan beal wrote:
In my experience, if HW is going to die, it does so when it's very young or very old.
Yup. ... if it is going to die it will usually die in the first 30 hours of use... if it lasts 30 hours it will last several years...
You might want to simply reseat your connectors... use an esd strap... and reseat the memory dimms, hd cable plugs, and power supply plugs... sometimes flaky hardware issues crop up in connectors, esp if the environment is smoky or humid. -- Kind regards, M Harris <><
On Friday 11 August 2006 07:21, M Harris wrote:
You might want to simply reseat your connectors... use an esd strap... and reseat the memory dimms, hd cable plugs, and power supply plugs... sometimes flaky hardware issues crop up in connectors, esp if the environment is smoky or humid.
That's a good idea. The suspect RAM chip was indeed very dusty and had some "gook" on one of the connectors, so i'm going to give it another try and let memcheck run over the weekend (while i'm out of town). Thank goodness i had an old Suse 8 DVD with a memcheck boot option... -- ----- stephan@s11n.net http://s11n.net "...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
You might want to simply reseat your connectors... use an esd strap... and reseat the memory dimms, hd cable plugs, and power supply plugs... sometimes flaky hardware issues crop up in connectors, esp if the environment is smoky or humid.
That's a good idea. The suspect RAM chip was indeed very dusty and had some "gook" on one of the connectors, so i'm going to give it another try and let memcheck run over the weekend (while i'm out of town).
Hm I think memtest86 should again be installed as an additional entry in the bootloader, like it has been before IIRC.
Thank goodness i had an old Suse 8 DVD with a memcheck boot option...
Jan Engelhardt --
On Friday 11 August 2006 06:53, stephan beal wrote:
On Friday 11 August 2006 06:26, M Harris wrote:
Set your machine up for a memory check... let it cycle several times (couple of hours) and see what turns up.
That's the next course of action. i've had this box 2.5 years, so i don't have any reason to believe my RAM is bad. In my experience, if HW is going to die, it does so when it's very young or very old.
It was indeed a bad RAM chip. i believe i've got the bad guy singled out, and am now running gcc to ma... <dropped carrier> just kidding... as soon as i safely close kmail i'll be running gcc to make sure i got the right chip (i don't want to wait another 7 hours on the test program when gcc can find it in 4 seconds). It's a shame to lose 512MB, but the machine is still usable. Once again, gcc turns out to be the best app for finding bad memory. Thanks again for your feedback! -- ----- stephan@s11n.net http://s11n.net "...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
On Friday 11 August 2006 05:45, stephan beal wrote:
For now i'm going to assume that the reiserfs corruption was the problem and hope/pray that it doesn't happen again.
And not 2 minutes after i sent that, the filesystem went haywire again. i've moved to a new partition (XFS), restored from a backup, and am praying yet again... -- ----- stephan@s11n.net http://s11n.net "...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
participants (4): Carl Hartung, Jan Engelhardt, M Harris, stephan beal