I have been having longstanding problems with kernel instability and
SuSE Pro 8.2 on an X31 IBM ThinkPad. I have been through a fair few of
the 2.4.21-pre/rc kernels, with or without -ac patches, and likewise for the
2.4.22 kernel. Same problem: spontaneous lockups, kernel oopsing here,
there and everywhere. Most frustrating.
However, I cannot remember the kernel ever having been this bad before,
so I have drawn the conclusion that the compiler used might be the
culprit. I found gcc_old (2.95.3) from SLPro 8.1 and installed that,
and I have recompiled the kernel and all modules I use with that compiler.
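When swapping compilers like this, it helps to confirm which gcc actually built the kernel you are running. The kernel records its compiler banner in /proc/version. The minimal Python sketch below pulls the version out of that string and flags a banner that mentions "prerelease". It assumes the 2.4-era wording "gcc version X.Y.Z". Newer kernels phrase the banner differently, and in that case the helper simply returns None. The function names are hypothetical, and the sample banner in the comment is illustrative rather than copied from a real SuSE system.

```python
import re
from typing import Optional

# Hypothetical helpers; assumes the 2.4-era banner format, e.g. (illustrative):
#   "Linux version 2.4.22-pre9 (root@host) (gcc version 2.95.3 20010315) #1 ..."
GCC_RE = re.compile(r"gcc version (\S+)")


def kernel_gcc_version(proc_version: str) -> Optional[str]:
    """Return the gcc version recorded in a /proc/version string, or None."""
    match = GCC_RE.search(proc_version)
    return match.group(1) if match else None


def built_with_prerelease(proc_version: str) -> bool:
    """True if the compiler banner mentions a prerelease gcc."""
    return "prerelease" in proc_version


if __name__ == "__main__":
    try:
        with open("/proc/version") as f:
            banner = f.read()
    except OSError:  # e.g. not running on Linux
        banner = ""
    print("running kernel built with gcc:", kernel_gcc_version(banner))
    print("prerelease compiler:", built_with_prerelease(banner))
```

Running it after booting the rebuilt kernel should report 2.95.3 if the gcc_old build is the one actually in use.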
I remember slating RedHat quite viciously about the stunt they pulled
with gcc 2.96 where they had to eventually release a kgcc just to
prevent the sig11 errors. I believe the time has come to give SuSE a rap
on the knuckles for not making a gcc_old available in SLPro 8.2
so that people can build solid kernels. Granted, most of the system
might have been compiled with gcc 3.3, and I believe it shows.
Gnome-terminal is a nightmare: if it doesn't spontaneously die, it takes
the box down. gkrellm will just haphazardly vanish into thin air, and gdm2
will do a wobble now and then and take X down, usually when I am in
the middle of something important (the box is a work machine). Other bits
of the desktop also die now and then, usually without leaving any traces
in the logs or any excuse for why they piss off into oblivion.
Please SuSE, by all means go cutting edge, but leave something around so
people can take one step back from the edge should it not fully work as
intended...
Right, that's my rant over. I do have asbestos pants on, so let the
flames begin... ;-)
--
Anders Karlsson
At 16:02 on Wednesday 6 August 2003, Anders Karlsson wrote:
> I have been having longstanding problems with kernel instability and SuSE Pro 8.2 on an X31 IBM ThinkPad. I have been through a fair few of the 2.4.21-pre/rc kernels, with or without -ac patches, and likewise for the 2.4.22 kernel. Same problem: spontaneous lockups, kernel oopsing here, there and everywhere. Most frustrating.
> However, I cannot remember the kernel ever having been this bad before, so I have drawn the conclusion that the compiler used might be the culprit. I found gcc_old (2.95.3) from SLPro 8.1 and installed that, and I have recompiled the kernel and all modules I use with that compiler.
That's very strange. Are you sure it is not a hardware fault? I have installed SuSE 8.2 on 4 machines and none of them has instability problems.
> Right, that's my rant over. I do have asbestos pants on, so let the flames begin... ;-)
Praise
On Wed, 2003-08-06 at 15:17, Praise wrote:
> That's very strange. Are you sure it is not a hardware fault? I have installed SuSE 8.2 on 4 machines and none of them has instability problems.
I am quite confident this is not a hardware fault. It might be related to
the hardware, as the X31 is a Centrino 'thing' and as such is quite new.
IDE DMA did not start working until 2.4.21-pre5 IIRC, and there have been
issues with radeonfb and DRI/DRM/AGP as well.
However, most times I have reported problems to the kernel mailing
list, they have pointed out, probably with justification, that compiling
kernels with GCC 3.3 is tantamount to suicide until the compiler has
settled down and the kernel people have worked out how to work around
compiler brain damage.
I have installed SuSE Pro 8.2 on a P-III 750MHz Dell laptop and not seen
any adverse effects, but everything worked out of the box there; it has no
hardware that requires a newer kernel to be supported.
All in all, I hope the 2.95.3 compiler sorts out the issues seen so far.
:-)
--
Anders Karlsson
On Wed, 2003-08-06 at 16:02, Anders Karlsson wrote:
> I have been having longstanding problems with kernel instability and SuSE Pro 8.2 on an X31 IBM ThinkPad. I have been through a fair few of the 2.4.21-pre/rc kernels, with or without -ac patches, and likewise for the 2.4.22 kernel. Same problem: spontaneous lockups, kernel oopsing here, there and everywhere. Most frustrating.
I think it's probably a combination of the hardware being new (as you describe in a later post) and gcc 3.3. I've seen the same sort of thing on brand spanking new hardware that wasn't properly supported. The spontaneous lockups have been mentioned many times on this list in cases where new Radeons that aren't fully supported are involved, or where drivers mess up somehow.
> the culprit. I found gcc_old (2.95.3) from SLPro 8.1 and installed that, and I have recompiled the kernel and all modules I use with that compiler.
Let us know how this works out.
Hans
On Wed, 2003-08-06 at 19:46, H du Plooy wrote:
> I think it's probably a combination of the hardware being new (as you describe in a later post) and gcc 3.3. I've seen the same sort of thing on brand spanking new hardware that wasn't properly supported. The spontaneous lockups have been mentioned many times on this list in cases where new Radeons that aren't fully supported are involved, or where drivers mess up somehow.
Some of the problems have definitely been linked to XFree86, radeonfb and AGP/DRM/DRI. I have tried to keep things documented on a webpage of mine, detailing what I currently have running and what I have had to tweak to make things work.
>> the culprit. I found gcc_old (2.95.3) from SLPro 8.1 and installed that, and I have recompiled the kernel and all modules I use with that compiler.
> Let us know how this works out.
Well, the kernel seems to work quite well. I am running it now, and
ipsec 2.0.1 works very merrily with it. The only thing I have found that
is not working is syncing with my PDA (visor.o), which locks the
machine up but still leaves Alt-SysRq working.
AFS works, VMware 4.0.1 works, ALSA works. Can't complain too much. So
far, there have not been any application crashes, but I'll leave it a few
more days before drawing any conclusions. gkrellm hasn't disappeared yet
though, so looking good. :)
Regards,
--
Anders Karlsson
Anders Karlsson wrote:
> I remember slating RedHat quite viciously about the stunt they pulled with gcc 2.96 where they had to eventually release a kgcc just to prevent the sig11 errors. I believe the time has come to give SuSE a rap on the knuckles for not making a gcc_old available in SLPro 8.2 so that people can build solid kernels.
There is a *big* difference! 2.96 was based on a snapshot of the gcc development tree, while our 3.3 is, as 'gcc --version' shows, a prerelease version which we hoped would have been released by the time the masters were ready to be shipped to the plant.
> Granted, most of the system might have been compiled with gcc 3.3,
Neither might nor most. Our automatic build system makes sure *all* packages are compiled by the same compiler.
> Please SuSE, by all means go cutting edge, but leave something around so people can take one step back from the edge should it not fully work as intended...
There was *no* way around using 3.3. The improvements were too big to be ignored and we definitely needed it for the AMD64 platform. Believe me, maintaining different versions of a compiler in parallel is a nightmare!
> Right, that's my rant over. I do have asbestos pants on, so let the flames begin... ;-)
Grab the 3.3 release from /pub/projects/gcc/8.2 on ftp.suse.com or its mirrors and try again :)
Philipp
--
Philipp Thomas
work: pthomas@suse.de
private: philipp.thomas@t-link.de
On Wed, 2003-08-06 at 23:40, Philipp Thomas wrote:
> There is a *big* difference! 2.96 was based on a snapshot of the gcc development tree, while our 3.3 is, as 'gcc --version' shows, a prerelease version which we hoped would have been released by the time the masters were ready to be shipped to the plant.
Yes, RedHat was naughtier than SuSE was, but I still believe that using gcc 3.3 took SuSE closer to the cutting edge than was perhaps healthy. I do understand the economics and the reasons SuSE had for doing what they did; that does not, however, mean I agree 100% with them. And if SuSE Pro 8.1 could have a gcc_old package, SuSE Pro 8.2 certainly could have had one as well.
>> Granted, most of the system might have been compiled with gcc 3.3,
> Neither might nor most. Our automatic build system makes sure *all* packages are compiled by the same compiler.
At least there is consistency in the madness.. ;-)
>> Please SuSE, by all means go cutting edge, but leave something around so people can take one step back from the edge should it not fully work as intended...
> There was *no* way around using 3.3. The improvements were too big to be ignored and we definitely needed it for the AMD64 platform. Believe me, maintaining different versions of a compiler in parallel is a nightmare!
That depends entirely on the goal of the exercise. SuSE placed the need for an AMD64-capable compiler higher than the need for a compiler that could consistently compile kernels without choking on them and segfaulting here, there and everywhere. True, gcc 3.3 does seem to be capable of compiling things; all I am saying is that I do not feel entirely comfortable with the results. Gnome is particularly bad for crashing apps etc. Saying that, the KDE crowd should not jump around in glee at me saying this, as I have not used KDE on here, and there is a good chance KDE would have as bad a time as Gnome is having. And before people say that if I am using ULB packages I can't point the finger at SuSE, the crashes happened before I even put the first ULB package on the system. I think I have headed most things off at the pass now.. ;-)
>> Right, that's my rant over. I do have asbestos pants on, so let the flames begin... ;-)
> Grab the 3.3 release from /pub/projects/gcc/8.2 on ftp.suse.com or its mirrors and try again :)
I have had that version of gcc 3.3 installed since about 24-48 hours
after it was put in ftp://ftp.suse.com/pub/people/pthomas/.../
As far as I can tell, the pre-release on the DVD and this update do
an equally bad job of compiling, for example, 2.4.22-pre5 or
2.4.22-pre9 (my current kernel).
All this criticism aside, I am very happy with SuSE Pro 8.2. All in all
it is the best version of SuSE I have used so far; it is just that there
are niggles and it is not entirely nice on a Centrino laptop. Since
installing this IBM X31 with SuSE Pro 8.2, I have had a worse
track record of hard hangs and oopses etc. than I could possibly have
imagined Windows 95 being capable of.
I do realise that the hardware is very new, and support for it is
lacking, but I was not prepared for the hassle it has given me. All I
hope is that if I do point out the niggles, there is a chance there might
be a patch or a fix for some of them and it will steadily get better.
Regards,
--
Anders Karlsson
On 08/07/2003 07:37 AM, Anders Karlsson wrote:
> Gnome is particularly bad for crashing apps etc. Saying that, the KDE crowd should not jump around in glee at me saying this, as I have not used KDE on here, and there is a good chance KDE would have as bad a time as Gnome is having.
Just an FYI: as I was trying to track down a bug (and make a feature request) at KDE.org the other day regarding 3.1.3, I noticed that all new KDE stuff seems to be getting readied for gcc 3.4. I saw many references to this, so I doubt KDE, or at least the newer stuff, will suffer from gcc 3.3 problems.
--
Joe Morris
New Tribes Mission
Email Address: Joe_Morris@ntm.org
Web Address: http://www.mydestiny.net/~joe_morris
Registered Linux user 231871
God said, I AM that I AM. I say, by the grace of God, I am what I am.
* Joe Morris (NTM) (Joe_Morris@ntm.org) [030806 17:28]:
> On 08/07/2003 07:37 AM, Anders Karlsson wrote:
>> Gnome is particularly bad for crashing apps etc. Saying that, the KDE crowd should not jump around in glee at me saying this, as I have not used KDE on here, and there is a good chance KDE would have as bad a time as Gnome is having.
> Just an FYI: as I was trying to track down a bug (and make a feature request) at KDE.org the other day regarding 3.1.3, I noticed that all new KDE stuff seems to be getting readied for gcc 3.4. I saw many references to this, so I doubt KDE, or at least the newer stuff, will suffer from gcc 3.3 problems.
I'm just not sure what KDE issues have come about through the use of GCC 3.3. I compile tons of KDE programs, themes and other such things with 3.3 and I've never had a problem with any of it..well, some of the 0.01 releases of themes from kde-look..but nothing else. I usually don't compile kernels, but I do compile other things such as mutt and nmap..having no problems at all. I believe James compiles the ULB packages with 3.3 and I'm using a laptop with ULB on it..no issues. I believe it's the compilee ...not the compiler that would be the problem.
--
Ben Rosenberg ---===---===---===--- mailto:ben@whack.org
-----
If two men agree on everything, you can be sure that only one of them is doing the thinking.
On Thu, 2003-08-07 at 01:39, Ben Rosenberg wrote:
> I'm just not sure what KDE issues have come about through the use of GCC 3.3. I compile tons of KDE programs, themes and other such things with 3.3 and I've never had a problem with any of it..well, some of the 0.01 releases of themes from kde-look..but nothing else. I usually don't compile kernels, but I do compile other things such as mutt and nmap..having no problems at all. I believe James compiles the ULB packages with 3.3 and I'm using a laptop with ULB on it..no issues. I believe it's the compilee ...not the compiler that would be the problem.
I have compiled some things with gcc 3.3 and they seem to work all right;
other things do not, however. There does not seem to be a real pattern
to what decides to fail and what does not.
As for the compilee being at fault rather than the compiler, all I can
say is that I have changed nothing in my procedure; I have just moved
from SuSE Pro 8.1 to 8.2 and from a Dell Inspiron 5000e to an IBM X31.
Draw your own conclusion. I have been compiling my own kernels from
kernel.org sources since late 1997. I would like to think I roughly know
what I am doing in that respect by now...
And to settle claims about hardware faults, the X31 has run memtest for
18+ hours without throwing up any errors. I think that should have
spotted problems if there were any.
On a side note, a 2.95.3-compiled 2.4.22-pre9 appears to reduce the
number of applications spontaneously dying. I will give it two more days
before saying anything definite, though.
Regards,
--
Anders Karlsson
On Thu, 2003-08-07 at 10:06, Anders Karlsson wrote:
> And to settle claims about hardware faults, the X31 has run memtest for 18+ hours without throwing up any errors. I think that should have spotted problems if there were any.
All things considered, it seems fairly likely to me that your problems are caused by a bug in gcc 3.3 affecting your specific hardware, as opposed to a hardware fault. Like Ben, I compile loads of stuff on a daily basis, and aside from the 2.4.20 or 2.4.21 vanilla kernels (which I believe generally don't compile with GCC 3.3, as discussed earlier on this list), I have not had a single problem. I have had no lockups, no software crashes, nothing, and I've had uptimes of more than a month at a time (until I had to reboot into Windows or the power went out). I'm extremely happy. Maybe something went wrong with your installation? I've found that a slight power surge or dip during installation is sometimes enough to damage one crucial package, which will come back and bite you again and again. Just a thought.
Hans
On 08/07/2003 04:06 PM, Anders Karlsson wrote:
> And to settle claims about hardware faults, the X31 has run memtest for 18+ hours without throwing up any errors. I think that should have spotted problems if there were any.
Not to contradict, but I ran memtest 3.0 on my boss's IBM A20p for 17 hours without a single problem, yet not long afterwards IBM said the memory was his problem. I believe in his case the computer was getting too hot near the memory, and that was causing the failures. Memtest tested the memory, but the CPU was idling, so there was not much heat. Just a couple of weeks ago, after more 'memory problems', they replaced his main board and fitted new 512M memory.
> On a side note, a 2.95.3-compiled 2.4.22-pre9 appears to reduce the number of applications spontaneously dying. I will give it two more days before saying anything definite, though.
Does your notebook get very hot in the area of the memory? Maybe your memory is going into thermal runaway and causing your intermittent problems.
--
Joe Morris
New Tribes Mission
Email Address: Joe_Morris@ntm.org
Web Address: http://www.mydestiny.net/~joe_morris
Registered Linux user 231871
God said, I AM that I AM. I say, by the grace of God, I am what I am.
On Thu, 2003-08-07 at 11:15, Joe Morris (NTM) wrote:
> On 08/07/2003 04:06 PM, Anders Karlsson wrote:
>> And to settle claims about hardware faults, the X31 has run memtest for 18+ hours without throwing up any errors. I think that should have spotted problems if there were any.
> Not to contradict, but I ran memtest 3.0 on my boss's IBM A20p for 17 hours without a single problem, yet not long afterwards IBM said the memory was his problem. I believe in his case the computer was getting too hot near the memory, and that was causing the failures. Memtest tested the memory, but the CPU was idling, so there was not much heat. Just a couple of weeks ago, after more 'memory problems', they replaced his main board and fitted new 512M memory.
Hmm, the laptop does get very hot underneath. I suppose I can log a call against the laptop and see if they will replace the main board. The laptop is on for 12+ hours a day, so it does warm up quite a bit.
>> On a side note, a 2.95.3-compiled 2.4.22-pre9 appears to reduce the number of applications spontaneously dying. I will give it two more days before saying anything definite, though.
> Does your notebook get very hot in the area of the memory? Maybe your memory is going into thermal runaway and causing your intermittent problems.
This is a possibility that I will investigate. I will try and cool the
laptop's underside (where it gets uncomfortably hot) and see if the
problems lessen.
Thank you for the suggestion, this is another thing that I will pursue.
:-)
Regards,
--
Anders Karlsson
At 10:06 on Thursday 7 August 2003, Anders Karlsson wrote:
> And to settle claims about hardware faults, the X31 has run memtest for 18+ hours without throwing up any errors. I think that should have spotted problems if there were any.
Here is a nice script which will test the stability of your machine, in case your problem is not just the RAM but something else (like overheating).
http://sdb.suse.de/en/sdb/html/hmeyer_memtest-sig11.html
Praise
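The linked SDB page describes its own script, and the Python sketch below is not that script. It is a hypothetical illustration of the general idea behind "sig11" stability tests: run the same deterministic job (such as a clean kernel build) many times and count failures. A build that fails only some of the time usually points at the machine (RAM, heat) rather than the source. The function name `count_failed_runs` is an assumption made for this example.

```python
import subprocess
from typing import Sequence


def count_failed_runs(cmd: Sequence[str], runs: int, cwd: str = ".") -> int:
    """Run `cmd` `runs` times in `cwd` and return how many runs failed.

    A deterministic job (such as a clean kernel build) that fails only some
    of the time points at the machine rather than the source code.
    """
    failures = 0
    for i in range(runs):
        result = subprocess.run(
            list(cmd),
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
            # On POSIX a negative code means the command itself was killed
            # by that signal (-11 is SIGSEGV); a compiler crash inside make
            # normally surfaces as a positive make exit status instead.
            print(f"run {i + 1}/{runs}: exit status {result.returncode}")
    return failures
```

For example, `count_failed_runs(["sh", "-c", "make dep clean bzImage"], runs=10, cwd="/usr/src/linux")` would rebuild a 2.4 kernel ten times, assuming the sources live in /usr/src/linux. Running it while the laptop is already warm tests the overheating theory as well as the memory.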
On Thu, 2003-08-07 at 13:17, Praise wrote:
> At 10:06 on Thursday 7 August 2003, Anders Karlsson wrote:
>> And to settle claims about hardware faults, the X31 has run memtest for 18+ hours without throwing up any errors. I think that should have spotted problems if there were any.
> Here is a nice script which will test the stability of your machine, in case your problem is not just the RAM but something else (like overheating).
A friend in the department used to work in PC Hardware Support in
Greenock (IBM) and he queried this with their thinkpad specialist. The
answer that came back was that it was either the fan or the planar that
was the problem.
As for the stability of the machine, using a 2.95.3-compiled kernel has
not had any adverse effects, apart from the visor.o module now hanging
the machine. The other thing I did was to put two pens under the laptop,
giving a small gap for heat to dissipate, and one or both of these things
has really made a difference.
Taking into account what the thinkpad dude was saying, I would be
inclined to go with the assessment that there is a potential hardware
problem. I will try and get the fan replaced and see if that helps,
otherwise a planar replacement will be required. *urgh*
Many thanks for the suggestions people have come up with. All of them
will be put to use to test the stability of the system.
Regards,
--
Anders Karlsson
There is definitely something funny. I have an Athlon XP2200+/512Meg and everything builds and runs except stock kernels (2.4.2x-pre/-rcx/-acx) and 2.6.0-test2 and earlier: they build, but I constantly have problems booting them. I can build and boot a SuSE kernel without problems. Strange if it's hardware, as nothing else has problems. Typically this happens when /etc/init.d/hwscan is run. If I move hwscan out of /etc/init.d, I can get the kernel to boot, with occasional USB problems. modprobe works for all other modules.
Regards
Sid.
Jul 19 18:42:47 barrabas kernel: usb.c: registered new driver hid
Jul 19 18:42:47 barrabas kernel: hid-core.c: hid_init_reports usb_set_idle()=-911
Jul 19 18:42:47 barrabas kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000047
Jul 19 18:42:47 barrabas kernel: printing eip:
Jul 19 18:42:47 barrabas kernel: e0b76d60
Jul 19 18:42:47 barrabas kernel: *pde = 00000000
Jul 19 18:42:47 barrabas kernel: Oops: 0000
Jul 19 18:42:47 barrabas kernel: CPU: 0
Jul 19 18:42:47 barrabas kernel: EIP: 0010:[<e0b76d60>] Not tainted
Jul 19 18:42:47 barrabas kernel: EFLAGS: 00010286
Jul 19 18:42:47 barrabas kernel: eax: 00000034 ebx: 00000047 ecx: c03590d0 edx: ddae5f7c
Jul 19 18:42:47 barrabas kernel: esi: de634000 edi: de634004 ebp: dd995e04 esp: dd995de0
Jul 19 18:42:47 barrabas kernel: ds: 0018 es: 0018 ss: 0018
Jul 19 18:42:47 barrabas kernel: Process modprobe.old (pid: 1123, stackpage=dd995000)
<STUFF DELETED>
Jul 19 18:42:47 barrabas kernel: Code: 8b 1b 39 fb 74 6a 8b 8e 44 0c 00 00 66 81 b9 d4 00 00 00 8e
Jul 19 18:42:47 barrabas /etc/hotplug/usb.agent[1097]: /etc/hotplug/usb.agent: line 442: 1123 Segmentation fault $MODPROBE $MODULE >/dev/null 2>&1
Jul 19 18:42:47 barrabas /etc/hotplug/usb.agent[1097]: ... can't load module hid
Anders Karlsson wrote:
> On Thu, 2003-08-07 at 01:39, Ben Rosenberg wrote:
>> I'm just not sure what KDE issues have come about through the use of GCC 3.3. I compile tons of KDE programs, themes and other such things with 3.3 and I've never had a problem with any of it..well, some of the 0.01 releases of themes from kde-look..but nothing else. I usually don't compile kernels, but I do compile other things such as mutt and nmap..having no problems at all. I believe James compiles the ULB packages with 3.3 and I'm using a laptop with ULB on it..no issues. I believe it's the compilee ...not the compiler that would be the problem.
> I have compiled some things with gcc 3.3 and they seem to work all right; other things do not, however. There does not seem to be a real pattern to what decides to fail and what does not.
> As for the compilee being at fault rather than the compiler, all I can say is that I have changed nothing in my procedure; I have just moved from SuSE Pro 8.1 to 8.2 and from a Dell Inspiron 5000e to an IBM X31. Draw your own conclusion. I have been compiling my own kernels from kernel.org sources since late 1997. I would like to think I roughly know what I am doing in that respect by now...
> And to settle claims about hardware faults, the X31 has run memtest for 18+ hours without throwing up any errors. I think that should have spotted problems if there were any.
> On a side note, a 2.95.3-compiled 2.4.22-pre9 appears to reduce the number of applications spontaneously dying. I will give it two more days before saying anything definite, though.
Regards,
--
Sid Boyce ... hamradio G3VBV ... Cessna/Warrior Pilot
Linux only shop
participants (7)
- Anders Karlsson
- Ben Rosenberg
- H du Plooy
- Joe Morris (NTM)
- Philipp Thomas
- Praise
- Sid Boyce