computational glitch
Running SuSE 10.1 64-bit on an Opteron system which is mostly unattended. Happened to notice that 'zmd' was running, and taking up a significant CPU percentage. I had done *nothing* to explicitly start zmd. I might have been using YaST2 at the time in question (don't remember clearly) - but I think not. My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover) just about the time 'zmd' started. I'm posting this to alert the SuSE developers to check that 'zmd' is not happening to affect the hardware involved with floating point. I think it suspicious that as 'zmd' started using up CPU cycles, that seemed to coincide with a computational glitch in another program on my system. [I have seen no other glitches with that program, before or since.] mikus
On Sunday 10 September 2006 06:26, Mikus Grinbergs wrote:
My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover)
A glitch? WTF is that? A miscalculation? (how would one know?) Unavailability of the FP processor? -- _____________________________________ John Andersen
John, On Sunday 10 September 2006 12:10, John Andersen wrote:
On Sunday 10 September 2006 06:26, Mikus Grinbergs wrote:
My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover)
A glitch? WTF is that? A miscalculation? (how would one know?) Unavailability of the FP processor?
Don't be so pedantic. CPUs do malfunction, especially when overclocked or overheated. It's entirely feasible to write software meant to detect such errors by performing a computation redundantly and then comparing the results or by computing well-known values (pi or prime numbers) and verifying the accuracy of the locally computed results. Look into CPUBurn and Prime95. They are both available for Linux (most CPU stress-testing / burn-in software seems to be written for Windows). Randall Schulz
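Randall's two detection strategies (run a computation redundantly and compare, or compute a well-known value and check its accuracy) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how CPUBurn or Prime95 actually work: the names `leibniz_pi`, `checked` and `self_test` are hypothetical, and the Leibniz series is used only because its error bound is easy to state. Note that repeating a calculation on the same machine only catches transient faults (a "glitch" that recovers, as Mikus described); a persistent fault would instead show up in the known-value check.

```python
import math


def leibniz_pi(terms: int) -> float:
    """Approximate pi with the Leibniz series 4 * (1 - 1/3 + 1/5 - ...)."""
    total = 0.0
    sign = 1.0
    for k in range(terms):
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total


def checked(compute, *args, retries=2):
    """Run compute(*args) twice and accept the result only if both runs agree.

    A transient fault should make the two runs disagree; in that case the
    pair is simply recomputed, up to `retries` extra times (the "recover" step).
    """
    for _ in range(retries + 1):
        first = compute(*args)
        second = compute(*args)
        if first == second:
            return first
    raise RuntimeError("redundant runs keep disagreeing; suspect the hardware")


def self_test(terms: int = 100_000) -> bool:
    """Compare a locally computed pi against the known value math.pi."""
    approx = checked(leibniz_pi, terms)
    # For an alternating series the error is at most the first omitted term,
    # which for the Leibniz series (scaled by 4) is 4 / (2 * terms + 1).
    bound = 4.0 / (2 * terms + 1)
    return abs(approx - math.pi) <= bound


if __name__ == "__main__":
    print("FPU self-test passed" if self_test() else "FPU self-test FAILED")
```

Comparing the two runs with `==` rather than a tolerance is deliberate: on a healthy machine the same sequence of floating-point operations should produce bit-identical results, so any difference at all points to a fault.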
On Sunday 10 September 2006 14:50, Randall R Schulz wrote:
John,
On Sunday 10 September 2006 12:10, John Andersen wrote:
On Sunday 10 September 2006 06:26, Mikus Grinbergs wrote:
My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover)
A glitch? WTF is that? A miscalculation? (how would one know?) Unavailability of the FP processor?
Don't be so pedantic. CPUs do malfunction
Look who's being pedantic...! How can anyone be expected to diagnose a "Glitch"? Do any of us here look like Veterinarians? -- _____________________________________ John Andersen
On Sunday 10 September 2006 16:56, John Andersen wrote:
...
How can anyone be expected to diagnose a "Glitch"?
Just as I explained.
Do any of us here look like Veterinarians?
I don't know. I can't see anyone here. What does animal medicine have to do with any of this? RRS
On Sunday 10 September 2006 20:48, Randall R Schulz wrote:
Do any of us here look like Veterinarians?
I don't know. I can't see anyone here. What does animal medicine have to do with any of this?
From Wikipedia: "Sometimes [a] game's code may be modified to create interesting glitches. For example, in the game Impossible Creatures, which focuses on combining 2 animals, making a combined animal "combinable" can result in 3 or 4-animal combinations."
Gamers use the term frequently (as above). Sounds to me like the CPU (FPU) skipped a beat (glitch) somewhere due to heating (overheating). -- Kind regards, M Harris <>< harrismh777@earthlink.net
On Monday 11 September 2006 03:48, Randall R Schulz wrote:
On Sunday 10 September 2006 16:56, John Andersen wrote:
...
How can anyone be expected to diagnose a "Glitch"?
Just as I explained.
Do any of us here look like Veterinarians?
I don't know. I can't see anyone here. What does animal medicine have to do with any of this?
Diagnosing bugs?
On Sunday 10 September 2006 17:48, Randall R Schulz wrote:
On Sunday 10 September 2006 16:56, John Andersen wrote:
...
How can anyone be expected to diagnose a "Glitch"?
Just as I explained.
Do any of us here look like Veterinarians?
What does animal medicine have to do with any of this?
Vets are the only docs that can look a jackass in the mouth and tell you what is wrong with it. All the other docs need to ask questions. Nothing personal intended. -- _____________________________________ John Andersen
On Sun, 10 Sep 2006 15:56:56 -0800 John Andersen <jsa@pen.homeip.net> wrote:
How can anyone be expected to diagnose a "Glitch"?
I just wanted to post an unusual coincidence that I saw. I deliberately used the word "glitch", because it was a transient event for which I did __not__ ask for any 'diagnosis'. It's over with; it has not occurred again; I doubt I could re-create it. Why continue flogging? mikus p.s. Lately my experience has been that when I do post a message asking for assistance (e.g., for how to prevent 'smart' from timing out so quickly if data transfer gets errors), I don't get any ideas back. But here, when I did not ask, people want to help -- thanks, but ...
On Sunday 10 September 2006 16:26, Mikus Grinbergs wrote:
Running SuSE 10.1 64-bit on an Opteron system which is mostly unattended. Happened to notice that 'zmd' was running, and taking up a significant CPU percentage. I had done *nothing* to explicitly start zmd. I might have been using YaST2 at the time in question (don't remember clearly) - but I think not.
My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover) just about the time 'zmd' started. I'm posting this to alert the SuSE developers to check that 'zmd' is not happening to affect the hardware involved with floating point.
I find this extremely hard to believe. zmd doesn't use any particular kernel drivers, so it shouldn't have any influence on any hardware at all.
I think it suspicious that as 'zmd' started using up CPU cycles, that seemed to coincide with a computational glitch in another program on my system. [I have seen no other glitches with that program, before or since.]
As John said, what is a 'glitch'? I guess if your system was running at 110% with both your program and zmd going at it, it could be exposing some problem with the hardware. Perhaps it was on the verge of overheating? Perhaps there is a problem with a memory cell that you happened to expose?
Anders Johansson wrote:
I think it suspicious that as 'zmd' started using up CPU cycles, that seemed to coincide with a computational glitch in another program on my system. [I have seen no other glitches with that program, before or since.]
As John said, what is a 'glitch'?
Guys, it's a pretty common expression: http://de.wikipedia.org/wiki/Glitch_%28Elektronik%29 "In electronics, a glitch is a momentary false output in logic circuits." You both need to get out more :-) /Per Jessen, Zürich
On Sunday 10 September 2006 22:26, Per Jessen wrote:
Anders Johansson wrote:
I think it suspicious that as 'zmd' started using up CPU cycles, that seemed to coincide with a computational glitch in another program on my system. [I have seen no other glitches with that program, before or since.]
As John said, what is a 'glitch'?
Guys, it's a pretty common expression:
The word itself is fairly common, yes. So is "bug", but would you say "there is a bug in this program" is a good and proper bug report? If I, in response to that, were to come back with "what do you mean by a bug?", would you then direct me to a dictionary definition of the word "bug"? "glitch" simply means "something that malfunctions" and is more or less synonymous with "bug". It's simply not specific enough for anyone to give a reasonable diagnosis.
"In electronics, a glitch is a momentary false output in logic circuits."
That's true, and it's more or less what I said above. It just means "something that is wrong".
You both need to get out more :-)
Are there more glitches outside?
On 10/09/06 14:32, Anders Johansson wrote:
<snip> So is "bug", but would you say "there is a bug in this program" is a good and proper bug report? If I, in response to that, were to come back with "what do you mean by a bug?", would you then direct me to a dictionary definition of the word "bug"?
"glitch" simply means "something that malfunctions" and is more or less synonymous with "bug". It's simply not specific enough for anyone to give a reasonable diagnosis
In computerland, "bug" has a rather specific and narrow definition in terms of coding errors in software. A "glitch" is far less precise, and could refer to anything, not just software errors -- i.e., it is an appropriate word to use when describing a problem of unknown origin. To me, it is clear that this is what the OP intended -- to describe a problem whose origin is, to him, unknown, and to request general suggestions about where to look (possibly both hardware and software) to find a resolution.
Anders Johansson wrote:
On Sunday 10 September 2006 22:26, Per Jessen wrote:
As John said, what is a 'glitch'?
Guys, it's a pretty common expression:
The word itself is fairly common, yes.
So is "bug", but would you say "there is a bug in this program" is a good and proper bug report? If I, in response to that, were to come back with "what do you mean by a bug?", would you then direct me to a dictionary definition of the word "bug"?
Anders, the way I read your question, you were asking what "glitch" meant because you didn't know the word. I misunderstood: I simply didn't read between the lines that you were asking for a proper description of the bug.
You both need to get out more :-)
are there more glitches outside?
Most definitely - nature's full of them. /Per Jessen, Zürich
On Sun, 10 Sep 2006 21:22:06 +0200 Anders Johansson <andjoh@rydsbo.net> wrote:
On Sunday 10 September 2006 16:26, Mikus Grinbergs wrote:
My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover) just about the time 'zmd' started. I'm posting this to alert the SuSE developers to check that 'zmd' is not happening to affect the hardware involved with floating point.
I find this extremely hard to believe. zmd doesn't use any particular kernel drivers, so it shouldn't have any influence on any hardware at all
Does it hurt to look extra hard for anything unusual? It may well be that something external influenced both programs. I'm still mystified as to why zmd started up in the first place.
I guess if your system was running at 110% with both your program and zmd going at it, it could be exposing some problem with the hardware. Perhaps it was on the verge of overheating? Perhaps there is a problem with a memory cell that you happened to expose?
My system runs at 110% (sic) 24/7, and has for months in its current configuration. I did not notice any previous unexplained occurrences. And if it's heat or bad memory, why have no problems shown up in the two days (of 110%!) since the event I described? [Though I did halt 'zmd' as soon as I noticed it was running; by then it had been wasting CPU cycles for 24+ hours. (I could tell from the drop in logged output by the idle priority program, which went back to full output once I halted zmd.) ] mikus
The Sunday 2006-09-10 at 16:59 -0500, Mikus Grinbergs wrote:
I'm still mystified as to why zmd started up in the first place.
It is a system service daemon, running all the time. The increase in activity could be triggered by a YaST request, I think. -- Cheers, Carlos E. R.
On Sunday 10 September 2006 23:59, Mikus Grinbergs wrote:
On Sun, 10 Sep 2006 21:22:06 +0200 Anders Johansson <andjoh@rydsbo.net> wrote:
On Sunday 10 September 2006 16:26, Mikus Grinbergs wrote:
My reason for posting is that an idle priority program, heavily floating-point intensive, reported a computational glitch (from which it was able to recover) just about the time 'zmd' started. I'm posting this to alert the SuSE developers to check that 'zmd' is not happening to affect the hardware involved with floating point.
I find this extremely hard to believe. zmd doesn't use any particular kernel drivers, so it shouldn't have any influence on any hardware at all
Does it hurt to look extra hard for anything unusual?
If you're looking in the wrong places, yes it can, because it involves wasting time. Other user space programs simply cannot cause this type of error. It is either a hardware problem, a bug in the kernel, or a bug in the program reporting the error. Your reasoning is known as "cum hoc ergo propter hoc" ("with this, therefore because of this"), a logical fallacy already recognized by Aristotle. There has to be some causality involved.
My system runs at 110% (sic) 24/7, and has for months in its current configuration. I did not notice any previous unexplained occurrences. And if it's heat or bad memory, why have no problems shown up in the two days (of 110%!) since what I described.
If it's something that's only just starting to go bad, you might not see another problem for 6 months. Or it may be some really obscure race condition in the kernel that only gets triggered once in a blue moon.
Mikus Grinbergs wrote:
And if it's heat or bad memory, why have no problems shown up in the two days (of 110%!) since what I described. [Though I did halt 'zmd' after it had been wasting CPU cycles for 24+ hours, before I noticed that it was running. (I could tell from the drop in logged output by the idle priority program, which went back to full output once I halted zmd.) ]
I can sure believe the correlation. Since I am running a desktop at home, I power off every night. I have uninstalled zmd, rug, and zen-updater (just now again) probably 10 times since I installed 10.1. By itself it will almost double the temperature of my CPU (AMD64-3200) when it runs. I keep trying it every few weeks to see if it has improved, but I do not like watching my CPU stressed for so long at every boot. I have reduced my installation sources to the DVD, inst-source, non-oss-inst-source, and update, but it still hammers the CPU for a significant amount of time at every boot. For my needs, I can just run smart every few days to get the updates and save the wear and tear on the CPU. Anyway, I have ksensors running in the tray with the CPU temperature shown, and have seen the effects of zmd, and would guess it may indeed have overheated your CPU or at least pushed it to its limits. Interestingly, the latest versions seemed quite a bit better as measured by top, but even with the various processes (e.g. gpg, update-status, etc.) switching and not maxing out the CPU at all, the CPU temperature still climbs to a "reported" 66 C. HTH. -- Joe Morris Registered Linux user 231871
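Joe's ksensors display reads the same kind of sensor data that a small script could poll, for example to log CPU temperature alongside a long-running job. A minimal sketch, assuming a Linux kernel that exposes hardware monitoring through the hwmon sysfs interface (files named `temp*_input` under `/sys/class/hwmon/hwmon*/`, each holding an integer in millidegrees Celsius). The helper names are hypothetical; older kernels may place these files one level deeper under a `device/` subdirectory, which this sketch does not search, and which sensor actually corresponds to the CPU depends on the motherboard and driver.

```python
import glob
import os


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a hwmon reading such as '66000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temperatures(root: str = "/sys/class/hwmon") -> dict:
    """Return {"chip/sensor": degrees C} for every temp*_input file under root."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        # Most hwmon directories carry a "name" file identifying the driver.
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        sensor = os.path.basename(path)[: -len("_input")]
        try:
            with open(path) as f:
                readings[f"{chip}/{sensor}"] = millidegrees_to_celsius(f.read())
        except (OSError, ValueError):
            continue  # some sensors return errors or junk when read; skip them
    return readings


if __name__ == "__main__":
    for label, temp in read_temperatures().items():
        print(f"{label}: {temp:.1f} C")
```

Polling this every few seconds while zmd runs would give the same before-and-after comparison Joe describes, in a form that can be logged.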
participants (9)
- Anders Johansson
- Carlos E. R.
- Darryl Gregorash
- Joe Morris (NTM)
- John Andersen
- M Harris
- mikus@bga.com
- Per Jessen
- Randall R Schulz