On Monday 02 May 2005 4:10 pm, duneldaion@web.de wrote:
Dear all,
I experience a strange problem with a C-Program. When I run it on different machines, sometimes it produces different results, and sometimes it doesn't.
Basically, there seem to exist two different cases:
a) I compiled the program at the end of January. The compiled binary was copied to different computers, and produces different results on different machines.
b) I recompiled the program recently. With the _new_ binary of April, all machines calculate the same result!
First ideas:
1) Yes, the code was identical for case a) and b).
2) Could it be that the hardware of my computers differs too much? This does not seem to be the case. Also, for case b) it does not matter on which of the machines I compile before running it on all of them.
3) Could it be that some kernel update (or gcc update?) partially "broke" the compiler, and led to the results of case a) (which was compiled on SuSE Linux 9.0 with kernel 2.4.21-99)? A later kernel update possibly "repaired" this problem (case b) was compiled on SuSE Linux 9.0 with a 2.4.21-xxx kernel, with xxx > 166). Also, a check with SuSE 9.1, kernel version 2.6.5-7.111, reproduces case b).
There are several variables here, but a properly tested piece of code should
be able to provide reproducible results. A few possible causes come to
mind:
First, the code may be using random numbers.
Second, the code may have a latent bug, such as an uninitialized variable
somewhere.
Third, if the code is written to depend on behavior that may have changed
in 2.6, then you could have trouble.
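To illustrate the second point, here is a minimal sketch of how an uninitialized variable hides in otherwise ordinary C. The function name sum_values is made up for illustration and is not taken from your program. If the initializer on "total" is dropped, the function starts from whatever happens to be on the stack, so the result can change from one machine, compiler, or build to the next while still looking plausible.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
double sum_values(const double *values, size_t count)
{
    /* The latent bug would be writing "double total;" here.
     * Without the initializer the starting value is indeterminate. */
    double total = 0.0;
    size_t i;

    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Compiling with something like "gcc -O2 -Wall" can flag some of these cases, and running the binary under valgrind may catch others, though neither is guaranteed to find every one.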
Also, you don't mention whether you are doing integer or floating-point
math, or what language the code was written in. But I would still suspect
that the code has a latent bug.
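The reason I ask about floating point: floating-point addition is not associative, so anything that changes the order or precision of intermediate results (a different compiler version, different optimization flags, or intermediates kept in wider registers) may change the final digits. A small sketch, with function names invented for this example; the volatile temporaries are only there to force each partial sum to be rounded to a double:

```c
/* Adds three doubles as (a + b) + c. */
double sum_left_to_right(double a, double b, double c)
{
    volatile double partial = a + b;  /* rounded to double here */
    return partial + c;
}

/* Adds the same three doubles as a + (b + c). */
double sum_right_to_left(double a, double b, double c)
{
    volatile double partial = b + c;  /* rounded to double here */
    return a + partial;
}
```

Called with a = 1e20, b = -1e20, c = 1.0, the first function returns 1.0 and the second returns 0.0, because 1.0 is too small to survive being added to -1e20. Real programs rarely differ this dramatically, but the same effect can show up in the last few digits.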
As an example of a latent bug, I once received a bug report from the
compiler people about an application that had been running for over 5 years
with no trouble. On close inspection, I found that this piece of code had a
latent bug in it that had existed even on the old AT&T Unix systems, but had
never caused a problem before.
--
Jerry Feldman