Mailinglist Archive: opensuse-programming (98 mails)

Re: [suse-programming-e] Code produces different results
  • From: Jerry Feldman <gaf@xxxxxxx>
  • Date: Mon, 2 May 2005 16:37:59 -0400
  • Message-id: <200505021637.59518.gaf@xxxxxxx>
On Monday 02 May 2005 4:10 pm, duneldaion@xxxxxx wrote:
> Dear all,
>
> I experience a strange problem with a C-Program. When I run it on
> different machines, sometimes it produces different results, and
> sometimes it doesn't.
>
> Basically, there seem to exist two different cases:
> a) I compiled the program at the end of January. The compiled binary
> was copied to different computers, and produces different results on
> different machines.
> b) I recompiled the program recently. With the _new_ binary of April,
> all machines calculate the same result!
>
> First ideas:
> 1) Yes, the code was identical for case a) and b).
> 2) Could it be that the hardware of my computers differs too much?
> This does not seem to be the case. Also, for the case b) it does not
> matter on which of the machines I compile before running it on all of
> them.
> 3) Could it be that some kernel update (or gcc update?) partially
> "broke" the compiler, and led to the results of case a) (which was
> compiled on a SuSE Linux 9.0 with kernel 2.4.21-99).
> A later kernel update possibly "repaired" this problem (case b) was
> compiled with SuSE Linux 9.0 with 2.4.21-xxx kernel, with xxx > 166).
> Also, a check with a SuSE 9.1, kernelversion 2.6.5-7.111 reproduces
> case b).
>
Several variables are in play, but a properly tested piece of code should
produce reproducible results. The usual causes of differences like these are:
First, the code uses random numbers.
Second, the code has a latent bug, such as an uninitialized variable
somewhere.
Third, the code depends on behavior that may have changed in 2.6, in which
case you could have trouble.

Also, you don't mention whether you are doing integer or floating-point math,
or what language the code was written in. Still, I would suspect that the
code has a latent bug.

As an example, I once received a bug report written by the compiler people
on an application that had been running for over 5 years with no trouble.
On close inspection, I found that this piece of code had a latent bug in it
that had existed even on the old AT&T Unix systems, but had never caused a
problem before.

--
Jerry Feldman <gaf@xxxxxxx>
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9
