[opensuse-programming] gfortran (?) problem
I have spent a lot of time trying to identify the source of a failure, and I need help.

I have been using a fairly large and complex program for many years. In short, it is based on the Cernlib files (for those who know), with the code and scripts adjusted to handle both 32-bit and 64-bit processors and both g77 and gfortran. I have been able to get my applications to work fine up through SuSE 10.2.

In SuSE 10.3, my major application segfaults at the beginning, when only a limited amount of initialization has been done. I still have some 10.2 machines available, and there everything works in both 32-bit and 64-bit versions. At the moment, I only have 64-bit machines with 10.3. I have run strace and valgrind. I see things that differ between codes that work and codes that don't, but I am not able to discern what might be the problem -- it's beyond me.

It is not easy, and perhaps not possible, to produce a test file that would isolate the problem. I am using a very small user main program, but it links to libraries that have very complex generation scripts. There are all kinds of memory management going on in the code that is failing (through Fortran and C) but, as noted above, it seems to be fine under SuSE 10.2 and earlier. The gfortran for SuSE 10.2 is 4.1.2. I have tried both 4.1.3 and 4.2.1 in SuSE 10.3; both lead to failures.

My basic question is: what changed in going from 10.2 to 10.3, either in gfortran/gcc or in various libraries? (I had to change my link list of system libraries to get an executable.) Any hints as to how I might track down the problems (valgrind or whatever) would be appreciated.

Thank you,
Joe Comfort
Joseph.Comfort@asu.edu
---------------------------------------------------------------------
To unsubscribe, e-mail: opensuse-programming+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-programming+help@opensuse.org
A follow-up: At someone's suggestion, I installed gcc/gfortran 4.3 from Factory. I still got a segfault as before. I did this on a machine that still had an executable of my application built under SuSE 10.2; that executable ran fine. But to recompile the application, I also had to rebuild the libraries (with no changes), and the rebuilt application fails. My applications are linked statically.

Joe Comfort
Here is some explicit information about the problems I am having. My very scaled-down main.F program calls 3 user routines from the Cernlib library (which call other subroutines). If I do not include the third routine in the compilation, the execution is 'good' with no errors. If I do include the third routine, I get a segfault near the beginning of execution. The output from `valgrind -v' is:

==27174== Memcheck, a memory error detector.
==27174== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==27174== Using LibVEX rev 1732, a library for dynamic binary translation.
==27174== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==27174== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==27174== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==27174==
--27174-- Command line
--27174--    ./cball
--27174-- Startup, with flags:
--27174--    -v
--27174-- Contents of /proc/version:
--27174--   Linux version 2.6.22.17-0.1-default (geeko@buildhost) (gcc version 4.2.1 (SUSE Linux)) #1 SMP 2008/02/10 20:01:04 UTC
--27174-- Arch and hwcaps: AMD64, amd64-sse2
--27174-- Page sizes: currently 4096, max supported 4096
--27174-- Valgrind library directory: /usr/lib64/valgrind
--27174-- Reading syms from /usr/local/cern/2005/cball/cball (0x400000)
--27174--    object doesn't have a dynamic symbol table
********* warning: DiCfSI 0x172004CB810 .. 0x172004CB93C outside segment 0x400000 .. 0x61DFFF
********* warning: DiCfSI 0x172004CB93D .. 0x172004CB93E outside segment 0x400000 .. 0x61DFFF
********* warning: DiCfSI 0x172004CB93F .. 0x172004CB93F outside segment 0x400000 .. 0x61DFFF
--27174-- Reading syms from /usr/lib64/valgrind/amd64-linux/memcheck (0x38000000)
--27174--    object doesn't have a symbol table
--27174--    object doesn't have a dynamic symbol table
--27174-- Reading suppressions file: /usr/lib64/valgrind/default.supp
********* Syscall param set_robust_list(head) points to uninitialised byte(s)
*********    at 0x4CD093: __pthread_initialize_minimal (in /usr/local/cern/2005/cball/cball)
*********    by 0x4F1CE7: (below main) (in /usr/local/cern/2005/cball/cball)
*********  Address 0x4000960 is not stack'd, malloc'd or (recently) free'd
*********
==27174== Conditional jump or move depends on uninitialised value(s)
==27174==    at 0x51211F: strlen (in /usr/local/cern/2005/cball/cball)
==27174==    by 0x526314: fillin_rpath (in /usr/local/cern/2005/cball/cball)
==27174==    by 0x5281CC: _dl_init_paths (in /usr/local/cern/2005/cball/cball)
==27174==    by 0x52D8E2: _dl_non_dynamic_init (in /usr/local/cern/2005/cball/cball)
==27174==    by 0x52E83A: __libc_init_first (in /usr/local/cern/2005/cball/cball)
==27174==    by 0x4F1D3A: (below main) (in /usr/local/cern/2005/cball/cball)
==27174==
********* Jump to the invalid address stated on the next line
*********  Address 0x0 is not stack'd, malloc'd or (recently) free'd
*********
********* Process terminating with default action of signal 11 (SIGSEGV): dumping core
*********  Bad permissions for mapped region at address 0x0
*********    at 0x0: ???
*********
********* ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
*********

The output continues with a repetition of the 3 errors, and a summary. The lines marked by ********* do not appear when the build does not include the subroutine call that seems to be causing the problem.

Instead, in the 'good' output, I get the following items (plus 4 more, as well as output from the first of my 3 subroutine calls):

==27152== Conditional jump or move depends on uninitialised value(s)
==27152==    at 0x42E2D7: __ctype_toupper_loc (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x425E52: next_char (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x425EEE: format_lex (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x426D16: _gfortrani_parse_format (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x41D5EF: data_transfer_init (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x401180: mzebra_ (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x400454: gzebra_ (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x400373: MAIN__ (main.F:34)
==27152==    by 0x418A2B: main (in /usr/local/cern/2005/cball/cball)
==27152==
==27152== Conditional jump or move depends on uninitialised value(s)
==27152==    at 0x42E317: __ctype_b_loc (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x426065: format_lex (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x4266A0: parse_format_list (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x426D47: _gfortrani_parse_format (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x41D5EF: data_transfer_init (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x401180: mzebra_ (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x400454: gzebra_ (in /usr/local/cern/2005/cball/cball)
==27152==    by 0x400373: MAIN__ (main.F:34)
==27152==    by 0x418A2B: main (in /usr/local/cern/2005/cball/cball)
==27152==
ETC.

Similar valgrind messages appear in the valgrind output for code generated on SuSE 10.2 systems, where the application runs fine.

In short, if my third subroutine call is compiled and linked into the code, I get a segfault even before my first two calls are executed! This is true even if I insert a test and branch that skips execution of the third subroutine call.
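Since the mere presence of the third call in the link changes the outcome, one thing that might be worth comparing is which archive members the linker pulls in with and without it. What follows is only a minimal sketch, not something taken from my build scripts. It assumes two GNU ld map files, hypothetically named good.map and bad.map, produced with `-Xlinker -Map` from links without and with the third routine. The function names list_members and new_members are made up for illustration.

```shell
#!/bin/sh
# Sketch only: good.map / bad.map are hypothetical names for GNU ld map
# files from the working and the failing link.

# list_members MAPFILE
# Print every "archive.a(member.o)" token found in the map file, once each.
list_members() {
    grep -o '[^[:space:]]*\.a([^)]*)' "$1" | sort -u
}

# new_members GOOD.map BAD.map
# Print the archive members that appear only in the second map file.
new_members() {
    good=$(mktemp) && bad=$(mktemp) || return 1
    list_members "$1" > "$good"
    list_members "$2" > "$bad"
    comm -13 "$good" "$bad"    # keep lines unique to the second list
    rm -f "$good" "$bad"
}
```

Running `new_members good.map bad.map` would then print one archive member per line. It only narrows down where to look; it does not by itself say which member is at fault.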
I remind you that this problem did not exist in SuSE 10.2 or earlier, and also does not exist if I execute code generated under 10.2 on my 10.3 system.

There are some warning messages generated during the linking stage when the third routine is included in the code. (They are not generated when it is not included.) Because I also have the shared libraries, and because I have seen the same warnings on the 10.2 system without any known failures, and because they come from the system libraries and not my code, I have ignored them. But maybe they indicate some kind of a problem. (My application code will not use the features, although they are apparently part of the subroutines that are being linked in.) The compilation and linking output is:

Building Crystal Ball simulation program ...
gfortran -Wall -fno-second-underscore -fno-automatic -g -static -Xlinker -Map -Xlinker cball.map -Xlinker -cref -o cball main.F -L/usr/local/cern/pro/lib -lgeant321 -lgcalor -lgraflib -lgrafX11 -lpacklib -lpawlib -lkernlib -lmathlib -llapack3 -lblas -L/usr/X11R6/lib64 -lX11 -lxcb -lxcb-xlib -lXau -L/usr/lib64 -ldl -lpthread
/usr/lib64/libX11.a(CrGlCur.o): In function `open_library':
(.text+0x25): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib64/libxcb.a(xcb_util.o): In function `_xcb_open':
(.text+0x334): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib64/libX11.a(imLcIm.o): In function `_XimLocalOpenIM':
(.text+0x11f3): warning: memset used with constant zero length parameter; this could be due to transposed parameters
/usr/lib64/libX11.a(xim_trans.o): In function `_XimXTransSocketUNIXConnect':
(.text+0x14cc): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib64/libX11.a(xim_trans.o): In function `_XimXTransSocketINETConnect':
(.text+0x1d82): warning: Using 'getservbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

So my question is: what changed between SuSE 10.2 and 10.3 that could cause this problem? I would appreciate any ideas as to how to proceed from here.

Thank you.

--
Joseph Comfort              Phone: (480)-965-6377
Physics Department          Dept.: (480)-965-3561
Arizona State University    Fax:   (480)-965-7954
Tempe, AZ 85287-1504        Email: Joseph.Comfort@asu.edu