[Bug 390722] New: gdb - backtrace grace ?
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c339730

           Summary: gdb - backtrace grace ?
           Product: openSUSE 11.0
           Version: Beta 2
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Development
        AssignedTo: pth@novell.com
        ReportedBy: mmeeks@novell.com
         QAContact: qa@suse.de
                CC: hpj@novell.com, lpechacek@novell.com
          Found By: ---

So - gdb is working less well than I remember it working in the past, and
less well than it could be hoped I think :-) This is really an analog of
#339730# to which I will also attach my test case.

Suffice it to say that most apps that I connect gdb to fail to give a useful
stack trace, unless I have full debuginfo installed for all the packages
working from the bottom to the top of the stack. As a banal example here is
console kit:

(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xb7e163f9 in ioctl () from /lib/libc.so.6
#2 0x0805b810 in ?? ()
#3 0x0000000a in ?? ()
#4 0x00005607 in ?? ()
#5 0x00000030 in ?? ()
#6 0x00000030 in ?? ()
#7 0xb7f71770 in g__g_thread_lock () from /usr/lib/libglib-2.0.so.0
#8 0xb7f90ff4 in ?? () from /lib/libpthread.so.0
#9 0xb7f71770 in g__g_thread_lock () from /usr/lib/libglib-2.0.so.0
#10 0x08062444 in ?? ()
#11 0x08062420 in ?? ()
#12 0x08062513 in ?? ()
#13 0xb7a50b58 in ?? ()
#14 0xb7f70ff4 in ?? () from /usr/lib/libglib-2.0.so.0

now I install ConsoleKit-debuginfo and try again:

[Switching to thread 20 (Thread 0xb7a50b90 (LWP 2253))]#0 0xffffe430 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xb7e163f9 in ioctl () from /lib/libc.so.6
#2 0x0805b810 in ck_wait_for_active_console_num (console_fd=10, num=48) at ck-sysdeps-unix.c:266
#3 0x080518a2 in vt_thread_start (data=0x8076918) at ck-vt-monitor.c:322
#4 0xb7f2039f in g_thread_create_proxy (data=0x8076928) at gthread.c:635
#5 0xb7f82175 in start_thread (arg=0xb7a50b90) at pthread_create.c:297
#6 0xb7e1ddde in clone () from /lib/libc.so.6

Same running process - same system, no other debuginfo installed [ NB. I have
debuginfo for libc & gthread ] so ...

Anyhow - I created a more minimal test case, which I'll attach. It would be
-extremely- useful wrt. debugging things to have a fix here :-) [ also for
SLED ]

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c1

--- Comment #1 from Michael Meeks <mmeeks@novell.com> 2008-05-15 05:35:45 MST ---
Created an attachment (id=215511)
 --> (https://bugzilla.novell.com/attachment.cgi?id=215511)
gdb test-case: run 'make'

I run make here and see:

(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0x400db0f0 in __nanosleep_nocancel () from /lib/libc.so.6
#2 0x400daefe in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3 0x0804858b in ?? ()
#4 0x00000000 in ?? ()

I would expect gdb to be able to wander back up the stack frames giving at
least vaguely intelligent information for each frame - in particular for
trace_two - which should have good debuginfo.

Note - if I remove the 'strip ./a.out' from the Makefile I get:

(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0x400db0f0 in __nanosleep_nocancel () from /lib/libc.so.6
#2 0x400daefe in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3 0x0804858b in trace_zero ()
#4 0x080485a5 in trace_one ()
#5 0x4001e4ac in trace_two (fn=0x804858d <trace_one>) at two.c:8
#6 0x080485b9 in trace_three ()
#7 0x080485d1 in main ()

which is much more like what I would expect to get.

HTH.
https://bugzilla.novell.com/show_bug.cgi?id=390722

Philipp Thomas <pth@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pth@novell.com
         AssignedTo|pth@novell.com              |schwab@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c2

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|Major                       |Normal

--- Comment #2 from Andreas Schwab <schwab@novell.com> 2008-05-16 04:00:33 MST ---
If you strip you don't have any debuginfo.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c3

--- Comment #3 from Michael Meeks <mmeeks@novell.com> 2008-05-16 04:08:14 MST ---
Naturally - however, apparently we ship our packages with stripped libraries
in them, and this causes gdb to become -far- less useful wrt. debugging
almost anything. Worse, it's a regression - this used to work quite well.

Surely there is no conceptual reason why we can't walk back up the
un-optimised [!] frame-pointer containing [!] code and give at least useful
output where we have debuginfo; I would accept:

#0 0xffffe430 in __kernel_vsyscall ()
#1 0x400db0f0 in __nanosleep_nocancel () from /lib/libc.so.6
#2 0x400daefe in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3 0x0804858b in ??
#4 0x080485a5 in ??
#5 0x4001e4ac in trace_two (fn=0x804858d <trace_one>) at two.c:8
#6 0x080485b9 in ??
#7 0x080485d1 in main ()

but not giving up as we do at frame #4.

Really - having a system that is extremely hard to debug, and requires the
installation of tons of mostly redundant debuginfo packages makes life rather
harder than it should be [ not to mention the horrible truncation of the
trace ]

Can you re-consider the priority change ? The evo. guys complain like mad
about this on SLED10 - it makes their lives hell wrt. debugging.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c4

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #4 from Andreas Schwab <schwab@novell.com> 2008-05-16 06:15:57 MST ---
Since there is no symbol at PC there is no way to find the beginning of the
function for prologue analysis.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c5

Michael Meeks <mmeeks@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |psankar@novell.com
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #5 from Michael Meeks <mmeeks@novell.com> 2008-05-16 07:29:59 MST ---
So - let me get this right; are you stating that:

* in order to generate a useful backtrace - in future it is always going to
  be necessary to install debuginfo packages from the bottom of the trace
  upwards ?

That would suck some big rocks and was historically not the case.

Also - it would be interesting to know how valgrind [ if you replace the
sleep() with (*(int*)0) = 0; ] manages to generate a useful debug log:

==21438== Process terminating with default action of signal 11 (SIGSEGV)
==21438== Access not within mapped region at address 0x0
==21438== at 0x8048584: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x80485AF: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x40294AB: trace_two (two.c:8)
==21438== by 0x80485C3: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x80485DB: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x40655F5: (below main) (libc-start.c:220)

ie. much as my suggested trace above. How does valgrind do it ? [ cf.
coregrind/m_stacktrace.c (get_StackTrace_wrk) ] - it certainly doesn't give
up almost immediately and actually gets to 'main' :-)

Why is it that we cannot simply follow the %ebp chain up and get a load of
values for IPs for each function call point & make some educated guess (as
valgrind does) ?

I've tentatively re-opened - if I'm just being totally dim witted here :-)
please do re-close, but I would really like to better understand the
necessity of prologue analysis when the code is compiled on IA32 with -O0 and
no sillies (eg. -fomit-frame-pointer) - valgrind's backtrace appears rather
simpler & more robust.
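
The %ebp-chain walk asked about in comment #5 can be sketched as follows. This is a minimal illustration over a simulated stack snapshot, assuming the conventional IA32 frame layout (saved %ebp at [ebp], return address at [ebp+4]); it is not taken from valgrind or gdb, and the names stack_image, read_word and walk_ebp_chain are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Snapshot of stack memory: words[i] is the 32-bit word at address base + 4*i. */
struct stack_image {
    uint32_t base;
    const uint32_t *words;
    size_t count;
};

/* Read one word from the snapshot; returns 0 if addr is unaligned or outside it. */
static int read_word(const struct stack_image *s, uint32_t addr, uint32_t *out)
{
    if (addr < s->base || (addr - s->base) % 4 != 0)
        return 0;
    size_t idx = (addr - s->base) / 4;
    if (idx >= s->count)
        return 0;
    *out = s->words[idx];
    return 1;
}

/*
 * Follow the saved-%ebp links starting at ebp, assuming every frame was set
 * up with "push %ebp; mov %esp,%ebp". Stores up to max_frames return
 * addresses in ret_addrs and returns how many were found.
 */
size_t walk_ebp_chain(const struct stack_image *s, uint32_t ebp,
                      uint32_t *ret_addrs, size_t max_frames)
{
    size_t n = 0;
    assert(s != NULL && ret_addrs != NULL);
    while (n < max_frames && ebp != 0) {
        uint32_t saved_ebp, ret;
        if (!read_word(s, ebp, &saved_ebp) || !read_word(s, ebp + 4, &ret))
            break;                      /* chain left readable memory */
        ret_addrs[n++] = ret;
        /* The stack grows down, so a caller's frame must sit at a higher
         * address; anything else means the chain is broken. */
        if (saved_ebp != 0 && saved_ebp <= ebp)
            break;
        ebp = saved_ebp;
    }
    return n;
}
```

Each return address found this way can then be looked up against whatever symbols or debuginfo exist for it, and printed as a bare address where none exist.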
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c6

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #6 from Andreas Schwab <schwab@novell.com> 2008-05-16 07:35:28 MST ---
A debugger absolutely needs more than just a frame chain. There is no way
this is going to work with all that missing information.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c7

--- Comment #7 from Michael Meeks <mmeeks@novell.com> 2008-05-16 07:53:07 MST ---
So - to try to expand and elaborate: when you write the above, what I hear is:

"yes, we could provide a much more useful stack trace to the poor person
trying to debug something - but we're not going to: because we want to show
something else much less useful" ;-)

or do I mistake things ? ;-) valgrind can provide a useful trace for a poor
hacker trying to find what went wrong and where; but gdb is committed to not
do so ?

Just for reference, since we have the valgrind trace above - let's see what
gdb does when we get the same segv trapped:

Program received signal SIGSEGV, Segmentation fault.
0x08048584 in ?? ()
(gdb) bt
#0 0x08048584 in ?? ()
#1 0x080486b0 in _IO_stdin_used ()
#2 0x00000001 in ?? ()
#3 0x00000025 in ?? ()
#4 0xb7fec560 in ?? () from /lib/libc.so.6
#5 0x00000027 in ?? ()
#6 0xb7eac6c0 in ?? ()
#7 0xbf930898 in ?? ()
#8 0x080485b0 in ?? ()
#9 0x00000001 in ?? ()
#10 0x00000003 in ?? ()
#11 0xbf9308b8 in ?? ()
#12 0xb80134ac in trace_two (fn=0x1) at two.c:8
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Are you suggesting that this is a more useful view than:

==21438== at 0x8048584: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x80485AF: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x40294AB: trace_two (two.c:8)
==21438== by 0x80485C3: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x80485DB: (within /home/michael/gdb-testcase/a.out)
==21438== by 0x40655F5: (below main) (libc-start.c:220)

[ though I guess, due to some fluke of parameters passed (no 0's) we managed
at least to get nice line number information for trace_two ].

If this is more useful, what is it ? :-) [ I assume it's just an unwind of
the stack, frame by frame, printed in hex with some guesses as to function
addresses - until we hit a NULL - but how useful is that really, honestly ?
vs. being able to tell what was called from where ].

Or - are you suggesting that we shouldn't strip any of our binaries as we
ship them - so we can get stack traces when things fail ? or ...
https://bugzilla.novell.com/show_bug.cgi?id=390722

User federico@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c8

Federico Mena Quintero <federico@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |federico@novell.com
           Severity|Normal                      |Major
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #8 from Federico Mena Quintero <federico@novell.com> 2008-05-16 13:21:42 MST ---
I agree with Michael. Not being able to get at least an initial clue of
where a program is crashing is *massively* inconvenient. Bug reports then
become useless, as everyone files stack traces that are full of NULLs. Then
you must play Bugzilla ping-pong to get any useful information from users.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c9

--- Comment #9 from Michael Meeks <mmeeks@novell.com> 2008-05-16 13:33:35 MST ---
So - how about some hybrid - I assume the gdb developers think people want to
see a raw dump of the stack, looking like stack frames; but why not
interleave that - we may not have debug data - but we can guess which are the
IP frames by following the %ebp chain; so something like this could be
possible (?)

from 0x08048584: (within /home/michael/gdb-testcase/a.out)
    0x080486b0 0x00000001 0x00000025 0xb7fec560 0x00000027 0xb7eac6c0 0xbf930898
from 0x080485AF: (within /home/michael/gdb-testcase/a.out)
    0x080485b0 0x00000001 0x00000003 0xbf9308b8
from 0x040294AB: trace_two (two.c:8)
from 0x080485C3: (within /home/michael/gdb-testcase/a.out)
..
==21438== by 0x40655F5: (below main) (libc-start.c:220)

or something ?
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c10

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #10 from Andreas Schwab <schwab@novell.com> 2008-05-19 02:58:58 MST ---
Not a bug.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User matz@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c11

Michael Matz <matz@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz@novell.com,
                   |                            |stefan.fent@novell.com
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #11 from Michael Matz <matz@novell.com> 2008-05-19 07:58:58 MST ---
Of course it's a bug, at least a quality of implementation one. If valgrind
can do it, so can gdb. Even more so, because gdb _did_ do this in the past.
It certainly is a regression in gdb's stack walker, and you should try to
find out what caused it.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c12

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #12 from Andreas Schwab <schwab@novell.com> 2008-05-19 08:01:22 MST ---
It's not going to work without that missing information.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User matz@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c13

Michael Matz <matz@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #13 from Michael Matz <matz@novell.com> 2008-05-19 08:11:52 MST ---
I assume you haven't even looked at the issue. valgrind clearly _can_
provide a more useful backtrace, I'm not sure why you're ignoring this fact.
Hence a %ebp chain is in place, which gdb can use to skip frames where no
debug info exists.

Currently gdb is so heavily confused if even _one_ intermediate frame has no
debug info, that even higher frames that _do_ have debug info are not parsed
anymore. That's the bug: gdb clearly can do exactly the same as valgrind and
use %ebp to skip the frame. That would be much better than what it currently
does.

gdb clearly tries to do something: it walks the stack somewhat, it just is
heavily confused by the addresses. If it absolutely would have to give up,
then gdb should just say so, instead of running wild in memory.

And the claim is that gdb did this some time ago. At least earlier versions
were more useful even in absence of debug information; that's nothing you
can discuss away by claiming "it's impossible". It is possible, and this bug
report is a request to make it happen again.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c14

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #14 from Andreas Schwab <schwab@novell.com> 2008-05-19 08:38:02 MST ---
It's not going to work without that missing information.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User matz@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c15

Michael Matz <matz@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mistinie@novell.com
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #15 from Michael Matz <matz@novell.com> 2008-05-19 08:49:47 MST ---
valgrind can do it, hence can gdb. If you think it can't, explain why.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c16

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #16 from Andreas Schwab <schwab@novell.com> 2008-05-19 08:54:19 MST ---
It's not going to work without that missing information.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User matz@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c17

--- Comment #17 from Michael Matz <matz@novell.com> 2008-05-19 09:14:16 MST ---
valgrind can do it, hence can gdb. If you think it can't, explain why. Or
better, try to fix gdb.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User schwab@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c19

Andreas Schwab <schwab@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #19 from Andreas Schwab <schwab@novell.com> 2008-05-19 09:52:03 MST ---
It's not going to work without that missing information.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User lpechacek@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c20

Libor Pechacek <lpechacek@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #20 from Libor Pechacek <lpechacek@novell.com> 2008-05-20 00:17:07 MST ---
Andreas, I don't think this bug should be closed as invalid. In principle,
the presence of the frame pointer enables the debugging tool to walk the call
chain up to main() without problems. Moreover, another debugging tool can do
it. Why can't GDB?

Please be so kind as to elaborate, or at least don't close this bug as
invalid. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c22

--- Comment #22 from Michael Meeks <mmeeks@novell.com> 2008-05-20 02:11:05 MST ---
As another data-point for "It's not going to work without that missing
information", here is gdb doing really rather well having jumped a load of
mangled stack frames to a frame where there is debuginfo:

(gdb) bt
#0 0x08048584 in ?? ()
#1 0x080486b0 in _IO_stdin_used ()
#2 0x00000001 in ?? ()
#3 0x00000025 in ?? ()
#4 0x40183560 in ?? () from /lib/libc.so.6
#5 0x00000027 in ?? ()
#6 0x401876c0 in ?? ()
#7 0xbff1ee18 in ?? ()
#8 0x080485b0 in ?? ()
#9 0x00000001 in ?? ()
#10 0x00000003 in ?? ()
#11 0xbff1ee38 in ?? ()
#12 0x4001e4ac in trace_two (fn=0x1) at two.c:8
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) up 12
#12 0x4001e4ac in trace_two (fn=0x1) at two.c:8
8 fn (1, 3);
(gdb) l
3 extern int trace_one (int a, unsigned int b);
4
5 void trace_two (int (*fn) (int, unsigned int))
6 {
7 fprintf (stderr, "this method should have nice debuginfo\n");
8 fn (1, 3);
9 }
(gdb)

Clearly examining (non-)frame #8 doesn't look so good:

#8 0x080485b0 in ?? ()
(gdb) l
Line number 10 out of range; two.c has 9 lines.

but - that's expected of course.

So: it seems that far from being hopeless without the missing information: if
we can move into a valid frame higher up the stack that is associated with
some debuginfo (and which can easily be determined from %ebp chaining): then
we can in fact do some really rather useful debugging :-)
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c23

--- Comment #23 from Michael Meeks <mmeeks@novell.com> 2008-05-20 06:05:22 MST ---
So - my interest piqued; I installed SuSE Linux 10.0 to play with:
interestingly it is also worse than valgrind - but at least doesn't
artificially truncate the stack trace as soon as it feels threatened by a
NULL ;-)

Program received signal SIGSEGV, Segmentation fault.
0x0804854a in ?? ()
(gdb) bt
#0 0x0804854a in ?? ()
#1 0xbffa6a78 in ?? ()
#2 0x40080df2 in fwrite () from /lib/tls/libc.so.6
#3 0x08048576 in ?? ()
#4 0x00000001 in ?? ()
#5 0x00000003 in ?? ()
#6 0x4014a6c0 in ?? ()
#7 0x400196ec in ?? () from ./libtwo.so
#8 0x00000000 in ?? ()
#9 0x40015cc0 in ?? () from /lib/ld-linux.so.2
#10 0xbffa6a98 in ?? ()
#11 0x40018577 in trace_two (fn=0x1) at two.c:8
#12 0x40018577 in trace_two (fn=0x8048562 <_init+370>) at two.c:8
#13 0x0804858e in ?? ()
#14 0x08048562 in ?? ()
#15 0x080497f8 in ?? ()
#16 0xbffa6ab8 in ?? ()
#17 0x08048405 in _init ()
#18 0x080485b4 in ?? ()
#19 0x40145ff4 in ?? () from /lib/tls/libc.so.6
#20 0x08048630 in ?? ()
#21 0x00000000 in ?? ()
#22 0x40145ff4 in ?? () from /lib/tls/libc.so.6
#23 0x00000000 in ?? ()
#24 0x40015cc0 in ?? () from /lib/ld-linux.so.2
#25 0xbffa6b28 in ?? ()
#26 0x4003fea0 in __libc_start_main () from /lib/tls/libc.so.6
#27 0x4003fea0 in __libc_start_main () from /lib/tls/libc.so.6
#28 0x08048491 in ?? ()

Of course - the ideal trace would be a lot more like the valgrind trace ;-)
but not giving up prematurely would be much appreciated too.
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c24

--- Comment #24 from Michael Meeks <mmeeks@novell.com> 2008-05-22 10:34:57 MST ---
I managed to find a friendly gdb hacker (or two) - who gave some helpful
advice:

<mjw> -fomit-frame-pointer might be standard these days
<daney> Put a breakpoint on all 'call' and 'ret' instructions and record $sp each time you encounter one...
<mjw> Ah, it was the section above that: http://sourceware.org/gdb/current/onlinedocs/gdbint_3.html#SEC8
<mjw> " 3.1 Prologue Analysis " Clever stuff :)
<mjw> Anyway, I think fedora got this right. Just compile everything with -fasynchronous-unwind-tables then you don't need any heuristics.
<mjw> Although I think daney's solution is also pretty slick :)

Now, of course I'm clueless: do we compile currently with
-fasynchronous-unwind-tables ?

<aph_> mjw: ah, but we don't emit unwinder data for the prologue, that's the reason gdb can't just use the unwinder data, now I remember
<mjw> Don't settle for -fexceptions! Go for the full monty! -fasynchronous-unwind-tables backtrace from any place!
<aph_> mjw: even with -fasynchronous-unwind-tables we still aren't accurate in the prologue
<mjw> aph_, I thought that was fixed?
<aph_> I thought it was deliberate to save a lot of space
<daney> Many distributions don't have any unwinding data for libraries written in C. This is the 21st. century. We have memory to burn now...
<aph_> I can't imagine why anyone would want to fix it
<mjw> because otherwise unwinding doesn't work reliably
<mjw> I did work on this for frysk. I thought we got it all right.
<aph_> it does, because you never unwind in a prologue
<mjw> I have to get my testcases, but I am pretty sure we have tests for stepping into and out a whole function.
<aph_> but you use the debuginfo, don't you?
* mjw was actually pretty proud that worked
<daney> Really if you are using gdb, you should have accurate .debug_frame data for everything. leave .eh_frame for the runtime unwinder.
<aph_> daney: yeah, exactly
<mjw> aph_, in order eh_frame, debug_frame or heuristics based on peeking at the last few words on the stack and see if that was an address that contained a call instruction.
<daney> Really everyone should be using a processor with fixed size instructions (like mips) where it is possible to do accurate unwinding with almost no meta data.
<mjw> yeah, instruction decoding on x86 is so not fun!

etc. etc.
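
The last heuristic mjw mentions (peek at a word on the stack and check whether it points just past a call instruction) might look roughly like the sketch below. It is an illustration only, assuming 32-bit x86 code and checking just the two common near-call encodings (E8 rel32 and FF /2); the function names are hypothetical and this is not frysk's or gdb's code. Being a heuristic, it can report false positives when data bytes happen to look like a call.

```c
#include <stddef.h>
#include <stdint.h>

/* Does an indirect near call (opcode FF, ModRM reg field 2) of exactly len
 * bytes start at p? Assumes 32-bit addressing and no prefixes. */
static int is_indirect_call(const uint8_t *p, size_t len)
{
    if (p[0] != 0xFF || ((p[1] >> 3) & 7) != 2)
        return 0;
    uint8_t mod = p[1] >> 6, rm = p[1] & 7;
    size_t expect = 2;                          /* opcode + ModRM */
    if (mod != 3 && rm == 4) {                  /* SIB byte follows */
        if (len < 3)
            return 0;
        expect += 1;
        if (mod == 0 && (p[2] & 7) == 5)
            expect += 4;                        /* SIB with disp32 base */
    }
    if (mod == 1)
        expect += 1;                            /* disp8 */
    else if (mod == 2 || (mod == 0 && rm == 5))
        expect += 4;                            /* disp32 */
    return expect == len;
}

/*
 * code[0..size) holds the bytes mapped at address base. Returns 1 if the
 * bytes immediately before ret decode as a call, which makes ret a
 * plausible return address.
 */
int follows_call(const uint8_t *code, size_t size, uint32_t base, uint32_t ret)
{
    static const size_t lengths[] = { 2, 3, 6, 7 };
    if (ret < base || ret - base > size)
        return 0;
    size_t off = ret - base;
    if (off >= 5 && code[off - 5] == 0xE8)      /* call rel32 */
        return 1;
    for (size_t i = 0; i < sizeof lengths / sizeof lengths[0]; i++) {
        size_t len = lengths[i];
        if (off >= len && is_indirect_call(code + off - len, len))
            return 1;
    }
    return 0;
}
```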
https://bugzilla.novell.com/show_bug.cgi?id=390722

User matz@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c25

--- Comment #25 from Michael Matz <matz@novell.com> 2008-05-23 03:11:22 MST ---
Created an attachment (id=217718)
 --> (https://bugzilla.novell.com/attachment.cgi?id=217718)
make gdb print %ebp based backtraces

The snippet from the conversation mostly doesn't apply in our cases. First,
due to stripping there is no .debug_frame section anymore, and without
exceptions of course also no .eh_frame (on i386 at least). Which also means
that we can't find the borders of the function a frame is associated with,
and hence can't do any prologue analysis on these intermediate frames.

(The speculation about -fomit-frame-pointer doesn't apply either, because we
don't compile with that option on i386, otherwise valgrind would be lost
too.)

What needs to happen is simply that the fallback unwinder (that gdb correctly
uses if no dwarf debug info exists for the frame at hand) has to cope with
the situation that it doesn't find function borders. It partly already
assumes that it then is a normal %ebp frame, and if it already assumes so
partly, we can also make use of it.

The patch in this attachment does that. It's only for frames where no
debuginfo exists, so it's a strict improvement to the current situation. The
only case where it could break (but not worse than before) is if such a frame
happens to be without a frame pointer, i.e. compiled with
-fomit-frame-pointer. We are lost then; for such frames we have no choice
but to use debug info.
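
The fallback order described in comment #25 can be modelled as a small decision function. This is a toy model of the description above, not the attached patch and not gdb's internal API; every type and function name here is hypothetical.

```c
#include <stdint.h>

enum unwind_strategy {
    UNWIND_DWARF_CFI,          /* .debug_frame / .eh_frame covers the pc    */
    UNWIND_PROLOGUE_ANALYSIS,  /* no CFI, but the function start is known   */
    UNWIND_ASSUME_EBP_FRAME    /* nothing known: assume a normal %ebp frame */
};

/* What the debugger knows about the code at a frame's pc. */
struct frame_facts {
    int has_cfi;
    int has_func_start;
};

/* Pick the most precise method available, falling back to the %ebp
 * assumption instead of giving up when function borders are unknown. */
enum unwind_strategy choose_strategy(const struct frame_facts *f)
{
    if (f->has_cfi)
        return UNWIND_DWARF_CFI;
    if (f->has_func_start)
        return UNWIND_PROLOGUE_ANALYSIS;
    return UNWIND_ASSUME_EBP_FRAME;
}

/* Under the %ebp-frame assumption: where the caller's state lives. */
struct saved_slots {
    uint32_t saved_ebp_at;     /* address holding the caller's %ebp    */
    uint32_t return_addr_at;   /* address holding the return address   */
    uint32_t caller_esp;       /* caller's %esp once the call returns  */
};

struct saved_slots ebp_frame_slots(uint32_t ebp)
{
    struct saved_slots s = { ebp, ebp + 4, ebp + 8 };
    return s;
}
```

As the comment notes, the last branch is only right for code built with a frame pointer; for a frame compiled with -fomit-frame-pointer the slots computed this way are meaningless.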
https://bugzilla.novell.com/show_bug.cgi?id=390722

User mmeeks@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=390722#c26

--- Comment #26 from Michael Meeks <mmeeks@novell.com> 2008-05-23 04:08:45 MST ---
Packages are available for testing here:
http://www.go-oo.org/~michael/fixed-gdb/

I would heartily recommend them. Here is my test experience with
OpenOffice.org [ a package with a small 275Mb debuginfo RPM ]:

before patch:

(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xb6eb21d7 in *__GI___poll (fds=0x8641990, nfds=6, timeout=599) at ./sysdeps/unix/sysv/linux/poll.c:87
#2 0xb554f6f2 in ?? () from /usr/lib/libglib-2.0.so.0
#3 0x08641990 in ?? ()
#4 0x00000006 in ?? ()
#5 0x00000257 in ?? ()
#6 0x08641990 in ?? ()
#7 0x00000006 in ?? ()
#8 0x00000000 in ?? ()

after patch:

(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xb6eb21d7 in *__GI___poll (fds=0x8641990, nfds=6, timeout=599) at ./sysdeps/unix/sysv/linux/poll.c:87
#2 0xb554f6f2 in ?? () from /usr/lib/libglib-2.0.so.0
#3 0xb554f9d8 in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#4 0xb5b3d68d in ?? () from /usr/lib/ooo-2.0/program/libvclplug_gtk680li.so
#5 0xb54f4751 in X11SalInstance::Yield () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#6 0xb7e06cf1 in Application::Yield () from /usr/lib/ooo-2.0/program/libvcl680li.so
#7 0xb7e06d3f in Application::Execute () from /usr/lib/ooo-2.0/program/libvcl680li.so
#8 0x08071c53 in desktop::Desktop::Main ()
#9 0xb7e0a27e in ?? () from /usr/lib/ooo-2.0/program/libvcl680li.so
#10 0xb7e0a41a in SVMain () from /usr/lib/ooo-2.0/program/libvcl680li.so
#11 0x08066c60 in main ()

apparently there is a way this will work.

Andreas - can we get this patch into OpenSUSE 11.0 ? - it would *really* help
improve the quality of the product going forward I think.
https://bugzilla.novell.com/show_bug.cgi?id=390722 User matz@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c27 --- Comment #27 from Michael Matz <matz@novell.com> 2008-05-23 04:14:33 MST --- Created an attachment (id=217741) --> (https://bugzilla.novell.com/attachment.cgi?id=217741) same patch, for gdb-6.8 (as in 11.0) Equivalent patch for the gdb in 11.0. mbuild job is knuth-matz-2, I'll submit this for 11.0, coolo can take it or leave it :)
https://bugzilla.novell.com/show_bug.cgi?id=390722 User hpj@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c28 --- Comment #28 from Hans Petter Jansson <hpj@novell.com> 2008-05-23 09:38:54 MDT --- Mr. Matz, I worship the ground you walk on. Thank you for this fix.
https://bugzilla.novell.com/show_bug.cgi?id=390722 User federico@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c29 --- Comment #29 from Federico Mena Quintero <federico@novell.com> 2008-05-23 11:54:18 MDT --- W00t. Add me to the list of Matz-worshippers. This is fantastic to have.
https://bugzilla.novell.com/show_bug.cgi?id=390722 User mwelinder@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c30
Morten Welinder <mwelinder@gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mwelinder@gmail.com
--- Comment #30 from Morten Welinder <mwelinder@gmail.com> 2008-05-23 17:50:53 MDT ---
Upstream asap, please! This patch needs to make it into all distros so we can forget there once was a lot of "??" traces. This is in the same league as the invention of hot water.
https://bugzilla.novell.com/show_bug.cgi?id=390722 User Janne.Karhunen@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c31
Janne Karhunen <Janne.Karhunen@gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Janne.Karhunen@gmail.com
--- Comment #31 from Janne Karhunen <Janne.Karhunen@gmail.com> 2008-05-26 16:24:03 MDT ---
> (The speculation about -fomit-frame-pointer doesn't apply either, because we don't compile with that option on i386, otherwise also valgrind would be lost)
A while back I experimented with libsegfault by adding threading support and some other fancy stuff, such as 'file:line' DWARF data extraction. While testing that, prior to sending it out to libc-alpha, I found the frame pointer trail to be highly unreliable even when 'omit-frame-pointer' is not being used. Heck, even libc sleep() seemed to be using ebp/rbp for something else, destroying the trail :/
https://bugzilla.novell.com/show_bug.cgi?id=390722 User matz@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c32
--- Comment #32 from Michael Matz <matz@novell.com> 2008-05-27 07:11:39 MDT ---
$ebp based frames are only highly unpredictable when they don't exist. Unfortunately you can't detect this with certainty (i.e. whether $ebp points to the frame or is used for something else); that's where the problem is. In normally compiled code (libc, with its load of inline asm, unfortunately doesn't count as that for some functions) you can be sure that $ebp points to a frame (if not compiled with omit-frame-pointer, of course).
So, under the assumptions that we want this whole thing to get useful backtraces out of segfaults, and further that such segfaults don't happen very often in libc but rather occur in application code (without being called back from libc code, as with qsort), it seems sensible to just rely on $ebp frames, even if that also has its share of problems.
https://bugzilla.novell.com/show_bug.cgi?id=390722 User Janne.Karhunen@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c33
--- Comment #33 from Janne Karhunen <Janne.Karhunen@gmail.com> 2008-05-27 08:23:43 MDT ---
Right, the reason libc is breaking the frame pointer trail so heavily probably comes from excessive inline asm usage. These days quite a few libc functions are really inline asm wrappers around direct system calls (socket functions are a great example of this) :/
Anyhoo, my conclusion about libsegfault was that it should probably be removed from libc completely (or rewritten using libunwind); it's that unreliable. So may the force be with you here..
https://bugzilla.novell.com/show_bug.cgi?id=390722 User aj@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c41
Andreas Jaeger <aj@novell.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |REOPENED
      Info Provider|matz@novell.com             |
--- Comment #41 from Andreas Jaeger <aj@novell.com> 2008-10-24 05:20:47 MDT ---
Michael, Sankar, what is the status here?
https://bugzilla.novell.com/show_bug.cgi?id=390722 User Janne.Karhunen@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c42 --- Comment #42 from Janne Karhunen <Janne.Karhunen@gmail.com> 2008-10-24 05:47:41 MDT --- FYI: there is a nice heuristic in libunwind for detecting a valid frame pointer (based on the size of the jump).
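A jump-size check of the kind mentioned in comment #42 might look something like the sketch below. This does not reproduce libunwind's actual code; the function name `plausible_next_frame`, the `max_jump` threshold, and the sample addresses are all hypothetical. It only relies on two facts about i386: the stack grows downward, so a caller's frame sits at a higher address than its callee's, and frame pointers are word-aligned.

```python
def plausible_next_frame(ebp, saved_ebp, max_jump=0x10000, word=4):
    """Heuristic: could saved_ebp plausibly be the caller's frame pointer?

    max_jump is an arbitrary illustrative bound on how far apart two
    adjacent frames may be; a real unwinder would choose its own limit.
    """
    if saved_ebp % word != 0:
        return False  # frame pointers are word-aligned
    if saved_ebp <= ebp:
        return False  # callers live at higher addresses than callees
    return saved_ebp - ebp <= max_jump  # reject implausibly large jumps


# Candidate "saved %ebp" values read from a frame at 0xBFFF0010.
current = 0xBFFF0010
candidates = [0xBFFF0030, 0x00000257, 0x08641990, 0xBFFF0012, 0xCFFF0010]
verdicts = [plausible_next_frame(current, c) for c in candidates]
for candidate, ok in zip(candidates, verdicts):
    print(f"0x{candidate:08x}: {'plausible' if ok else 'rejected'}")
```

A check like this cannot prove that %ebp holds a frame pointer, but it lets an unwinder stop early instead of printing a run of garbage frames once the chain has been clobbered.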
https://bugzilla.novell.com/show_bug.cgi?id=390722 User mmeeks@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c43 --- Comment #43 from Michael Meeks <mmeeks@novell.com> 2008-10-24 07:02:23 MDT --- Micha's patch is wonderful - IMHO we should ship that ASAP; then tackle Sankar's other problems separately as new bug(s). Oh - and we should make gdb "actually work"(TM) in OS11.1 - so we have even a small chance of debugging things :-)
https://bugzilla.novell.com/show_bug.cgi?id=390722 User matz@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c44 --- Comment #44 from Michael Matz <matz@novell.com> 2008-10-24 07:34:50 MDT --- Created an attachment (id=247826) --> (https://bugzilla.novell.com/attachment.cgi?id=247826) ported patch for sles10-sp2 gdb FWIW here's the patch backported to the SLES10 gdb. It's really trivial, so I guess you had a similar one already. If that doesn't work we need a better testcase, the one from Michael Meeks is fixed as far as I know. mbuild is knuth-matz-23 .
https://bugzilla.novell.com/show_bug.cgi?id=390722 User matz@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c45 --- Comment #45 from Michael Matz <matz@novell.com> 2008-10-24 07:38:54 MDT --- The 11.1 gdb still contains this patch, so we should be fine.
https://bugzilla.novell.com/show_bug.cgi?id=390722
Swamp Script User <swamp@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|                            |maint:planned:sle10-sp3
https://bugzilla.novell.com/show_bug.cgi?id=390722 User psankar@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=390722#c46 --- Comment #46 from Sankar P <psankar@novell.com> 2009-01-07 02:33:40 MST --- I just verified in 11.1 and it works fine
https://bugzilla.novell.com/show_bug.cgi?id=390722
Philipp Thomas <pth@novell.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|pth@novell.com              |matz@novell.com