[Bug 289641] New: lock-up on Qt app exit from g++ miscompiling pthread cleanup code
https://bugzilla.novell.com/show_bug.cgi?id=289641

           Summary: lock-up on Qt app exit from g++ miscompiling pthread cleanup code
           Product: openSUSE 10.2
           Version: Final
          Platform: i686
        OS/Version: openSUSE 10.2
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Development
        AssignedTo: pth@novell.com
        ReportedBy: ahanssen@trolltech.com
         QAContact: qa@suse.de
          Found By: Other

The program from the source code below locks up if compiled with gcc or g++ with exceptions disabled, but locks up when compiled with g++ with exception support enabled. If you look in pthread.h, you'll see the cleanup handler code is different when compiling for C or C++.

Effect: All Qt 4 applications can lock up on exit.

Why: Qt's QThread class uses pthread's cleanup handlers to get notifications for when threads die on Unix. Because of what seems to be a gcc bug, this cleanup handler is not called on OpenSuSE 10.2. We have tested the same compiler version on other distributions, and it doesn't lock up there. Also, Intel's ICC does not cause this lockup.

Compiler version: gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)

To compile
as C:   gcc main.c -lpthread
as C++: g++ main.c -lpthread
as C++: g++ main.c -lpthread -fno-exceptions

---8<---
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void finish(void *arg)
{
    pthread_cond_signal(&cond);
    fprintf(stderr, "finish()\n");
}

void *cancelthread(void *arg)
{
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    pthread_cleanup_push(finish, NULL);
    fprintf(stderr, "thread sleeping...\n");
    pthread_testcancel();
    sleep(300);
    pthread_cleanup_pop(1);
    return 0;
}

int main()
{
    pthread_attr_t attr;
    pthread_t thread2;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&thread2, &attr, cancelthread, NULL);
    sleep(1);
    fprintf(stderr, "request cancel\n");
    pthread_cancel(thread2);
    fprintf(stderr, "wait\n");
    pthread_cond_wait(&cond, &mutex);
    fprintf(stderr, "done\n");
}
---8<---

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

https://bugzilla.novell.com/show_bug.cgi?id=289641#c1

--- Comment #1 from Andreas Aardal Hanssen <ahanssen@trolltech.com> 2007-07-05 00:39:23 MST ---
(In reply to comment #0 from Andreas Aardal Hanssen)
> The program from the source code below locks up if compiled with gcc or g++
> with exceptions disabled, but locks up when compiled with g++ with exception
> support enabled. If you look in pthread.h, you'll see the cleanup handler

Sorry for the typo: The program does /not/ lock up with gcc or g++ without exceptions. ;)

https://bugzilla.novell.com/show_bug.cgi?id=289641

Cristian Rodriguez <judas_iscariote@shorewall.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|pth@novell.com              |dmueller@novell.com
          Component|Development                 |KDE

https://bugzilla.novell.com/show_bug.cgi?id=289641

Andreas Jaeger <aj@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmueller@novell.com, matz@novell.com
         AssignedTo|dmueller@novell.com         |rguenther@novell.com

https://bugzilla.novell.com/show_bug.cgi?id=289641#c2

Richard Guenther <rguenther@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |ahanssen@trolltech.com

--- Comment #2 from Richard Guenther <rguenther@novell.com> 2007-07-05 02:55:16 MST ---
The testcase only compiles for me if I add a declaration for 'mutex'. It also works for me, that is, it doesn't lock up with or without -fno-exceptions. Can you specify the glibc and kernel versions you are using?

Thanks,
Richard.

https://bugzilla.novell.com/show_bug.cgi?id=289641#c3

--- Comment #3 from Michael Matz <matz@novell.com> 2007-07-05 08:24:06 MST ---
Can reproduce on i686 with glibc-2.5-25, kernel-default-2.6.18.2-34. It works on x86_64, though. Hmm, I dimly remember a similar problem.

https://bugzilla.novell.com/show_bug.cgi?id=289641#c4

Michael Matz <matz@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|rguenther@novell.com        |matz@novell.com
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|ahanssen@trolltech.com      |

--- Comment #4 from Michael Matz <matz@novell.com> 2007-07-05 08:50:27 MST ---
Oh joy, it's connected with bug 182298 (which in the end was deferred), i.e. the general theme that C++ exception handling and thread cancellation (which you do in your testcase) don't go together well in their current state of affairs. The above bug report also leads to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28145 and http://www.codesourcery.com/archives/c++-pthreads/msg00571.html . It's a very ugly and hard problem to solve as the POSIX and C++ specs are in conflict :-(

I'm not sure why other distributions should not have this very problem, if they're using the same glibc and gcc. Probably it's time to really debug the problem some more, as I hinted at in bug 182298 comment #30 onwards.

https://bugzilla.novell.com/show_bug.cgi?id=289641

Stephan Kulow <coolo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|KDE                         |Commercial

https://bugzilla.novell.com/show_bug.cgi?id=289641

Stephan Kulow <coolo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Commercial                  |Development

https://bugzilla.novell.com/show_bug.cgi?id=289641#c5

--- Comment #5 from Michael Matz <matz@novell.com> 2007-07-05 09:34:55 MST ---
This btw. only happens with the i686 glibc, not with the i586 one, but that's probably just an artifact (but might explain why it could work on other distros).

https://bugzilla.novell.com/show_bug.cgi?id=289641#c6

Michael Matz <matz@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|matz@novell.com             |kernel-maintainers@forge.provo.novell.com
             Status|ASSIGNED                    |NEW
          Component|Development                 |Kernel
            Summary|lock-up on Qt app exit from g++ miscompiling pthread cleanup code |kernels vDSO mapping broken, leads to pthread cancel problems.

--- Comment #6 from Michael Matz <matz@novell.com> 2007-07-05 12:05:55 MST ---
It's a kernel bug. The problem is that on i686 the vDSO vsyscall page is used. With address space randomization the vDSO itself won't be mapped to the default address 0xffffe000, but to some other random address. E.g.:

    b7f10000-b7f11000 r-xp b7f10000 00:00 0 [vdso]

That in itself is not yet a problem, but the kernel notifies the process of the wrong address afterwards, as can be seen here:

    % LD_SHOW_AUXV=1 cat /proc/self/maps | egrep 'SYSINFO|vdso'
    AT_SYSINFO: 0xb7eee400
    AT_SYSINFO_EHDR: 0xffffe000
    b7eee000-b7eef000 r-xp b7eee000 00:00 0 [vdso]

So, the vdso is mapped to 0xb7eee000, and the AT_SYSINFO aux header correctly points inside that DSO. But the ELF header pointer (AT_SYSINFO_EHDR) remains at 0xffffe000. Now, due to kernel magic the vdso is also mapped at the compat address. But for syscalls the randomized mapping is used (i.e. when following the backtrace you'll at some point hit a program counter inside the randomized vdso, not inside the 0xffffe000 mapping).

The result of all this is that dl_iterate_phdr won't find the vDSO, because it iterates over the registered ELF headers. Hence if the program counter is inside that DSO (which happens when unwinding through a normal syscall, which is exactly what happens when a thread is canceled which itself is inside a syscall) it can't be associated with any unwind information (the vDSO contains unwind info for itself, but as the vDSO can't be found...).

Hence unwinding stops at that point and simply exits the thread. Of course without proper unwinding or running cleanups.

kernel-default-2.6.18.2-34 FWIW.

https://bugzilla.novell.com/show_bug.cgi?id=289641#c7

--- Comment #7 from Jeff Mahoney <jeffm@novell.com> 2007-07-05 12:11:27 MST ---
Looks like bug 258433 might be related as well.

https://bugzilla.novell.com/show_bug.cgi?id=289641

Dirk Mueller <dmueller@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|Normal                      |Major

https://bugzilla.novell.com/show_bug.cgi?id=289641#c9

--- Comment #9 from Marcus Meissner <meissner@novell.com> 2007-09-11 06:07:10 MST ---
Are you really using a SUSE-provided kernel? Because the SUSE kernel does not randomize the vDSO at this time (sadly enough).

https://bugzilla.novell.com/show_bug.cgi?id=289641#c10

K.R. Foley <kr@cybsft.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kr@cybsft.com

--- Comment #10 from K.R. Foley <kr@cybsft.com> 2007-09-12 20:53:56 MST ---
This looks exactly like the bug I have been chasing for the past couple of days, which is documented here http://sourceware.org/bugzilla/show_bug.cgi?id=4123 and showed up when we moved to opensuse 10.2.

By the way, I get the same behavior whether on 2.6.18.8-0.3-default or 2.6.19-rt6-default (which is a vanilla kernel with realtime patches and a custom patch or two applied).

https://bugzilla.novell.com/show_bug.cgi?id=289641#c11

--- Comment #11 from Michael Matz <matz@novell.com> 2007-09-13 04:05:32 MST ---
Re comment #9: yes, this was a SuSE kernel, as I wrote. In particular it was kernel-default-2.6.18.2-34. I don't know why the vDSO was placed at different addresses than the compat address. It might not have to do with the address space randomization, but an independent feature.

https://bugzilla.novell.com/show_bug.cgi?id=289641#c12

--- Comment #12 from K.R. Foley <kr@cybsft.com> 2007-11-20 09:48:14 MST ---
I have some additional info that I have discovered lately. Upgrading to a newer 2.6.23ish kernel resolves this problem for MOST test cases that I have found. Also on 2.6.18 and 2.6.19 kernels this problem can be resolved for MOST test cases by doing "echo 0 > /proc/sys/vm/vdso_enabled". This problem also goes away on OpenSuse 10.3 for MOST test cases.

There is one test case that continues to exhibit the problem in all of the above scenarios. I have a test program that is a cobbled-up OpenMotif program that does a pthread_exit. No matter what I try on OpenSuse, with the exception of disabling exceptions, this problem persists. I even tried using lesstif instead of openmotif, but no joy. The only thing I haven't tried is replacing glibc and I would rather not have to go that route if I can avoid it.

It is worth noting that this problem doesn't seem to happen at all on Fedora 6 or 7. It is also worth noting that on Fedora they use lesstif instead of openmotif.

If there is anything I can do to help debug this, feel free to ask. Upon request, I can post my openmotif test program (I did add it as an attachment on the glibc bugzilla bug #4123) if that will help.

https://bugzilla.novell.com/show_bug.cgi?id=289641#c13

--- Comment #13 from K.R. Foley <kr@cybsft.com> 2007-11-20 09:59:23 MST ---
Not sure why the bad URL got inserted in the post above. Here is the correct link since it seems to want one: http://sourceware.org/bugzilla/show_bug.cgi?id=4123

https://bugzilla.novell.com/show_bug.cgi?id=289641

User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c14

Jeff Mahoney <jeffm@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeffm@novell.com, ak@novell.com

--- Comment #14 from Jeff Mahoney <jeffm@novell.com> 2008-01-04 12:47:17 MST ---
Andi, will this patch fix this?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdif...

https://bugzilla.novell.com/show_bug.cgi?id=289641

User ak@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c15

Andi Kleen <ak@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jbeulich@novell.com

--- Comment #15 from Andi Kleen <ak@novell.com> 2008-01-04 15:34:43 MST ---
Likely. Best ask Jan since he wrote the original version of the patch.

https://bugzilla.novell.com/show_bug.cgi?id=289641

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c16

--- Comment #16 from Jan Beulich <jbeulich@novell.com> 2008-01-07 09:39:54 MST ---
I don't think this large patch is really needed here: simply replacing VDSO_COMPAT_BASE by VDSO_BASE in the definition of AT_SYSINFO_EHDR should get that entry in sync with AT_SYSINFO again. Since the unwind info in the vDSO is position independent, unwinding should then work, unless glibc has other requirements on addresses being relocated inside the vDSO, in which case the patch above is perhaps needed in addition to this change. Otoh there may have been a reason (which I don't know about) to keep AT_SYSINFO_EHDR at its old value.

https://bugzilla.novell.com/show_bug.cgi?id=289641

Greg Kroah-Hartman <gregkh@novell.com> changed:

           What    |Removed                                    |Added
----------------------------------------------------------------------------
         AssignedTo|kernel-maintainers@forge.provo.novell.com  |jbeulich@novell.com

https://bugzilla.novell.com/show_bug.cgi?id=289641

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

https://bugzilla.novell.com/show_bug.cgi?id=289641

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c20

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |ahanssen@trolltech.com

--- Comment #20 from Jan Beulich <jbeulich@novell.com> 2008-04-17 02:46:52 MST ---
I committed a patch implementing the simpler approach (as outlined above). It'll need to be tested, though (I don't have a 32-bit 10.2 system at hand). Setting to needinfo for the originator to report after the next maintenance update.

https://bugzilla.novell.com/show_bug.cgi?id=289641

User dmueller@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c21

--- Comment #21 from Dirk Mueller <dmueller@novell.com> 2008-04-17 03:19:30 MST ---
It doesn't seem reproducible on factory anymore, but would the patch apply there as well?

https://bugzilla.novell.com/show_bug.cgi?id=289641

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c22

--- Comment #22 from Jan Beulich <jbeulich@novell.com> 2008-04-17 03:22:06 MST ---
If by factory you mean the HEAD kernel, then no, the problem has been fixed in a different way there.

https://bugzilla.novell.com/show_bug.cgi?id=289641

User aj@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=289641#c23

Andreas Jaeger <aj@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|ahanssen@trolltech.com      |
         Resolution|                            |FIXED

--- Comment #23 from Andreas Jaeger <aj@novell.com> 2008-10-23 13:32:05 MDT ---
According to comment #22 this is fixed.