[Bug 497624] New: thread waiting for semaphore is not woken
From the output you can see that the script is waiting for a semaphore. There is a line in the output containing "waiting", where the semaphore address is
http://bugzilla.novell.com/show_bug.cgi?id=497624

           Summary: thread waiting for semaphore is not woken
    Classification: openSUSE
           Product: openSUSE 11.1
           Version: Final
          Platform: i686
        OS/Version: openSUSE 11.1
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Development
        AssignedTo: pth@novell.com
        ReportedBy: keesjan@cas.et.tudelft.nl
         QAContact: qa@suse.de
          Found By: ---

Created an attachment (id=287759)
 --> (http://bugzilla.novell.com/attachment.cgi?id=287759)
Archive containing source files and build-script (see "sema/doit")

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko/2009032600 SUSE/3.0.8-1.1.1 Firefox/3.0.8

A (relatively) simple program with 2 threads (a main thread and a separate thread) causes an "artificial" deadlock situation on 32bit openSUSE 11.1. I have also tried this on 32bit openSUSE 10, and there the problem did not occur.

Let me describe the problem in more detail. Basically, in my program (see attachment and details below) there are two semaphores. The main thread does a "post" on semaphore "sema1" and then a "wait" on sema2, and the child thread does a "wait" on sema1 and then a "post" on sema2. As I understand it, this should leave no possibility for deadlock, yet it occurs in my test program almost every time I run it (but not always!)

Curiously, when the program is hanging, I can hit Ctrl-Z to stop it, then type "fg" to continue it, and it will in fact continue running. Thus the "artificial" deadlock can be overcome by sending the program some signals (although the program has no signal handler installed, or anything related).

The attachment contains an archive which can be used to reproduce this problem. Unfortunately, I could not put all the code into one file, as doing this breaks the reproducibility (!) It appears that the dynamic linker is also involved in the problem, but I am not certain. However, I do not believe the build-tools are to blame (see explanation below).
I would like to note that I find this problem quite serious, since it took me almost a day to reduce my code (about 30,000 lines) to this little example program (165 lines) which reproduces the problem. I would certainly like to know the exact cause of this, so that we can continue developing on openSUSE with the faith it deserves :-)

Reproducible: Sometimes

Steps to Reproduce:
0. make sure you have openSUSE 11.1, and a 32bit machine.
1. extract the archive in the attachment
2. cd to the directory called "sema" (which exists after extraction)
3. run the script "./doit", possibly several times if the problem does not occur.
4. if the script hangs, bingo :-/
5. optionally type "Ctrl-Z" and then "fg" to see that the program continues. This demonstrates that something is definitely not ok.

Actual Results:
The "doit" script hangs, which means that the executable ".out/main" hangs. The semaphore that the thread is waiting for is posted to later (also shown in the output). However, the waiting thread does not continue.

Expected Results:
The script should have finished normally, without hanging.

I have also tested this on a 64bit machine with openSUSE 11.1. There, the problem cannot be reproduced. In addition, I have tested this on a clean 32bit VirtualBox installation of openSUSE 11.1, where the problem *CAN* be reproduced easily. I can provide the corresponding virtual disk (.vdi) upon request.

To see whether this is a problem with the build tools, I did the following test: I compiled the program on openSUSE 10 (where the problem does not occur), and transferred the binaries (libraries and executable) to the openSUSE 11.1 machine. The result: the problem occurred again! Therefore, I believe the build-tools are at least sane (but this is not 100% sure of course). I suspect (but nothing more) that the problem is either in the libraries (libpthread, glibc), or in the kernel of openSUSE 11.1.
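
For readers without the attachment, the two-semaphore handshake described in the report can be sketched roughly as follows. This is not the reporter's code: the function name run_handshake is hypothetical, the semaphores are local rather than global objects spread over shared libraries, and EINTR handling is left out for brevity. It only illustrates why the post/wait ordering by itself should not deadlock.

```cpp
#include <semaphore.h>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the attachment): runs one round of the
// handshake described in the report and returns true if the child thread
// got past its wait on sema1.
bool run_handshake() {
    sem_t sema1, sema2;
    // Both semaphores start at 0 and are private to this process.
    if (sem_init(&sema1, 0, 0) != 0) return false;
    if (sem_init(&sema2, 0, 0) != 0) {
        sem_destroy(&sema1);
        return false;
    }

    bool child_done = false;
    std::thread child([&] {
        sem_wait(&sema1);   // child: wait until main posts sema1
        child_done = true;
        sem_post(&sema2);   // child: release main
    });

    sem_post(&sema1);       // main: release the child
    sem_wait(&sema2);       // main: wait until the child posts sema2
    child.join();           // join also orders the read of child_done

    sem_destroy(&sema1);
    sem_destroy(&sema2);
    return child_done;
}
```

Each wait is matched by exactly one post from the other thread, so whichever thread runs first, both should eventually proceed. The sketch has to be built with thread support enabled (for example with -pthread on GCC).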
-- Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=497624

Philipp Thomas <pth@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pth@novell.com
         AssignedTo|pth@novell.com              |rguenther@novell.com
http://bugzilla.novell.com/show_bug.cgi?id=497624

User keesjan@cas.et.tudelft.nl added comment
http://bugzilla.novell.com/show_bug.cgi?id=497624#c1

--- Comment #1 from Kees-Jan van der Kolk <keesjan@cas.et.tudelft.nl>  2009-04-23 09:45:54 MDT ---
In MySemaphore.h, there are some loops around the sem_wait and sem_post calls. These are there to handle the case where a signal interrupts these calls (an error with errno=EINTR is then generated). When such a situation occurs, the calls are simply retried. Some comments in MySemaphore.h are probably not too precise about this :-) So hopefully this clears it up a little, if there is any confusion.

I also noticed that removing the last "semaphore.post()" call from the main() function makes the problem disappear, which is strange because this post() happens on a separate semaphore which does not have any waiting threads. Anyway, this is a weird problem...
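
The retry-on-EINTR pattern mentioned in comment #1 usually looks something like the sketch below. The function name wait_retrying is made up for illustration; the actual loops in MySemaphore.h are not reproduced here and may differ. Only sem_wait is wrapped, on the assumption that POSIX lists EINTR as an error for sem_wait but not for sem_post.

```cpp
#include <semaphore.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper (not the code from MySemaphore.h): calls sem_wait
// again whenever it was interrupted by a signal. Returns 0 on success, or
// -1 with errno set for any error other than EINTR.
int wait_retrying(sem_t* sem) {
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

A loop like this only papers over interrupted calls; it does not by itself explain why a stop/continue cycle (Ctrl-Z, then "fg") gets the hanging program moving again.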
http://bugzilla.novell.com/show_bug.cgi?id=497624

User rguenther@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497624#c2

Richard Guenther <rguenther@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID
           Severity|Critical                    |Normal

--- Comment #2 from Richard Guenther <rguenther@novell.com>  2009-05-15 07:45:23 MDT ---
I can reproduce this on x86_64 by building with -m32. The issue is that you need to link the shared libraries with -lpthread as well, so that pthreads is initialized before the global constructors for the semaphores run.