[opensuse-xorg] XOrg deadlock in a multi-threaded X app that works on FC/RHEL distributions
I am trying to get a library (written by someone else) to run on OpenSuse 12.1. I am not an X developer, but I inherited it and need things to run. The library is tested and runs on FC13, RHEL 5 and SL6 without problems, but it consistently deadlocks in X calls on OpenSuse 12.1.

The sequence of events is:
1. XInitThreads is called.
2. The library creates a thread ("XEventThread") and starts executing it. The thread runs a simple loop that repeatedly calls XNextEvent and passes the returned events elsewhere.
3. XEventThread calls XNextEvent.
4. XNextEvent blocks waiting for events; it does not return immediately because there are no windows yet.
5. XEventThread releases control to the main app thread.
6. The main app thread calls XLockScreen.
7. The main app thread starts creating windows.
8. The first call to an X function, e.g. XCreateSimpleWindow, hangs.

This is with XSynchronize enabled; if it is disabled, a few X calls succeed, but eventually something hangs forever. On the OSes that work (FC13, Scientific Linux 6) the difference is at step 8: the calls to X never lock, and everything proceeds happily. It works with a variety of window managers.

Why is there a difference on OpenSuse, and what can I do about it? I am happy to file a bug report with Novell, too, if this appears to be a bug.

I originally asked a version of this question on the general X list (it started out less clearly defined), and got a suggestion that this may be a thread-safety fix missing from the OpenSuse Xlib: http://lists.freedesktop.org/archives/xorg/2012-June/054726.html

I also created a cut-down version of the library (in the attached file) that displays the behaviour. On FC13 and RHEL it creates 3 windows with 3-second intervals between them, and then waits forever. It succeeds every time. On OpenSuse 12.1 it freezes as soon as the thread starts, usually on the first call to XCreateSimpleWindow during the creation of the first window.
It is not affected by the window manager: it fails with all of them.

To compile:

    gcc -O2 -ansi -g -DXGRAFIX -I. -DUNIX -D_cplusplus -D_XOPEN_SOURCE=500 -D_REENTRANT -c -o HGraf.o HGraf.c
    gcc HGraf.o -lpthread -lm -lX11 -L/usr/X11R6/lib
    mv a.out HGrafTest

To run:

    ./HGrafTest

Is this indeed an OpenSuse bug, or am I missing something about the program?

Thanks,
Myrosia
Hello, Myrosia,

did you test it with OpenSuse on only one machine? I mean, perhaps the problem is not specific to OpenSuse but to the hardware or the video driver. And what happens if you use remote X, i.e. if you start the program on a Fedora (or other) machine but with $DISPLAY set to the OpenSuse machine?

Sorry that I cannot test it myself at the moment,
Werner

-- 
To unsubscribe, e-mail: opensuse-xorg+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-xorg+owner@opensuse.org
Hi Werner,

sorry, I should have said. I tested on two different machines, a recent 64-bit laptop and a 4-year-old 32-bit laptop, both Dell, running OpenSuse 12.1, and they exhibit the same deadlock. The hardware is different, but I think the video driver used is the same on both. Unfortunately, I don't have a different machine to test on.

I also tested on FC13 running in VirtualBox on one of those laptops, and there is no deadlock there. I don't have an easy way to run on FC13 with the display set to my OpenSuse machines, though I could try. What information would that add?

Thanks,
Myrosia

(In reply to Werner Scheinast <W.Scheinast@web.de>, Sun, Jun 24, 2012 at 10:32 AM.)
Myrosia Dzikovska <myrosia@gmail.com>, following up on her message of Sun, Jun 24, 2012 at 6:41 PM:
Ah, just remembered. While I have not tested this particular minimal example, I have run something using the library it is based on, on a laptop identical to one of mine but running RHEL5. It had no problems. So this rules out the possibility that it is machine-specific; it has got to be something about the OpenSuse libraries.

Myrosia
I can confirm this gets stuck on my OpenSuSE 12.1 machine as well. The libX11 code contains the following comment just above the pthread_cond_wait it is stuck on:

    /* If some thread is already waiting for events,
     * it will get the first one. That thread must
     * process that event before we can continue. */
    /* FIXME: That event might be after this reply,
     * and might never even come--or there might be
     * multiple threads trying to get events. */

Fedora 13 uses an older libX11 that doesn't contain this bug yet. In upstream, this was fixed by commit fd85aca7a616c595fc17b2520f84316a11e8906f. I can confirm your application doesn't get stuck on my machine with an upstream Xorg build.

Michal Srb
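To make the quoted comment concrete, here is a toy model in C with POSIX threads. It is a hypothetical sketch, not libX11 code: the `toy_display` struct and all `toy_*` functions are made up for illustration, and a timed wait stands in for the unbounded `pthread_cond_wait`, so the hang shows up as a `false` return instead of a frozen process. The `event_waiter` flag plays XEventThread blocked in XNextEvent; the caller of `toy_wait_for_reply` plays the main thread. This picture would also be consistent with the XSynchronize observation in the original report: in synchronous mode Xlib does a round trip after every request, so even the first XCreateSimpleWindow takes the reply path, whereas without it only the calls that actually wait for a reply block.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and pthread_cond_timedwait */

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy model, NOT libX11 code: just enough state to mimic
 * the rule described in the quoted FIXME comment. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t event_processed;  /* signalled when an event is handled */
    bool event_waiter;               /* another thread is blocked in "XNextEvent" */
    unsigned long events_processed;  /* events handled so far */
} toy_display;

static void toy_display_init(toy_display *d, bool event_waiter)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->event_processed, NULL);
    d->event_waiter = event_waiter;
    d->events_processed = 0;
}

static void toy_display_destroy(toy_display *d)
{
    pthread_cond_destroy(&d->event_processed);
    pthread_mutex_destroy(&d->lock);
}

/* Called by the (hypothetical) event thread whenever it handles an event. */
static void toy_deliver_event(toy_display *d)
{
    pthread_mutex_lock(&d->lock);
    d->events_processed++;
    pthread_cond_broadcast(&d->event_processed);
    pthread_mutex_unlock(&d->lock);
}

/* Models a call that waits for a reply (every call, under XSynchronize).
 * Returns true if the caller may go on to read its reply, false if it was
 * still parked on the condition variable after timeout_ms: the point at
 * which the real program would hang forever. */
static bool toy_wait_for_reply(toy_display *d, long timeout_ms)
{
    assert(timeout_ms >= 0);

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default clock of a condvar */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&d->lock);
    unsigned long seen = d->events_processed;
    bool ok = true;
    /* "If some thread is already waiting for events, it will get the first
     * one. That thread must process that event before we can continue." */
    while (d->event_waiter && d->events_processed == seen) {
        if (pthread_cond_timedwait(&d->event_processed, &d->lock, &deadline) != 0) {
            ok = false;  /* ETIMEDOUT (or an error): no event arrived, the analogue of the hang */
            break;
        }
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}
```

With no event waiter, `toy_wait_for_reply(&d, 100)` returns true immediately; with `event_waiter` set and nobody calling `toy_deliver_event`, it returns false after roughly 100 ms, which mirrors the main thread waiting for an event that, with no windows yet, never comes.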
Stefan Dirsch, replying to Michal Srb's message of Mon, Jun 25, 2012 at 03:16:22PM +0200:
Thanks a lot, Michal! openSUSE 12.2 is going to ship with libX11 1.5.0, which already contains this fix.

Stefan

Public Key available
------------------------------------------------------
Stefan Dirsch (Res. & Dev.)   SUSE LINUX Products GmbH
Tel: 0911-740 53 0            Maxfeldstraße 5
FAX: 0911-740 53 479          D-90409 Nürnberg
http://www.suse.de            Germany
--------------------------------------------------------------
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
--------------------------------------------------------------
OK, filed bug report 768775: https://bugzilla.novell.com/show_bug.cgi?id=768775

But Michal, I was unable to assign it to you as you suggested. Whenever I enter your email (msrb@suse.com) into the assignee list, I get the message below:

----
Match Failed
Bugzilla was unable to make any match at all for one or more of the names and/or email addresses you entered on the previous page.
Please go back and try other names or email addresses.
Assignee: msrb@suse.com did not match anything
------

I tried appending it to the list of addresses; this did not help either. What am I doing wrong here?

Thanks,
Myrosia
participants (4)
- Michal Srb
- Myrosia Dzikovska
- Stefan Dirsch
- Werner Scheinast