On 05/31/2012 12:50 PM, Roger Oberholtzer wrote:
On Thu, 2012-05-31 at 12:28 -0400, Jerry Feldman wrote:
On 05/31/2012 11:41 AM, Roger Oberholtzer wrote:
On Thu, 2012-05-31 at 11:17 -0400, Jerry Feldman wrote:
Basically, signal handlers are not threadsafe. I would probably guess that libz itself is not threadsafe. Additionally most of libc++ is also not threadsafe. Lots of discussions online about this. You just need to remember that threads run in the context of the user, and signal handlers run in the context of root. If you are sharing memory with memory used by the signal handler you are in very dangerous territory.
I understand the signal context very well. I think we have it sorted ok. We do not share anything between the signal and the non-signal context. Of course we may have missed something...
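For reference, keeping the signal and non-signal contexts apart usually comes down to something like the following. This is only a minimal sketch, not the code from the application being discussed: the names on_signal, install_handler and consume_signal, and the choice of SIGUSR1, are made up for illustration. The handler touches nothing but a single volatile sig_atomic_t flag, and everything else happens in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only object shared between the handler and the rest of the program. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: a single store to a sig_atomic_t, no malloc, no stdio, no locks. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for SIGUSR1 (an arbitrary choice for this sketch).
 * Returns 0 on success, -1 on error, as sigaction() does. */
int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from normal context: returns 1 if a signal has arrived since the
 * last call and clears the flag, otherwise returns 0. Signals that arrive
 * close together are coalesced into one notification. */
int consume_signal(void)
{
    if (got_signal) {
        got_signal = 0;
        return 1;
    }
    return 0;
}
```

A main loop would call install_handler() once at startup and then poll consume_signal() between units of work, doing the real processing (file I/O, gzwrite() and so on) outside the handler.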
libz is not thread safe? This is worrisome. I do indeed use libz in many threads. For example, I have a system with 25 or so transducers, each being serviced in a thread. One of the tasks each thread has is to save the collected data to a file via libz. I have not had any problems. Until the GPS one. If libz is truly not thread safe, I have a problem.
I see this: http://www.zlib.net/zlib_faq.html#faq21 . libz is thread safe, but perhaps the things it uses are not thread safe. In my case, I think it is libc (memory allocation and file I/O). I do not use libc++. libc is thread-safe for these things. So where would libz not be thread-safe? If I use my own memory management routines. I do not do this.
So it does appear that libz is threadsafe. I have not worked on threads in a while. I always assume library functions are not threadsafe unless I learn otherwise. Could possibly be that the GPS code has a bug. At this point I am just guessing, not knowing your app.
The GPS code in this case (lots disabled to get to the meat of the issue) is a read() and a gzwrite(). The debugger indicates that the few variables involved are as I would expect. I think the GPS code is an innocent victim. The killer is elsewhere.
One of the real beauties of Purify (http://www-01.ibm.com/software/awdtools/purify/unix/) is that it is able to find a lot of issues that other debugging options miss. The type of issues you are having are perfect for Purify. While the product is pricey, you could use the trial version. A few years ago, when I was at Compaq, a guy at one of our meetings described a problem with his software. He had used other tools, but none found his problem. At my suggestion he tried Purify; it found the problem very quickly and his company decided to pay for a license.
$7000 is rather pricey. I don't see much about how it works in a threaded application or one that receives signals. Interesting that it works without the source.
I guess I will try the trial version and see if that uncovers anything. Chances are it uncovers lots of things gcc lets pass. But in the end that is good. We have been working towards having the compiler in -pedantic mode so we can at least eliminate all those sorts of potential errors. Not quite there yet. It is an incremental thing. It is tricky in that lots of the code compiles on Linux (Intel and ARM) and Windows and Microware OS-9.
At Compaq, I was on the team to implement it on Tru64 Unix.
Purify essentially looks at every memory access at run time. I know
that it was a real bear to get it working with pthreads. Fortunately
Dave Butenhof was available to us. The main thing it does with memory is
know what state memory is in. So, if you have someone who is violating
memory constraints, it can detect that. Another issue is stack overflows
that may not be otherwise detected. It also traps every system call. It
goes well beyond the scope of compilers. There are some excellent open
source tools, such as Valgrind. One of the issues I've had with threads
before is not the threads themselves, but programmers forgetting to use
mutexes properly. Based on your background, I suspect that you have
covered all your bases. I've also seen a lot of issues where a bug shows
up in innocent code, but you have no clue where the actual offender is.
For instance, when malloc allocates memory from the OS, it gets a large
block (using sbrk(2) or mmap(2)), so the compiler really cannot detect
a violation inside that block. Or, let's say some code steps on a
pointer, leaving its value incorrect, and that pointer is then used
somewhere else.
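
On the mutex point above, here is a minimal sketch of the pattern that tends to get forgotten. It is not taken from any of the code discussed in this thread: the names counter, counter_lock, worker and run_workers, and the thread and iteration counts, are invented for illustration. Every access to the shared counter happens with the lock held; drop the lock/unlock pair and the final count will usually come up short.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* Shared state and the one mutex that protects it. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker bumps the shared counter, always under the lock. */
static void *worker(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start the workers, wait for them, and return the final count.
 * Returns -1 if not all threads could be created. */
long run_workers(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int i;

    counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return (started == NUM_THREADS) ? counter : -1;
}
```

With the lock in place run_workers() should return NUM_THREADS * INCREMENTS every time. Build with -pthread. A tool like Valgrind's Helgrind is one way to flag the unlocked variant.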
--
Jerry Feldman