Mailinglist Archive: opensuse-programming (16 mails)

Re: [opensuse-programming] threads and core files
On 05/31/2012 12:50 PM, Roger Oberholtzer wrote:
On Thu, 2012-05-31 at 12:28 -0400, Jerry Feldman wrote:
On 05/31/2012 11:41 AM, Roger Oberholtzer wrote:
On Thu, 2012-05-31 at 11:17 -0400, Jerry Feldman wrote:

Basically, signal handlers are not threadsafe. I would probably guess
that libz itself is not threadsafe. Additionally, most of libc++ is also
not threadsafe. There are lots of discussions online about this. You just
need to remember that signal handlers run asynchronously, interrupting
whatever thread happens to receive the signal. If you are sharing memory
with the signal handler, you are in very dangerous territory.
I understand the signal context very well. I think we have it sorted ok.
We do not share anything between the signal and the non-signal context.
Of course we may have missed something...
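One common way to guarantee that nothing is shared between signal context and normal thread context is to avoid an asynchronous handler altogether. You block the signals in every thread and let one dedicated thread accept them synchronously with sigwait(). The following is a minimal sketch that assumes SIGUSR1 and SIGTERM are the signals of interest. The names start_signal_thread, signal_thread and last_signal are hypothetical and do not come from the code discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical: the last signal accepted by the signal thread.
   Only that thread writes it; readers join the thread first. */
static int last_signal = 0;

static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig;
    /* sigwait() returns in ordinary thread context, not inside an
       asynchronous handler, so normal thread-safe calls are allowed. */
    if (sigwait(set, &sig) == 0)
        last_signal = sig;
    return NULL;
}

/* Block the signals in the calling thread (threads created later
   inherit the mask), then start one thread that waits for them.
   Call this before creating any worker threads. *set must stay
   valid until the signal thread has returned. */
int start_signal_thread(pthread_t *tid, sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGUSR1);
    sigaddset(set, SIGTERM);
    int rc = pthread_sigmask(SIG_BLOCK, set, NULL);
    if (rc != 0)
        return rc;
    return pthread_create(tid, NULL, signal_thread, set);
}
```

Because the signals stay blocked everywhere, a signal sent to the process while a worker is inside malloc(), gzwrite() or similar can never interrupt that worker. It stays pending until the signal thread picks it up.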

libz is not thread safe? This is worrisome. I do indeed use libz in many
threads. For example, I have a system with 25 or so transducers, each
being serviced in a thread. One of the tasks each thread has is to save
the collected data to a file via libz. I have not had any problems,
until the GPS one. If libz is truly not thread safe, I have a problem.

I see this: http://www.zlib.net/zlib_faq.html#faq21 . libz is thread
safe, but perhaps the things it uses are not. In my case, I think that
is libc (memory allocation and file I/O). I do not use libc++. libc is
thread-safe for these things. So where would libz not be thread-safe?
Only if I used my own memory management routines, and I do not do
this.

So it does appear that libz is threadsafe. I have not worked on threads
in a while. I always assume library functions are not threadsafe unless
I learn otherwise. Could possibly be that the GPS code has a bug. At
this point I am just guessing not knowing your app. One of the real
beauties of Purify
The GPS code in this case (lots disabled to get to the meat of the
issue) is a read() and a gzwrite(). The debugger indicates that the few
variables involved are as I would expect. I think the GPS code is an
innocent victim. The killer is elsewhere.

<http://www-01.ibm.com/software/awdtools/purify/unix/> is that it is
able to find a lot of issues that other debugging options miss. The type
of issues you are having are perfect for Purify. While the product is
pricey, you could use the trial version. A few years ago, when I was at
Compaq, a guy at one of our meetings described a problem with his
software. He had used other tools, but none found his problem. At my
suggestion he tried Purify; it found the problem very quickly, and his
company decided to pay for a license.
$7000 is rather pricey. I don't see much about how it works in a
threaded application or one that receives signals. Interesting that it
works without the source.

I guess I will try the trial version and see if that uncovers anything.
Chances are it uncovers lots of things gcc lets pass. But in the end
that is good. We have been working towards having the compiler in
-pedantic mode so we can at least eliminate all those sorts of potential
errors. Not quite there yet. It is an incremental thing. It is tricky in
that lots of the code compiles on Linux (Intel and ARM), Windows and
Microware OS-9.


At Compaq, I was on the team to implement it on Tru64 Unix.
Purify essentially looks at every memory access at run time. I know
that it was a real bear to get it working with pthreads. Fortunately,
Dave Butenhof was available to us. The main thing it does with memory is
track what state memory is in, so if someone is violating memory
constraints, it can detect that. Another issue is stack overflows that
may not otherwise be detected. It also traps every system call. It goes
well beyond the scope of compilers. There are some excellent open source
tools, such as valgrind.

One of the issues I've had with threads before is not with threads
themselves, but with programmers forgetting to use mutexes properly.
Based on your background, I suspect that you have covered all your
bases. I've also seen a lot of issues where a bug shows up in innocent
code, but you have no clue where the actual offender is. For instance,
when malloc allocates memory from the OS, it gets a large block (using
sbrk(2) or mmap(2)) and hands out pieces of it. A write that overruns
one piece still lands in memory the process owns, so neither the
compiler nor the OS can detect the violation. Now let's say some code
steps on a pointer, leaving its value incorrect, and that pointer is
then used somewhere else. The crash shows up there, far from the culprit.
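To illustrate the "forgetting to use mutexes" point, here is a minimal sketch of a counter shared by several threads, with the mutex held around every read-modify-write. The struct, the worker count and the names run_workers, worker and ITERATIONS are hypothetical. If you remove the lock/unlock pair, the final count usually comes out short, and the symptom can appear far from the racing code. Race detectors such as valgrind's Helgrind tool (valgrind --tool=helgrind) or gcc's -fsanitize=thread are designed to flag exactly this kind of unprotected access.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Hypothetical shared state updated by several worker threads. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

static struct shared_counter counter = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* value++ is not atomic: dropping this lock/unlock pair is
           the classic "forgot the mutex" bug. */
        pthread_mutex_lock(&counter.lock);
        counter.value++;
        pthread_mutex_unlock(&counter.lock);
    }
    return NULL;
}

/* Start up to 32 workers, wait for them all, return the final count. */
long run_workers(int nthreads)
{
    pthread_t tids[32];
    int started = 0;

    if (nthreads > 32)
        nthreads = 32;
    counter.value = 0;   /* safe: no workers are running yet */
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter.value;
}
```

With the mutex in place, run_workers(4) returns exactly 4 * ITERATIONS on every run.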

--
Jerry Feldman <gaf@xxxxxxx>
Boston Linux and Unix
PGP key id:3BC1EB90
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66 C0AF 7CEA 30FC 3BC1 EB90

