[opensuse-programming] threads and core files
I have a threaded application that encounters a segmentation violation. I am fairly certain it is the initial thread that encounters the problem. But I just want to be sure of the following:
1. If a multi-threaded app encounters a seg violation, and a core dump is created, the core is of the thread that encountered the seg violation, and not of the main thread, right?
2. If a process starts a thread, and that thread exits, the process does not know about this until it tries to join the thread, right? So, if a thread has a seg violation and exits, the 'parent' thread will also not be made to exit. It has to detect the thread is gone by its own mechanisms or by trying to join the thread.
I ask this because I want to be certain I am not misinterpreting which thread in my application is the one that really is getting the seg violation.
This is on openSUSE 11.2 with kernel 2.6.31.14-51-desktop
Yours sincerely, Roger Oberholtzer
OPQ Systems / Ramböll RST
Office: Int +46 10-615 60 20 Mobile: Int +46 70-815 1696 roger.oberholtzer@ramboll.se
________________________________________
Ramböll Sverige AB Krukmakargatan 21 P.O. Box 17009 SE-104 62 Stockholm, Sweden www.rambollrst.se
-- To unsubscribe, e-mail: opensuse-programming+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-programming+owner@opensuse.org
On 05/30/2012 09:39 AM, Roger Oberholtzer wrote:
I have a threaded application that encounters a segmentation violation. I am fairly certain it is the initial thread that encounters the problem. But I just want to be sure of the following:
1. If a multi-threaded app encounters a seg violation, and a core dump is created, the core is of the thread that encountered the seg violation, and not of the main thread, right?
It will be of the whole process, all threads. You can do, for example,
thread apply all bt
to get a backtrace of all threads in the process. By default, if you only run "bt", gdb will try to show you the backtrace of the thread that caused the segfault.
2. If a process starts a thread, and that thread exits, the process does not know about this until it tries to join the thread, right? So, if a thread has a seg violation and exits, the 'parent' thread will also not be made to exit. It has to detect the thread is gone by its own mechanisms or by trying to join the thread. I ask this because I want to be certain I am not misinterpreting which thread in my application is the one that really is getting the seg violation.
If a thread segfaults, the whole process dies, threads and all. If you want threads to run independently of each other, you need to start them as processes, not threads.
Anders
On Wed, 2012-05-30 at 10:06 +0200, Anders Johansson wrote:
On 05/30/2012 09:39 AM, Roger Oberholtzer wrote:
I have a threaded application that encounters a segmentation violation. I am fairly certain it is the initial thread that encounters the problem. But I just want to be sure of the following:
1. If a multi-threaded app encounters a seg violation, and a core dump is created, the core is of the thread that encountered the seg violation, and not of the main thread, right?
It will be of the whole process, all threads. You can do for example
thread apply all bt
to get a backtrace of all threads in the process
Thanks for that. Very interesting.
By default if you only run "bt" gdb will try to show you the backtrace of the thread that caused the segfault
OK.
2. If a process starts a thread, and that thread exits, the process does not know about this until it tries to join the thread, right? So, if a thread has a seg violation and exits, the 'parent' thread will also not be made to exit. It has to detect the thread is gone by its own mechanisms or by trying to join the thread. I ask this because I want to be certain I am not misinterpreting which thread in my application is the one that really is getting the seg violation.
If a thread segfaults, the whole process dies, threads and all. If you want threads to run independent of each other, you need to start them as processes, not threads
OK. In my case, based on what bt lists, I think it is the initial process that is having the seg violation. Oddly, it is in libz. The debugger seems to indicate that the values passed are as I expect them to be. So I am guessing that the file descriptor contents have become corrupt. Not the pointer, as that is the one I expect. But something in what it points to. Maybe the debug for libz will shed some light.
On 05/30/2012 03:39 AM, Roger Oberholtzer wrote:
"If a process starts a thread, and that thread exits, the process does not know about this until it tries to join the thread, right?"
Not entirely true. There are a number of ways to allow a thread to exit without the need to join it. It has been a few years since I was working with pthreads, but you can set up a thread as detached. The main issue you need to understand is that unlike processes, threads run in the context of the thread creator. So, a segv in a thread will cause the entire process to fail. One of the things that really helps in thread programming is exception processing and try blocks. By wrapping sections of your code in try blocks you can avoid this nastiness. Additionally, if you have multiple child threads. I also always recommend the books by my former coworker, Dave Butenhof.
-- Jerry Feldman <gaf@blu.org> Boston Linux and Unix PGP key id:3BC1EB90 PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66 C0AF 7CEA 30FC 3BC1 EB90
On Wed, 2012-05-30 at 09:43 -0400, Jerry Feldman wrote:
On 05/30/2012 03:39 AM, Roger Oberholtzer wrote:
" If a process starts a thread, and that thread exits, the process does not know about this until it tries to join the thread, right?"
Not entirely true. There are a number of ways to allow a thread to exit avoiding the need to join. It has been a few years since I was working with pthreads, but you can set up a thread as detached. The main issue you need to understand is that unlike processes, threads run in the context of the thread creator. So, a segv in a thread will cause the entire process to fail. One of the things that really helps in thread programming is exception processing and try blocks. By wrapping sections of your code in try blocks you can avoid this nastiness. Additionally, if you have multiple child threads. Additionally I always recommend my former coworker, Dave Butenhof's books.
The 'fun' I am having is that in the past few months, three equipment suppliers have provided a Linux interface to their hardware. Generally this should be considered a good thing. All are implemented by starting threads. For two of the suppliers (GigE Vision cameras) I do not have the source and so cannot determine if they are doing things in the safest fashion. I really like threads and see that our application benefits from them greatly. We have used them for a couple of years. But when something goes wrong, and especially if it is in a black box bit of code, life gets tedious.
On 05/30/2012 10:46 AM, Roger Oberholtzer wrote:
The 'fun' I am having is that in the past few months, three equipment suppliers have provided a Linux interface to their hardware. Generally this should be considered a good thing. All are implemented by starting threads. For two of the suppliers (GigE Vision cameras) I do not have the source and so cannot determine if they are doing things in the safest fashion. I really like threads and see that our application benefits from them greatly. We have used them for a couple of years. But when something goes wrong, and especially if it is in a black box bit of code, life gets tedious.
"life gets tedious" Naw, fun :-) Thread debugging can be challenging. You can use some tools, like gdb. I'm not sure, but IBM Rational's Purify was able to debug threads on some platforms. I've never used Purify on Linux, only on Digital/Compaq Tru64 Unix. However, you can still add try blocks to their code to help a bit. Thread debugging, though, can provide a lot of challenges. The first thing you need to know is if their code is thread-safe.
On Wed, 2012-05-30 at 11:55 -0400, Jerry Feldman wrote:
"life gets tedious" Naw, fun :-)
Thread debugging can be challenging. You can use some tools, like gdb. I'm not sure, but IBM Rational's Purify was able to debug threads on some platforms. I've never used Purify on Linux, only on Digital/Compaq Tru64 Unix. However, you can still add try blocks to their code to help a bit. Thread debugging, though, can provide a lot of challenges. The first thing you need to know is if their code is thread-safe.
One would imagine threaded code to be thread-safe. At least in intention, if not in reality. For the suspect libraries, I do not have the source. The symptom I see is that when I use one of these libraries, I then get a segmentation violation in libz. And I cannot see any problem with my call to gzwrite(). I am guessing that some part of the zlib data structure for the specific file has been corrupted by someone. The failure is always at the same point: a SIGPOLL signal results in a signal handler reading from a GPS, and the read data is sent to gzwrite(). The reads/writes are properly serialized. The signals do not interrupt themselves. When I add the thread library to the application so that there are threads reading from GigE Vision cameras, this problem can occur. If I don't enable the GPS reader, I do not get a failure. I suspect a memory corruption still occurs, but at some place that is 'innocent'.
On Thu, 2012-05-31 at 08:50 +0200, Roger Oberholtzer wrote:
The symptom I see is that when I use one of these libraries, I then get a segmentation violation in libz. And I cannot see any problem with my call to gzwrite(). I am guessing that some part of the zlib data structure for the specific file has been corrupted by someone. The failure is always at the same point: a SIGPOLL signal results in a signal handler reading from a GPS, and the read data is sent to gzwrite(). The reads/writes are properly serialized. The signals do not interrupt themselves. When I add the thread library to the application so that there are threads reading from GigE Vision cameras, this problem can occur.
I should add this to my original question in this thread: If a process has signal handlers, how does this affect the core file? Meaning that if a seg violation occurs, and then a signal arrives, could the signal handler somehow confuse what I see in gdb? In my case, it looks like this:
#0 0xb5df0c23 in ?? () from /lib/libz.so.1
#1 0xb5df0ff8 in deflate () from /lib/libz.so.1
#2 0xb5dee674 in gzwrite () from /lib/libz.so.1
#3 0x0808723d in GpsDataHandler (info=0x8329fd8) at ../gps.c:2430
#4 0xb5bbb60d in SIGPOLLhandler (sig=29) at ../aim.c:129
#5 <signal handler called>
#6 0xffffe424 in __kernel_vsyscall ()
#7 0xb5d1d6a1 in select () from /lib/libc.so.6
#8 0xb71fae66 in Tcl_WaitForEvent () from /usr/lib/libtcl8.5.so
#9 0xb71c2cdb in Tcl_DoOneEvent () from /usr/lib/libtcl8.5.so
#10 0x08065687 in TheRealMeasurementLoop () at ../measure.c:223
#11 DoMeasurement () at ../measure.c:310
#12 0x08062aea in DoHiway (interp=0x81879c8) at ../hiway.c:684
#13 0xb7025687 in Tk_MainEx () from /usr/lib/libtk8.5.so
#14 0x080609fe in main (argc=0, argv=0xbfa2af14, envp=0x817c160) at ../hiway.c:164
(I have newer traces that tell the line in libz that fails.)
In fact, the SIGPOLL is happening for other sources as well. For example, a UDP port. And that one is happening much more often than the GPS one. So if it was a code artifact that I saw the SIGPOLL, I would have expected it to be the one that happens much more often. As that is never the case, I have come to the conclusion that the seg violation is indeed in the signal handler. Could my assumption be wrong?
On 05/31/2012 02:50 AM, Roger Oberholtzer wrote:
The symptom I see is that when I use one of these libraries, I then get a segmentation violation in libz. And I cannot see any problem with my call to gzwrite(). I am guessing that some part of the zlib data structure for the specific file has been corrupted by someone. The failure is always at the same point: a SIGPOLL signal results in a signal handler reading from a GPS, and the read data is sent to gzwrite(). The reads/writes are properly serialized. The signals do not interrupt themselves. When I add the thread library to the application so that there are threads reading from GigE Vision cameras, this problem can occur.
If I don't enable the GPS reader, I do not get a failure. I suspect a memory corruption still occurs, but at some place that is 'innocent'.
Basically, signal handlers are not threadsafe. I would probably guess that libz itself is not threadsafe. Additionally most of libc++ is also not threadsafe. Lots of discussions online about this. You just need to remember that threads run in the context of the user, and signal handlers run in the context of root. If you are sharing memory with memory used by the signal handler you are in very dangerous territory.
On Thu, 2012-05-31 at 11:17 -0400, Jerry Feldman wrote:
Basically, signal handlers are not threadsafe. I would probably guess that libz itself is not threadsafe. Additionally most of libc++ is also not threadsafe. Lots of discussions online about this. You just need to remember that threads run in the context of the user, and signal handlers run in the context of root. If you are sharing memory with memory used by the signal handler you are in very dangerous territory.
I understand the signal context very well. I think we have it sorted ok. We do not share anything between the signal and the non-signal context. Of course we may have missed something...
libz is not thread safe? This is worrisome. I do indeed use libz in many threads. For example, I have a system with 25 or so transducers, each being serviced in a thread. One of the tasks each thread has is to save the collected data to a file via libz. I have not had any problems. Until the GPS one. If libz is truly not thread safe, I have a problem.
I see this: http://www.zlib.net/zlib_faq.html#faq21 . libz is thread safe, but perhaps the things it uses are not thread safe. In my case, I think it is libc (memory allocation and file I/O). I do not use libc++. libc is thread-safe for these things. So where would libz not be thread-safe? If I use my own memory management routines. I do not do this.
On 05/31/2012 11:41 AM, Roger Oberholtzer wrote:
libz is not thread safe? This is worrisome. I do indeed use libz in many threads. For example, I have a system with 25 or so transducers, each being serviced in a thread. One of the tasks each thread has is to save the collected data to a file via libz. I have not had any problems. Until the GPS one. If libz is truly not thread safe, I have a problem.
I see this: http://www.zlib.net/zlib_faq.html#faq21 . libz is thread safe, but perhaps the things it uses are not thread safe. In my case, I think it is libc (memory allocation and file I/O). I do not use libc++. libc is thread-safe for these things. So where would libz not be thread-safe? If I use my own memory management routines. I do not do this.
So it does appear that libz is threadsafe. I have not worked on threads in a while. I always assume library functions are not threadsafe unless I learn otherwise. Could possibly be that the GPS code has a bug. At this point I am just guessing, not knowing your app. One of the real beauties of Purify <http://www-01.ibm.com/software/awdtools/purify/unix/> is that it is able to find a lot of issues that other debugging options miss. The type of issues you are having are perfect for Purify. While the product is pricey, you could use the trial version. A few years ago, when I was at Compaq, a guy at one of our meetings described a problem with his software. He had used other tools, but none found his problem. At my suggestion he tried Purify; it found the problem very quickly and his company decided to pay for a license.
On Thu, 2012-05-31 at 12:28 -0400, Jerry Feldman wrote:
So it does appear that libz is threadsafe. I have not worked on threads in a while. I always assume library functions are not threadsafe unless I learn otherwise. Could possibly be that the GPS code has a bug. At this point I am just guessing, not knowing your app.
The GPS code in this case (lots disabled to get to the meat of the issue) is a read() and a gzwrite(). The debugger indicates that the few variables involved are as I would expect. I think the GPS code is an innocent victim. The killer is elsewhere.
One of the real beauties of Purify <http://www-01.ibm.com/software/awdtools/purify/unix/> is that it is able to find a lot of issues that other debugging options miss. The type of issues you are having are perfect for Purify. While the product is pricey, you could use the trial version. A few years ago, when I was at Compaq, a guy at one of our meetings described a problem with his software. He had used other tools, but none found his problem. At my suggestion he tried Purify; it found the problem very quickly and his company decided to pay for a license.
$7000 is rather pricey. I don't see much about how it works in a threaded application or one that receives signals. Interesting that it works without the source. I guess I will try the trial version and see if that uncovers anything. Chances are it uncovers lots of things gcc lets pass. But in the end that is good. We have been working towards having the compiler in -pedantic mode so we can at least eliminate all those sorts of potential errors. Not quite there yet. It is an incremental thing. It is tricky in that lots of the code compiles on Linux (Intel and ARM) and Windows and Microware OS-9. Thanks for the help.
On 05/31/2012 12:50 PM, Roger Oberholtzer wrote:
$7000 is rather pricey. I don't see much about how it works in a threaded application or one that receives signals. Interesting that it works without the source.
I guess I will try the trial version and see if that uncovers anything. Chances are it uncovers lots of things gcc lets pass. But in the end that is good. We have been working towards having the compiler in -pedantic mode so we can at least eliminate all those sorts of potential errors. Not quite there yet. It is an incremental thing. It is tricky in that lots of the code compiles on Linux (intel and ARM) and Windows and Microware OS-9.
At Compaq, I was on the team to implement it on Tru64 Unix. Purify essentially looks at every memory access at run time. I know that it was a real bear to get it working with pthreads. Fortunately Dave Butenhof was available to us. The main thing it does with memory is track what state each piece of memory is in. So, if you have someone who is violating memory constraints, it can detect that. Another issue is stack overflows that may not be otherwise detected. It also traps every system call. It goes well beyond the scope of compilers. There are some excellent open source tools, such as valgrind.
One of the issues I've had with threads before is not threads themselves, but programmers forgetting to use mutexes properly. Based on your background, I suspect that you have covered all your bases. I've also seen a lot of issues where a bug shows up in innocent code, but you have no clue where the actual offender is. For instance, when malloc allocates memory from the OS, it gets a large block (using sbrk(2) or mmap(2)), so a write past the end of one allocation usually still lands in valid memory, and the compiler really cannot detect the violation. Or let's say some code steps on a pointer, leaving its value incorrect, but that pointer is used somewhere else.
On 05/31/2012 12:50 PM, Roger Oberholtzer wrote:
On Thu, 2012-05-31 at 12:28 -0400, Jerry Feldman wrote:
On 05/31/2012 11:41 AM, Roger Oberholtzer wrote:
On Thu, 2012-05-31 at 11:17 -0400, Jerry Feldman wrote:
Basically, signal handlers are not threadsafe. I would probably guess that libz itself is not threadsafe. Additionally most of libc++ is also not threadsafe. Lots of discussions online about this. You just need to remember that a signal handler runs asynchronously, interrupting whatever code the receiving thread was executing at that moment. If you are sharing memory with memory used by the signal handler you are in very dangerous territory.
I understand the signal context very well. I think we have it sorted ok. We do not share anything between the signal and the non-signal context. Of course we may have missed something...
libz is not thread safe? This is worrisome. I do indeed use libz in many threads. For example, I have a system with 25 or so transducers, each being serviced in a thread. One of the tasks each thread has is to save the collected data to a file via libz. I have not had any problems. Until the GPS one. If libz is truly not thread safe, I have a problem.
I see this: http://www.zlib.net/zlib_faq.html#faq21 . libz is thread safe, but perhaps the things it uses are not thread safe. In my case, I think it is libc (memory allocation and file I/O). I do not use libc++. libc is thread-safe for these things. So where would libz not be thread-safe? Only if I used my own memory management routines, and I do not do this.
So it does appear that libz is threadsafe. I have not worked on threads in a while. I always assume library functions are not threadsafe unless I learn otherwise. Could possibly be that the GPS code has a bug. At this point I am just guessing, not knowing your app.
The GPS code in this case (lots disabled to get to the meat of the issue) is a read() and a gzwrite(). The debugger indicates that the few variables involved are as I would expect. I think the GPS code is an innocent victim. The killer is elsewhere.
In the example I gave about the guy who tried Purify, he found the culprit right away and it was not his code. In short (it was in a telephone switch), it was a case of not hanging up the phone. -- Jerry Feldman
On Thu, 2012-05-31 at 12:28 -0400, Jerry Feldman wrote:
So it does appear that libz is threadsafe. I have not worked on threads in a while. I always assume library functions are not threadsafe unless I learn otherwise. Could possibly be that the GPS code has a bug. At this point I am just guessing, not knowing your app. One of the real beauties of Purify <http://www-01.ibm.com/software/awdtools/purify/unix/> is that it is able to find a lot of issues that other debugging options miss. The type of issues you are having is perfect for Purify. While the product is pricey, you could use the trial version. A few years ago, when I was at Compaq, a guy at one of our meetings described a problem with his software. He had used other tools, but none found his problem. At my suggestion he tried Purify; it found the problem very quickly and his company decided to pay for a license.
Very strange. In Purify, my application stops with a segmentation violation before it starts. That is, the program loader seems to be loading a library (libkakadu, which is a very fast JPEG2000 decoder). It is still in the startup code. The entry point in the library is _init (if I understand what Purify is telling me). My main() has not been called. Hard to debug a program when you never get a chance to start...
Maybe it is because I am not running a supported kernel. No openSUSE kernels are supported by Purify.
I think I will see what ElectricFence tells me.
Yours sincerely, Roger Oberholtzer
On 06/01/2012 08:29 AM, Roger Oberholtzer wrote:
Hard to debug a program when you never get a chance to start...
Maybe it is because I am not running a supported kernel. No openSUSE kernels are supported by purify.
I think I will see what ElectricFence tells.
I've seen this before. There is a lot of code that gets executed before the actual main() function. I don't think this has to do with the kernel, but possibly the loader. ElectricFence is also a good tool. One of the things that is normally done before main is memory allocation. Sometimes libraries need to allocate some memory. Also, remember that Purify likes to trap mmap(2). -- Jerry Feldman
On Fri, 2012-06-01 at 16:07 -0400, Jerry Feldman wrote:
I've seen this before. There is a lot of code that gets executed before the actual main() function. I don't think this has to do with the kernel, but possibly the loader. ElectricFence is also a good tool. One of the things that is normally done before main is memory allocation. Sometimes libraries need to allocate some memory. Also, remember that Purify likes to trap mmap(2).
But is any code in a library actually called when it is loaded? I know on Windows there is code called when a DLL is loaded, reloaded or unloaded, and that the DLL maker has access to these hooks. Is there such code in a Linux DSO? I thought that all that happened was that symbols were resolved. Any memory allocated must be something determined by the linker when the library was made (library static data storage). Surely that allocation code is not failing under Purify...
My problem with ElectricFence is that I never have enough memory to let it run. And it seems that it does not use swap. I added 8 GB of additional swap and set all my limits to unlimited. But the swap is never used when EF exits complaining it cannot get enough memory. My application has many small (and big) memory allocations. And the way EF works, that translates to lots of memory needed...
Yours sincerely, Roger Oberholtzer
Roger Oberholtzer wrote:
But is any code in a library actually called when it is loaded?
Yes, you can have library functions called as initialization when the library is loaded/opened. -- Per Jessen, Zürich (16.0°C)
Hello, On Mon, 04 Jun 2012, Roger Oberholtzer wrote:
But is any code in a library actually called when it is loaded?
Yes. From [1], page 12:
====
Once the relocations are performed the DSOs and the application code can actually be used. But there is one more thing to do: optionally the DSOs and the application must be initialized. The author of the code can define for each object a number of initialization functions which are run before the DSO is used by other code. To perform the initialization the functions can use code from the own object and all the dependencies. To make this work the dynamic linker must make sure the objects are initialized in the correct order, i.e., the dependencies of an object must be initialized before the object.
[..]
At this point it is useful to look at the way to correctly write constructors and destructors for DSOs. Some systems had the convention that exported functions named _init and _fini are automatically picked as constructor and destructor respectively. This convention is still followed by GNU ld and using functions with these names on a Linux system will indeed cause the functions used in these capacities. But this is totally, 100% wrong! By using these functions the programmer overwrites whatever initialization and destruction functionality the system itself is using. The result is a DSO which is not fully initialized and this sooner or later leads to a catastrophy. The correct way of adding constructors and destructors is by marking functions with the constructor and destructor function attribute respectively.
    void __attribute__ ((constructor)) init_function (void) { ... }
    void __attribute__ ((destructor)) fini_function (void) { ... }
These functions should not be exported either (see sections 2.2.2 and 2.2.3) but this is just an optimization. With the functions defined like this the runtime will arrange that they are called at the right time, after performing whatever initialization is necessary before.
====
[1] http://www.akkadia.org/drepper/dsohowto.pdf
HTH, -dnh
-- "Grove giveth and Gates taketh away." - Bob Metcalfe (inventor of Ethernet) on the trend of hardware speedups not being able to keep up with software demands
Wow. I wish I had known this before. I recall wishing for something like this a while back. I guess we solved it some other way. But with the good comes the bad, I guess. One more thing to worry about. Luckily I have the source for one of the offending libraries. I will have to see if there is some such routine and oddness around it. On Jun 4, 2012, at 6:29 PM, David Haller wrote:
Hello,
On Mon, 04 Jun 2012, Roger Oberholtzer wrote:
But is any code in a library actually called when it is loaded?
Roger Oberholtzer OPQ Systems / Ramböll RST Ramböll Sverige AB Kapellgränd 7 P.O. Box 4205 SE-102 65 Stockholm, Sweden Office: Int +46 8-615 60 20 Mobile: Int +46 70-815 1696 And remember: It is RSofT and there is always something under construction. It is like talking about large city with all constructions finished. Not impossible, but very unlikely.
participants (5)
- Anders Johansson
- David Haller
- Jerry Feldman
- Per Jessen
- Roger Oberholtzer