[Bug 1191345] New: opendkim segv in libunbound8
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345

Bug ID: 1191345
Summary: opendkim segv in libunbound8
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: screening-team-bugs@suse.de
Reporter: patrick.schaaf@yalwa.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

I run two load-balanced postfix MX servers, both kept more or less up to date with tumbleweed. Part of that install, on both of them, is opendkim for mail signing. The setup has basically been running flawlessly for a few years.

One system was updated about 40 days ago to tumbleweed 20210810, which has libunbound8-1.13.1-2.1.x86_64.

Today I updated the other one to an even newer tumbleweed, 20210929, which brought the newer libunbound8-1.13.2-1.2.x86_64 package. There, I experienced opendkim hitting a segv after ~1500 seconds of uptime. I restarted it manually, and it ran into a second one at ~3600 seconds of system uptime.

The dmesg output, see below, pointed at libunbound8. So I manually force-downgraded just that package to the version that has been running on the partner machine for some days, and the segvs are gone now after several hours.

Unfortunately, these are busy mail servers with multiple mails flowing through each second, so I cannot pinpoint or reproduce the segv precisely, and I cannot risk much experimenting on them. I hope these observations can help resolve the issue anyway.

So here's the dmesg output I saw:

2021-10-05T14:02:56.385231+02:00 phobos kernel: [ 1521.425408] opendkim[2552]: segfault at 10 ip 00007fd1f33256aa sp 00007fd1eb7fd640 error 4 in libunbound.so.8.1.13[7fd1f328c000+cb000]
2021-10-05T14:02:56.385242+02:00 phobos kernel: [ 1521.425422] Code: fc 55 48 89 f5 53 48 83 ec 28 48 8b 9e 20 01 00 00 48 8b 43 10 4c 8b 38 4c 8b 50 08 48 8b 86 30 01 00 00 48 8b 80 b8 00 00 00 <4c> 8b 58 10 48 c7 86 30 01 00 00 00 00 00 00 83 fa fe 0f 84 46 02
...
2021-10-05T14:38:48.875262+02:00 phobos kernel: [ 3674.051115] opendkim[11676]: segfault at 10 ip 00007f96636696aa sp 00007f965b7fd640 error 4 in libunbound.so.8.1.13[7f96635d0000+cb000]
2021-10-05T14:38:48.875273+02:00 phobos kernel: [ 3674.051128] Code: fc 55 48 89 f5 53 48 83 ec 28 48 8b 9e 20 01 00 00 48 8b 43 10 4c 8b 38 4c 8b 50 08 48 8b 86 30 01 00 00 48 8b 80 b8 00 00 00 <4c> 8b 58 10 48 c7 86 30 01 00 00 00 00 00 00 83 fa fe 0f 84 46 02

--
You are receiving this mail because:
You are on the CC list for the bug.
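The two kernel segfault messages in the report can be compared mechanically. The following is a minimal Python sketch, not part of the original report. It assumes the usual x86 Linux message layout (`segfault at ADDR ip IP sp SP error CODE in OBJECT[BASE+SIZE]`) and the x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode). The names `PATTERN` and `parse_segfault` are made up for illustration.

```python
import re

# The two kernel messages from the report (timestamp and host name omitted).
LINES = [
    "opendkim[2552]: segfault at 10 ip 00007fd1f33256aa sp 00007fd1eb7fd640 "
    "error 4 in libunbound.so.8.1.13[7fd1f328c000+cb000]",
    "opendkim[11676]: segfault at 10 ip 00007f96636696aa sp 00007f965b7fd640 "
    "error 4 in libunbound.so.8.1.13[7f96635d0000+cb000]",
]

# Hypothetical helper pattern; all numeric fields are treated as hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract the fault address, the library offset and the error bits."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped library.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


reports = [parse_segfault(line) for line in LINES]
for r in reports:
    print(r["object"], hex(r["offset"]), hex(r["fault_addr"]))
```

Under these assumptions, both crashes are at the same offset (0x996aa) into the mapped library, and both are user-mode reads of the unmapped address 0x10. That pattern is consistent with reading a field at a small offset through a null pointer, though the log lines alone do not prove it.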
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c1

Togan Muftuoglu <toganm@dinamizm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |toganm@dinamizm.com

--- Comment #1 from Togan Muftuoglu <toganm@dinamizm.com> ---
I have the same issue, and I noticed it only crashes when verifying keys. Signing works. I have been using opendkim for a long time, and it had been working flawlessly as well.

The difference from Patrick's setup is that I am using i586:

libunbound8-1.13.2-1.2.i586
unbound-1.13.2-1.2.i586
opendkim-2.11.0-5.4.i586
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c2

Ferdinand Thiessen <rpm@fthiessen.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rpm@fthiessen.de            |michael@stroeder.com

--- Comment #2 from Ferdinand Thiessen <rpm@fthiessen.de> ---
For me it looks like an issue within the new version of the unbound library
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345

Michael Ströder <michael@stroeder.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|michael@stroeder.com        |screening-team-bugs@suse.de
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c3

--- Comment #3 from Togan Muftuoglu <toganm@dinamizm.com> ---
(In reply to Ferdinand Thiessen from comment #2)
For me it looks like an issue within the new version of the unbound library
As a workaround, I have defined the local unbound as the nameserver, and it seems to hold so far:

Nameservers 127.0.0.1
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c6

--- Comment #6 from Patrick Schaaf <patrick.schaaf@yalwa.com> ---
Today I got around to test for this bug / crash another time.

For both tw VERSION_ID 20211120 (libunbound8-1.13.2-2.1), and VERSION_ID 20211220 (libunbound8-1.14.0-1.1), I could see the segv.

A forced --oldpackage downgrade to a safekept libunbound8-1.13.1-2.1 once more fixes the problem, even with the 20211220 state of the world, so it is a viable workaround.

This time I got a coredump recorded from the 1.14.0 case, and for your kind consideration I attach its analysis here. It clearly shows a null pointer issue.

=============================================

Core was generated by `/usr/sbin/opendkim -f -x /etc/opendkim/opendkim.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f985ac9c38a in serviced_udp_callback (c=0x7f984c2ed7d0, arg=0x7f9840045180, error=-1, rep=0x0) at services/outside_network.c:3115
Downloading 0.11 MB source file /usr/src/debug/unbound-1.14.0-1.1.x86_64/services/outside_network.c
3115 struct port_if* pi = p->pc->pif;
[Current thread is 1 (Thread 0x7f9857386640 (LWP 1239))]
(gdb) bt
#0 0x00007f985ac9c38a in serviced_udp_callback (c=0x7f984c2ed7d0, arg=0x7f9840045180, error=-1, rep=0x0) at services/outside_network.c:3115
#1 0x00007f985ac96c24 in outnet_send_wait_udp (outnet=outnet@entry=0x7f984c2bb820) at services/outside_network.c:1343
#2 0x00007f985ac97032 in outnet_udp_cb (c=0x7f984c2ed7d0, arg=0x7f984c2bb820, error=<optimized out>, reply_info=0x7f9857385840) at services/outside_network.c:1428
#3 0x00007f985ac917f5 in comm_point_udp_callback (fd=41, event=<optimized out>, arg=<optimized out>) at util/netevent.c:784
#4 0x00007f985a5a68b8 in event_persist_closure (ev=<optimized out>, base=0x7f984c281270) at /usr/src/debug/libevent-2.1.12-2.4.x86_64/event.c:1638
#5 event_process_active_single_queue (base=base@entry=0x7f984c281270, activeq=0x7f984c2816e0, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at /usr/src/debug/libevent-2.1.12-2.4.x86_64/event.c:1697
#6 0x00007f985a5a82bf in event_process_active (base=0x7f984c281270) at /usr/src/debug/libevent-2.1.12-2.4.x86_64/event.c:1798
#7 event_base_loop (base=0x7f984c281270, flags=0) at /usr/src/debug/libevent-2.1.12-2.4.x86_64/event.c:2040
#8 0x00007f985acbd270 in ub_event_base_dispatch (base=<optimized out>) at util/ub_event_pluggable.c:491
#9 comm_base_dispatch.isra.0 (b=<optimized out>, b=<optimized out>) at util/netevent.c:256
#10 0x00007f985ac1206f in libworker_dobg (arg=0x7f984c0cefc0) at libunbound/libworker.c:370
#11 0x00007f985aa30427 in start_thread (arg=<optimized out>) at pthread_create.c:435
#12 0x00007f985aab9810 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) print p
$1 = (struct pending *) 0x7f98400452d0
(gdb) print p->pc
$2 = (struct port_comm *) 0x0
(gdb) print *p
$3 = {node = {parent = 0x7f9840044fd0, left = 0x7f985ad30dc0 <rbtree_null_node>, right = 0x7f985ad30dc0 <rbtree_null_node>, key = 0x7f98400452d0, color = 1 '\001'}, id = 25476, addr = {ss_family = 10, __ss_padding = "\000\065\000\000\000\000&\000\024\001\000\001\000\000\000\000\000\000\000\000\000C", '\000' <repeats 95 times>, __ss_align = 0}, addrlen = 28, pc = 0x0, timer = 0x7f98400453e0, cb = 0x7f985ac9c350 <serviced_udp_callback>, cb_arg = 0x7f9840045180, outnet = 0x7f984c2bb820, sq = 0x7f9840045180, next_waiting = 0x7f98400454f0, timeout = 376, pkt = 0x0, pkt_len = 0}
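The `addr` field in the `print *p` output can be decoded by hand to see where the pending query was headed. The following is a minimal Python sketch, not part of the original analysis. It assumes the Linux `struct sockaddr_in6` layout after the family field: a 2-byte port and a 4-byte flow label in network byte order, then the 16-byte address. This matches `ss_family = 10` (AF_INET6 on Linux) and `addrlen = 28`. The function name `decode_sockaddr_in6` is made up for illustration.

```python
import ipaddress
import struct

# Leading bytes of __ss_padding, copied from the gdb output with the octal
# escapes converted to hex. The remaining bytes in the dump are all zero.
ss_padding = bytes(
    [0x00, 0x35,               # "\000\065"
     0x00, 0x00, 0x00, 0x00,   # four times "\000"
     0x26, 0x00, 0x14, 0x01,   # "&\000\024\001"
     0x00, 0x01]               # "\000\001"
    + [0x00] * 9               # nine times "\000"
    + [0x43]                   # "C"
)


def decode_sockaddr_in6(padding):
    """Decode port, flow label and address (assumes Linux sockaddr_in6)."""
    # "!HI" = network byte order, unsigned 16-bit port, unsigned 32-bit flowinfo.
    port, flowinfo = struct.unpack("!HI", padding[:6])
    address = ipaddress.IPv6Address(padding[6:22])
    return port, flowinfo, address


port, flowinfo, address = decode_sockaddr_in6(ss_padding)
print(f"[{address}]:{port}")
```

Under that layout assumption, the pending entry whose `pc` pointer is null belongs to a UDP query addressed to port 53 of the IPv6 host 2600:1401:1::43, that is, an ordinary outgoing DNS query.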
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c7

--- Comment #7 from Ferdinand Thiessen <rpm@fthiessen.de> ---
Upstream issue: https://github.com/NLnetLabs/unbound/issues/588
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c8

--- Comment #8 from Ferdinand Thiessen <rpm@fthiessen.de> ---
(In reply to Patrick Schaaf from comment #6)
Today I got around to test for this bug / crash another time.
For both tw VERSION_ID 20211120 (libunbound8-1.13.2-2.1), and VERSION_ID 20211220 (libunbound8-1.14.0-1.1), I could see the segv.
Can you try my patched version and check whether the crash still occurs?

https://download.opensuse.org/repositories/home:/susnux:/branches:/server:/d...
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c9

Patrick Schaaf <patrick.schaaf@yalwa.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |IN_PROGRESS

--- Comment #9 from Patrick Schaaf <patrick.schaaf@yalwa.com> ---
Thank you, Ferdinand! I test-installed your rpm on one of my servers, and it has been holding up for 30 minutes already. I will report again in a day whether it breaks or not.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c10

--- Comment #10 from Patrick Schaaf <patrick.schaaf@yalwa.com> ---
The test libunbound8-1.13.1-2.1.x86_64.rpm worked flawlessly so far; now rolled out to the other half of my server pair.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c11

--- Comment #11 from Patrick Schaaf <patrick.schaaf@yalwa.com> ---
The slightly updated current tw libunbound8-1.14.0-1.2 still shows the coredump / segv symptoms, killing opendkim.

Ferdinand's test version 1.14.0-142.1 from 4 weeks ago (https://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c8) continues to work fine without crashes for me. I just need to manually reinstall it after each further tw update of my servers (a normal "dup" downgrades it).

Would be nice if this could be fixed for real.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c12

--- Comment #12 from Ferdinand Thiessen <rpm@fthiessen.de> ---
(In reply to Patrick Schaaf from comment #11)
Would be nice if this could be fixed for real.
Waiting for a maintainer to accept the SR: https://build.opensuse.org/request/show/948954
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c13

--- Comment #13 from Patrick Schaaf <patrick.schaaf@yalwa.com> ---
Another test today with current tw libunbound8-1.14.0-1.3.x86_64.rpm - same coredumping as seen before; switched again to Ferdinand's test rpm, which continues to work.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c14

--- Comment #14 from Ferdinand Thiessen <rpm@fthiessen.de> ---
(In reply to Patrick Schaaf from comment #13)
Another test today with current tw libunbound8-1.14.0-1.3.x86_64.rpm - same coredumping as seen before; switched again to Ferdinand's test rpm, which continues to work.
Still no response from the package maintainer. I have tried to contact the project maintainers about the submit request.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345

Aaron Puchert <aaronpuchert@alice-dsl.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|lubos.kocman@suse.com       |darin@darins.net
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c15

--- Comment #15 from Patrick Schaaf <patrick.schaaf@yalwa.com> ---
Tested this here once more at current tw snapshot 20220516, with libunbound8-1.15.0-1.2 package - and after 6 hours of running, I'm happy to see no coredumps.

So, this ticket is probably good to be closed.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c16

Ferdinand Thiessen <rpm@fthiessen.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Ferdinand Thiessen <rpm@fthiessen.de> ---
https://build.opensuse.org/request/show/974922
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c17

--- Comment #17 from Togan Muftuoglu <toganm@dinamizm.com> ---
(In reply to Patrick Schaaf from comment #15)
Tested this here once more at current tw snapshot 20220516, with libunbound8-1.15.0-1.2 package - and after 6 hours of running, I'm happy to see no coredumps.
So, this ticket is probably good to be closed.
In my experience with this bug, it is always triggered when a bad DKIM signature arrives. With the previous versions I had days with no issues, and then it would suddenly crash.

So in my opinion 6 hrs is a bit enthusiastic, but YMMV. Rather than closing the bug now, I would wait something like 72 hrs before proceeding. Of course, reopening the bug or creating a new one is always possible, but then one could ask why this one was closed in a rush.
http://bugzilla.opensuse.org/show_bug.cgi?id=1191345#c18

--- Comment #18 from Ferdinand Thiessen <rpm@fthiessen.de> ---
(In reply to Togan Muftuoglu from comment #17)
So in my opinion 6 hrs is a bit enthusiastic approach but YMMV. Rather than opting to close the bug I would rather wait something like 72hrs than proceed.
The origin of the bug was identified in libunbound and fixed with the latest release, so this bug can be closed as fixed (see the submit request to Factory above).