[opensuse] The spamd daemon dies unexpectedly halting mail processing.
Hi,
I have problems with spamd children dying on me. Right now, I have 8 emails
stuck in the queue. The processes are dead:
root 29170 0.0 2.4 30804 25280 ? SNs 09:52 0:02 /usr/sbin/spamd -d -c -r /var/run/spamd.pid
root 14295 0.0 0.0 0 0 ? ZN 13:34 0:00 \_ [spamd] <defunct>
root 14296 0.0 0.0 0 0 ? ZN 13:34 0:00 \_ [spamd] <defunct>
and spamc is stuck:
root 5062 0.0 0.1 6408 1048 ? Ss 03:01 0:00 /usr/lib/postfix/master
postfix 14791 0.0 0.1 6248 1820 ? S 13:35 0:00 \_ qmgr -l -t unix -u
postfix 14983 0.0 0.2 6380 2264 ? S 13:43 0:00 \_ local -t unix
cer 15699 0.0 0.0 2004 572 ? Ss 13:54 0:00 | \_ /usr/bin/procmail
cer 15709 0.0 0.0 2004 240 ? S 13:54 0:00 | \_ /usr/bin/procmail
cer 15710 0.0 0.0 2536 580 ? S 13:54 0:00 | \_ /usr/bin/spamc -s 350000
postfix 14993 0.0 0.2 6384 2272 ? S 13:43 0:00 \_ local -t unix
cer 15703 0.0 0.0 2008 576 ? Ss 13:54 0:00 | \_ /usr/bin/procmail
cer 15713 0.0 0.0 2008 244 ? S 13:54 0:00 | \_ /usr/bin/procmail
cer 15714 0.0 0.0 2536 580 ? S 13:54 0:00 | \_ /usr/bin/spamc -s 350000
postfix 15688 0.0 0.1 6204 1660 ? S 13:54 0:00 \_ pickup -l -t unix -u
postfix 15940 0.0 0.1 6208 1716 ? S 13:58 0:00 \_ showq -t unix -u
The log shows problems:
Dec 9 13:34:41 nimrodel spamd[14295]: spamd: processing message
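In the `ps` listings above, a STAT value that begins with `Z` (here `ZN`), together with the `<defunct>` tag, marks a zombie: a child that has exited but has not yet been reaped by its parent. As a minimal sketch, assuming `ps aux`-style output with the usual eleven columns, the Python helper below picks such rows out of a saved listing. The name `find_zombies` is hypothetical and is not part of any tool mentioned in this thread.

```python
def find_zombies(ps_aux_text):
    """Return (pid, command) pairs for processes whose STAT starts with 'Z'.

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    zombies = []
    for line in ps_aux_text.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or anything that is not a process row
        if fields[7].startswith("Z"):
            zombies.append((int(fields[1]), fields[10]))
    return zombies


# Two rows copied from the listing in this thread.
SAMPLE = (
    "root 29170 0.0 2.4 30804 25280 ? SNs 09:52 0:02 "
    "/usr/sbin/spamd -d -c -r /var/run/spamd.pid\n"
    "root 14295 0.0 0.0 0 0 ? ZN 13:34 0:00 \\_ [spamd] <defunct>\n"
)

for pid, command in find_zombies(SAMPLE):
    print(pid, command)
```

Feeding it the full `ps aux` output would list every defunct child, which may help when checking whether the spamd parent is still reaping its children.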
* Carlos E. R.
I have problems with spamd children dying on me. Right now, I have 8 emails stuck in the queue. The processes are dead:
root 29170 0.0 2.4 30804 25280 ? SNs 09:52 0:02 /usr/sbin/spamd -d -c -r /var/run/spamd.pid
[...]
now comes the problems
Dec 9 13:43:31 nimrodel spamd[14295]: prefork: sysread(8) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
Dec 9 13:43:31 nimrodel spamd[14296]: prefork: sysread(9) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
does /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm exist?

try adding "-L" for a time to see if network lookups are timing out. Could be dns or ipv6 problem???

--
Patrick Shanahan Plainfield, Indiana, USA HOG # US1244711
http://wahoo.no-ip.org Photo Album: http://wahoo.no-ip.org/gallery2
Registered Linux User #207535 @ http://counter.li.org

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
The Sunday 2007-12-09 at 09:49 -0500, Patrick Shanahan wrote:
Dec 9 13:43:31 nimrodel spamd[14295]: prefork: sysread(8) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
Dec 9 13:43:31 nimrodel spamd[14296]: prefork: sysread(9) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
does /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm exist?
Certainly. The code is:

  retry_read:
    my $nbytes = $sock->sysread($buf, $toread);

    if (!defined $nbytes) {
      unless ((exists &Errno::EAGAIN && $! == &Errno::EAGAIN)
          || (exists &Errno::EWOULDBLOCK && $! == &Errno::EWOULDBLOCK))
      {
        # an error that wasn't non-blocking I/O-related. that's serious
        return undef;
      }

      # ok, we didn't get it first time. we'll have to start using
      # select() and timeouts (which is slower). Don't warn just yet,
      # as it's quite acceptable in our design to have to "block" on
      # sysread()s here.

      my $now = time();
      my $tout = $timeout;
      if (!defined $deadline) {
        # set this. it'll be close enough ;)
        $deadline = $now + $timeout;
      }
      elsif ($now > $deadline) {
        # timed out! report failure
        warn "prefork: sysread(".$sock->fileno.") failed after $timeout secs";   <====
        return undef;
      }

But I don't know perl, and even less spamassassin code.
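The excerpt follows a common pattern: try a non-blocking read, and if no data is ready, wait with `select()` until a deadline passes, then give up with a warning. The Python sketch below illustrates that pattern only. It is not a translation of `SpamdForkScaling.pm`, and `read_with_deadline` and its parameters are hypothetical names.

```python
import select
import socket
import time


def read_with_deadline(sock, nbytes, timeout):
    """Read exactly nbytes from a non-blocking socket.

    Returns the bytes read, or None if the deadline passes or the peer
    closes the connection first.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while len(buf) < nbytes:
        try:
            chunk = sock.recv(nbytes - len(buf))
        except BlockingIOError:
            # Nothing to read yet (EAGAIN / EWOULDBLOCK): wait, but only
            # for the time left before the deadline.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # timed out, the caller would log a warning
            select.select([sock], [], [], remaining)
            continue
        if not chunk:
            return None  # peer closed the connection
        buf += chunk
    return buf


if __name__ == "__main__":
    parent, child = socket.socketpair()
    parent.setblocking(False)
    print(read_with_deadline(parent, 4, 0.2))  # nothing was sent yet
    child.sendall(b"ping")
    print(read_with_deadline(parent, 4, 1.0))
```

Seen this way, a "sysread(...) failed after 300 secs" message would mean that one side waited the full timeout for data that the other side never sent, which fits the picture of a stuck or dead peer but does not by itself say why the peer went silent.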
try adding "-L" for a time to see if network lookups are timing out.
To where do I add that? :-O
Could be dns or ipv6 problem???
I have absolutely no idea...

Another clue is that spamassassin has been slower for some days now, and uses much less CPU. It appears to stop, waiting for something. Could be related to the above code.

And I keep seeing errors in the warn log:

Dec 9 16:34:45 nimrodel spamd[16817]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 323.
Dec 9 16:34:45 nimrodel spamd[22472]: prefork: sysread(9) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
Dec 9 16:34:45 nimrodel spamd[22473]: prefork: sysread(9) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
Dec 9 16:34:45 nimrodel spamd[16817]: prefork: select returned error on server filehandle:
Dec 9 16:34:45 nimrodel spamd[22472]: prefork: sysread(9) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
Dec 9 16:34:45 nimrodel spamd[22473]: prefork: sysread(9) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 642.
Dec 9 16:34:45 nimrodel spamd[16817]: prefork: select returned error on server filehandle:
Dec 9 16:37:13 nimrodel spamd[16817]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 323.
Dec 9 16:37:13 nimrodel spamd[16817]: prefork: select returned error on server filehandle:
Dec 9 16:37:13 nimrodel spamd[16817]: prefork: select returned error on server filehandle:
Dec 9 16:39:19 nimrodel spamd[16817]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 323.
Dec 9 16:39:19 nimrodel spamd[16817]: prefork: select returned error on server filehandle:
Dec 9 16:39:19 nimrodel spamd[16817]: prefork: select returned error on server filehandle:

That uninitialized value refers to this code:

  # errors; handle undef *or* -1 returned. do this before "errors on
  # the handle" below, since an error condition is signalled both via
  # a -1 return and a $eout bit.
  if (!defined $nfound || $nfound < 0)
  {
    if (exists &Errno::EINTR && $selerr == &Errno::EINTR)
    {
      # this happens if the process is signalled during the select(),
      # for example if someone sends SIGHUP to reload the configuration.
      # just return inmmediately
      dbg("prefork: select returned err $selerr, probably signalled");
      return;
    }

    # if a child exits during that select() call, it generates a spurious
    # error, like this:
    #
    # Jan 29 12:53:17 dogma spamd[18518]: prefork: child states: BI
    # Jan 29 12:53:17 dogma spamd[18518]: spamd: handled cleanup of child pid 13101 due to SIGCHLD
    # Jan 29 12:53:17 dogma spamd[18518]: prefork: select returned -1! recovering:
    #
    # avoid by setting a boolean in the child_exited() callback and checking
    # it here. log $! just in case, though.
    if ($self->{child_just_exited} && $nfound == -1) {
      dbg("prefork: select returned -1 due to child exiting, ignored ($selerr)");
      return;
    }

    warn "prefork: select returned ".
         (defined $nfound ? $nfound : "undef").
         "! recovering: $selerr\n";

    sleep 1;        # avoid overload
    return;
  }

  # errors on the handle?
  # return them immediately, they may be from a SIGHUP restart signal
  if ($self->vec_all(\$eout, $self->{server_fileno})) {
    warn "prefork: select returned error on server filehandle: $selerr $!\n";   <====
    return;
  }

But I have no idea what it is all about.

--
Cheers,
       Carlos E. R.
* Carlos E. R.
The Sunday 2007-12-09 at 09:49 -0500, Patrick Shanahan wrote:
try adding "-L" for a time to see if network lookups are timing out.
To where do I add that? :-O
yast sysconfig -> Network -> Mail -> Spamassassin -> SPAMD_ARGS

BUT, "-L" is default :^(

you can check from the command line:

  ps aux | grep spamd
  /usr/sbin/spamd -d -c -L -u pat -r /var/run/spamd.pid

maybe Randall knows, I've seen him offer perl scripts ??

gud luk,

--
Patrick Shanahan
The Sunday 2007-12-09 at 13:01 -0500, Patrick Shanahan wrote:
try adding "-L" for a time to see if network lookups are timing out.
To where do I add that? :-O
yast sysconfig -> Network -> Mail -> Spamassassin -> SPAMD_ARGS
Ah, /that/ -L :-)
BUT, "-L" is default :^(
I have it deactivated; local-only mode catches less spam.

I had an idea. I grepped my log files to find out when these messages started to appear; I grepped for "/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm":

  zgrep -c "/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm" /var/log/mail.debug-2007* | less

counting the messages per file. I have them as early as:

  /var/log/mail.debug-20070217.bz2:0
  /var/log/mail.debug-20070219.bz2:0
  /var/log/mail.debug-20070225.bz2:6
  /var/log/mail.debug-20070304.bz2:14
  /var/log/mail.debug-20070311.bz2:2
  /var/log/mail.debug-20070318.bz2:2
  ...
  /var/log/mail.debug-20071028.gz:18
  /var/log/mail.debug-20071103.bz2:15
  /var/log/mail.debug-20071105.bz2:0   <=== upgraded to openSUSE 10.3 here
  /var/log/mail.debug-20071106.bz2:0
  /var/log/mail.debug-20071107.bz2:0
  /var/log/mail.debug-20071108.bz2:0

You see that they stopped when I upgraded to 10.3, but they restart in this file:

  /var/log/mail.debug-20071207.bz2:16

They started at "Dec 7 15:29:07". I was hoping to see whether something I installed around then could have had an effect... but I don't know.
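The per-file count that `zgrep -c` produces can also be sketched in Python, which makes it easy to pick out the first rotated log in which the message appears. This is only an illustration under stated assumptions: the helper names `count_matches` and `first_file_with_matches` are hypothetical, and the compression format is guessed from the file extension.

```python
import bz2
import gzip


def count_matches(path, needle):
    """Count lines containing `needle` in a plain, .gz, or .bz2 file."""
    if path.endswith(".bz2"):
        opener = bz2.open
    elif path.endswith(".gz"):
        opener = gzip.open
    else:
        opener = open
    # Text mode; undecodable bytes are replaced rather than raising.
    with opener(path, "rt", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)


def first_file_with_matches(paths, needle):
    """Return the first path, in sorted order, with at least one match."""
    for path in sorted(paths):
        if count_matches(path, needle) > 0:
            return path
    return None
```

Called with `glob.glob("/var/log/mail.debug-2007*")` and the module path as `needle`, it would report the earliest log file containing the message, since the date-stamped names sort chronologically.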
This is what I installed Thu Dec 06 2007, from 08:16:38 to 08:55:37 PM CET - it's wonderful what you can learn from the rpm database:

  BUILDTIME:day    NAME                       VERSION
  Sat Nov 24 2007  htdig                      3.2.0b6-110.2
  Fri Nov 30 2007  glib2                      2.14.1-4.2
  Sat Nov 24 2007  release-notes              10.3.19-0.1
  Fri Nov 23 2007  alsa                       1.0.14-31.2
  Fri Nov 23 2007  libasound2                 1.0.14-31.2
  Fri Nov 23 2007  libcom_err2                1.40.2-20.2
  Fri Nov 23 2007  libuuid1                   1.40.2-20.2
  Mon Nov 26 2007  kernel-source              2.6.22.13-0.3
  Mon Nov 26 2007  kernel-default             2.6.22.13-0.3
  Fri Nov 30 2007  glib2-devel                2.14.1-4.2
  Fri Nov 23 2007  alsa-devel                 1.0.14-31.2
  Fri Nov 30 2007  OpenOffice_org             2.3.0.1.2-10.3
  Fri Nov 23 2007  libext2fs2                 1.40.2-20.2
  Fri Nov 23 2007  libuuid-devel              1.40.2-20.2
  Mon Nov 26 2007  kernel-syms                2.6.22.13-0.3
  Wed Nov 28 2007  gnome-screensaver          2.20.0-6.2
  Wed Nov 28 2007  xorg-x11                   7.2-135.4
  Fri Nov 23 2007  libblkid1                  1.40.2-20.2
  Fri Nov 23 2007  libcom_err-devel           1.40.2-20.2
  Fri Nov 30 2007  OpenOffice_org-base        2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-draw        2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-gnome       2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-impress     2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-kde         2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-math        2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-mono        2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-officebean  2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-pyuno       2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-writer      2.3.0.1.2-10.3
  Fri Nov 23 2007  libext2fs-devel            1.40.2-20.2
  Fri Nov 30 2007  OpenOffice_org-calc        2.3.0.1.2-10.3
  Fri Nov 23 2007  e2fsprogs                  1.40.2-20.2
  Fri Nov 23 2007  libblkid-devel             1.40.2-20.2
  Fri Nov 30 2007  OpenOffice_org-mailmerge   2.3.0.1.2-10.3
  Fri Nov 30 2007  OpenOffice_org-filters     2.3.0.1.2-10.3
  Fri Nov 23 2007  e2fsprogs-devel            1.40.2-20.2

Do you see some rpm related to perl or spamassassin there? Perhaps glibc?

--
Cheers,
       Carlos E. R.
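A list like this can be derived from the install times recorded in the rpm database. The exact command used is not shown in the message. As an assumption, a query such as `rpm -qa --queryformat '%{INSTALLTIME} %{NAME} %{VERSION}-%{RELEASE}\n'` prints one package per line with its install time in epoch seconds, and the Python sketch below filters such output down to a time window. The function name and the sample epoch values are made up for illustration.

```python
def installed_between(query_output, start, end):
    """Return names of packages installed in the window [start, end].

    Expects lines of the form "<install epoch> <name> <version-release>",
    as produced by the rpm query assumed in the text above.
    """
    names = []
    for line in query_output.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        installed, name, _version = parts
        if start <= int(installed) <= end:
            names.append(name)
    return names


# Package names are from the thread; the epoch values are invented.
SAMPLE_OUTPUT = (
    "100 glib2 2.14.1-4.2\n"
    "150 htdig 3.2.0b6-110.2\n"
    "900 xorg-x11 7.2-135.4\n"
)

print(installed_between(SAMPLE_OUTPUT, 50, 500))
```

Comparing the resulting names against the date on which the log messages reappeared is the kind of cross-check the message describes.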
* Carlos E. R.
Do you see some rpm related to perl or spamassassin there? Perhaps glibc?
I see nada :^( ....

--
Patrick Shanahan
Carlos E. R. wrote:
The Sunday 2007-12-09 at 09:49 -0500, Patrick Shanahan wrote:
Dec 9 13:43:31 nimrodel spamd[14295]: prefork: sysread(8) failed
<snip>
  retry_read:
    my $nbytes = $sock->sysread($buf, $toread);

    if (!defined $nbytes) {
      unless ((exists &Errno::EAGAIN && $! == &Errno::EAGAIN)
          || (exists &Errno::EWOULDBLOCK && $! == &Errno::EWOULDBLOCK))
      {
        # an error that wasn't non-blocking I/O-related. that's serious
        return undef;
      }

      # ok, we didn't get it first time. we'll have to start using
      # select() and timeouts (which is slower). Don't warn just yet,
      # as it's quite acceptable in our design to have to "block" on
      # sysread()s here.

      my $now = time();
      my $tout = $timeout;
      if (!defined $deadline) {
        # set this. it'll be close enough ;)
        $deadline = $now + $timeout;
      }
      elsif ($now > $deadline) {
        # timed out! report failure
        warn "prefork: sysread(".$sock->fileno.") failed after $timeout secs";   <====
        return undef;
      }
But I don't know perl, and even less spamassassin code.
<snip>
Dec 9 16:34:45 nimrodel spamd[16817]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 323.
Dec 9 16:34:45 nimrodel spamd[22472]: prefork: sysread(9) failed after 300 secs at
<snip>
That uninitialized value refers to this code:
  # errors; handle undef *or* -1 returned. do this before "errors on
  # the handle" below, since an error condition is signalled both via
  # a -1 return and a $eout bit.
  if (!defined $nfound || $nfound < 0)
  {
    if (exists &Errno::EINTR && $selerr == &Errno::EINTR)
    {
      # this happens if the process is signalled during the select(),
      # for example if someone sends SIGHUP to reload the configuration.
      # just return inmmediately
      dbg("prefork: select returned err $selerr, probably signalled");
      return;
    }

    # if a child exits during that select() call, it generates a spurious
    # error, like this:
    #
    # Jan 29 12:53:17 dogma spamd[18518]: prefork: child states: BI
    # Jan 29 12:53:17 dogma spamd[18518]: spamd: handled cleanup of child pid 13101 due to SIGCHLD
    # Jan 29 12:53:17 dogma spamd[18518]: prefork: select returned -1! recovering:
    #
    # avoid by setting a boolean in the child_exited() callback and checking
    # it here. log $! just in case, though.
    if ($self->{child_just_exited} && $nfound == -1) {
      dbg("prefork: select returned -1 due to child exiting, ignored ($selerr)");
      return;
    }

    warn "prefork: select returned ".
         (defined $nfound ? $nfound : "undef").
         "! recovering: $selerr\n";

    sleep 1;        # avoid overload
    return;
  }

  # errors on the handle?
  # return them immediately, they may be from a SIGHUP restart signal
  if ($self->vec_all(\$eout, $self->{server_fileno})) {
    warn "prefork: select returned error on server filehandle: $selerr $!\n";   <====
    return;
  }
But I have no idea what it is all about.
Out of curiosity I had a look at my copy of this Perl module. Firstly, this code does not appear in the same place in my version. I could tell more if the line involved were highlighted in the second snippet. Secondly, the code is reported a little out of context. Both snippets are in methods of a class that seems to be at the heart of the Perl forking code.

I think you will need to raise this with the spamassassin people, as it looks as if something much earlier is returning an undefined result. By the time it hits the read operation the underlying call is broken.

As in many scripting languages, a runtime error is detected when the result is used rather than when the error actually occurred, and it is extremely difficult to figure out what the real problem is from the code alone.
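The point about errors surfacing late can be shown with a small Python sketch. It is an analogy only: the function names are hypothetical, and the behaviour differs from Perl in one respect, since Perl merely warns ("Use of uninitialized value in concatenation") and carries on with an empty string, whereas Python raises an exception.

```python
def describe_error(codes, code):
    # The real problem is here: an unknown code silently yields None
    # instead of failing at the point where the lookup goes wrong.
    return codes.get(code)


def build_warning(codes, code):
    text = describe_error(codes, code)
    # The failure only surfaces here, at the point of use, far from
    # the place where the missing value was produced.
    return "select returned error on handle: " + text


if __name__ == "__main__":
    print(build_warning({4: "interrupted"}, 4))
    try:
        build_warning({}, 4)
    except TypeError as exc:
        print("reported at the concatenation, not at the lookup:", exc)
```

The traceback or warning names the line doing the concatenation, so finding the cause means working backwards to whatever produced the undefined value.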
-- Cheers, Carlos E. R.
--
==============================================================================
I have always wished that my computer would be as easy to use as my
telephone. My wish has come true. I no longer know how to use my telephone.

Bjarne Stroustrup
==============================================================================
The Monday 2007-12-10 at 09:44 -0000, G T Smith wrote:
...
Out of curiosity I had a look at my copy of this Perl module. Firstly, this code does not appear in the same place in my version. I could tell more if the line involved were highlighted in the second snippet. Secondly, the code is reported a little out of context. Both snippets are in methods of a class that seems to be at the heart of the Perl forking code.
I marked both lines involved with a "<====", with a bit of context. The file comes from "perl-spamassassin-3.2.3-10...rpm", which is the one that comes with openSUSE 10.3. I can make guesses, but I don't really know what the code does or doesn't do.
I think you will need to raise this with the spamassassin people, as it looks as if something much earlier is returning an undefined result. By the time it hits the read operation the underlying call is broken.
For the moment I have opened a bugzilla with Novell. I'm unsure what the procedure is with the spamassassin people, and whether they would accept a bug which is not from their most recent version.
As in many scripting languages, a runtime error is detected when the result is used rather than when the error actually occurred, and it is extremely difficult to figure out what the real problem is from the code alone.
I suppose so... Some kind of trace would help, I suppose.

--
Cheers,
       Carlos E. R.
participants (3)
- Carlos E. R.
- G T Smith
- Patrick Shanahan