[opensuse] spamd: single-core vs dual-core CPUs ?
(this was also posted to spamassassin-general, but I thought somebody here might just recognize the problem).

I have a very unusual situation (I think): I am running the exact same system (it was cloned) on several single-core and dual-core systems. The only difference is in the hardware, i.e. some systems are dual-core. (ok, there are more differences than that).

I am running spamd with maxchild=25 (the actual number in use is controlled externally). The single-core systems vary from two to five, sometimes six. Perfectly normal. On the dual-core systems, I never see more than two active children. Not normal.

So I checked the logs - excerpt from a single-core system:

spamd[3736]: spamd: handled cleanup of child pid 20940 due to SIGCHLD
spamd[3736]: spamd: server successfully spawned child process, pid 20980
spamd[3736]: spamd: server successfully spawned child process, pid 20981
spamd[3736]: spamd: handled cleanup of child pid 20981 due to SIGCHLD
spamd[3736]: spamd: handled cleanup of child pid 20980 due to SIGCHLD
spamd[3736]: spamd: server successfully spawned child process, pid 21014
spamd[3736]: spamd: server successfully spawned child process, pid 21015
spamd[3736]: spamd: server successfully spawned child process, pid 21017
spamd[3736]: spamd: server successfully spawned child process, pid 21023
spamd[3736]: spamd: handled cleanup of child pid 21023 due to SIGCHLD
spamd[3736]: spamd: handled cleanup of child pid 21017 due to SIGCHLD
spamd[17254]: (child processing timeout at /usr/sbin/spamd line 1262, <GEN6653> line 27.
spamd[3736]: spamd: handled cleanup of child pid 21015 due to SIGCHLD
spamd[3736]: spamd: handled cleanup of child pid 21014 due to SIGCHLD

excerpt from a dual-core system:

spamd[3909]: spamd: server successfully spawned child process, pid 2092
spamd[3909]: spamd: server successfully spawned child process, pid 3439
spamd[3909]: spamd: child 3439 killed successfully
spamd[3909]: spamd: child 32574 killed successfully
spamd[3909]: spamd: server successfully spawned child process, pid 4249
spamd[3909]: spamd: server successfully spawned child process, pid 4250
spamd[3909]: spamd: server successfully spawned child process, pid 6249
spamd[3909]: spamd: server successfully spawned child process, pid 8242
spamd[6249]: (child processing timeout at /usr/sbin/spamd line 1262, <GEN6864> line 28.
spamd[3909]: spamd: server successfully spawned child process, pid 10219
spamd[8242]: (child processing timeout at /usr/sbin/spamd line 1262, <GEN4734> line 28.
spamd[3909]: spamd: child 8242 killed successfully
spamd[3909]: spamd: child 10219 killed successfully
spamd[3909]: spamd: server successfully spawned child process, pid 11091
spamd[3909]: spamd: server successfully spawned child process, pid 11092
spamd[11091]: (child processing timeout at /usr/sbin/spamd line 1262, <GEN1403> line 27.
spamd[11091]: (child processing timeout at /usr/sbin/spamd line 1262, <GEN2867> line 28.
spamd[3909]: spamd: server successfully spawned child process, pid 13141
spamd[3909]: spamd: child 13141 killed successfully

Notice that the single-core system reports "handled cleanup of child pid nnnnn due to SIGCHLD", which is never seen on the dual-core system, and that the dual-core system reports "child nnnnn killed successfully", which is never seen on the single-core system. What am I missing here?

/Per Jessen, Zürich
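Rather than eyeballing short excerpts like the two above, the hosts could be compared by counting each type of master message over a whole log. A minimal sketch in Python, assuming syslog lines in the format shown in the excerpts; the function name tally_spamd_messages and its category labels are hypothetical, and the patterns only cover the three message types quoted in this thread.

```python
import re
from collections import Counter

# Patterns for the three spamd master messages quoted in this thread.
# The exact wording is taken from the log excerpts; other spamd messages are ignored.
PATTERNS = {
    "spawned": re.compile(r"server successfully spawned child process, pid \d+"),
    "sigchld_cleanup": re.compile(r"handled cleanup of child pid \d+ due to SIGCHLD"),
    "killed": re.compile(r"child \d+ killed successfully"),
}


def tally_spamd_messages(lines):
    """Count how often each known spamd master message appears in an iterable of lines."""
    counts = Counter({name: 0 for name in PATTERNS})  # report zero counts explicitly
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

On each host one might call tally_spamd_messages(open("/var/log/mail")); the log path is an assumption about where syslog writes mail-facility messages on these machines. A zero "sigchld_cleanup" count alongside many "spawned" lines would reproduce the asymmetry described above over the full log rather than a short excerpt.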
Per Jessen wrote:
> (this was also posted to spamassassin-general, but I thought somebody here might just recognize the problem).
> I have a very unusual situation (I think):
> I am running the exact same system (it was cloned) on several single-core and dual-core systems. The only difference is in the hardware, i.e. some systems are dual-core. (ok, there are more differences than that).
> I am running spamd with maxchild=25 (the actual number in use is controlled externally).
> The single-core systems vary from two to five, sometimes six. Perfectly normal. On the dual-core systems, I never see more than two active children. Not normal.
> So I checked the logs -
> excerpt from a single-core system: ... Notice that the single-core system reports "handled cleanup of child pid nnnnn due to SIGCHLD" which is never seen on the dual-core system, and that the dual-core reports "child nnnnn killed successfully", which is never seen on the single-core system. What am I missing here?
What version are you running (OS, perl and spamd)?

The obvious thing that springs to my mind is perl and threads (!horror!). Though I don't see any such code in spamd itself, perhaps it uses some libraries that do. Did you clone anything that is sensitive to the number of cores? Did you clone from a single-core or a multi-core master?

In your dual-core logs, I'm surprised not to see "spamd: server hit by SIG$sig, restarting", because that "child killed" message appears in the restart_handler called after a SIGHUP.

Have you tried running spamd with debugging enabled?

Cheers,
Dave
Dave Howorth wrote:
>> excerpt from a single-core system: ... Notice that the single-core system reports "handled cleanup of child pid nnnnn due to SIGCHLD" which is never seen on the dual-core system, and that the dual-core reports "child nnnnn killed successfully", which is never seen on the single-core system. What am I missing here?
> What version are you running (OS, perl and spamd)? The obvious thing that springs to my mind is perl and threads (!horror!). Though I don't see any such code in spamd itself, perhaps it uses some libraries that do.
openSUSE 11.0, perl 5.10.0, spamassassin 3.2.5 (the latest and greatest), kernel 2.6.25.5-1.1-pae.
> Did you clone anything that is sensitive to the number of cores? Did you clone from a single-core or a multi-core master?
I cloned from the dual-core master.
> In your dual-core logs, I'm surprised not to see "spamd: server hit by SIG$sig, restarting" because that child killed message appears in the restart_handler called after a SIGHUP.
Ah, my fault: those lines are present; I just grepped for the wrong thing. The "child killed" messages are caused by the regular config reload (a SIGHUP sent to spamd). What surprises me more is that I don't see any "handled cleanup of child pid nnnnn due to SIGCHLD" messages. It is as if the spamd master isn't getting the SIGCHLD.
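The idea that the master "isn't getting the SIGCHLD" touches a well-known pitfall in preforking servers: ordinary POSIX signals are not queued, so if several children exit before the parent's handler runs, the parent may see a single SIGCHLD for all of them. The usual defence is a handler that loops on waitpid(-1, WNOHANG) until no exited child remains. The sketch below is a generic, POSIX-only Python illustration of that pattern under those assumptions; it is not spamd's code (spamd is Perl), it does not establish that this is what happens on the dual-core machines, and the names reap_children and run_demo are hypothetical.

```python
import os
import signal
import time

reaped = []  # pids collected by the SIGCHLD handler (hypothetical demo state)


def reap_children(signum, frame):
    """Reap every exited child: one SIGCHLD may stand for several exits."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append(pid)


def run_demo(n_children=4):
    """Fork children that exit almost simultaneously, then wait until all are reaped."""
    reaped.clear()
    signal.signal(signal.SIGCHLD, reap_children)  # install before forking
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        pids.append(pid)
    # Sleep until the handler has reaped every child, with a safety timeout.
    deadline = time.monotonic() + 5
    while len(reaped) < n_children and time.monotonic() < deadline:
        time.sleep(0.01)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)  # restore default disposition
    return sorted(pids), sorted(reaped)
```

Calling run_demo(4) should return two identical sorted pid lists. A handler that called waitpid only once per signal could instead leave some children unreaped when their exits coalesce into one SIGCHLD.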
> Have you tried running spamd with debugging enabled?
Only on my test system, which is single-core. I could take a production system out of service and try to reproduce the problem, but I'd like an idea of what I should be looking for.

Thanks, Dave.

/Per Jessen, Zürich
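One concrete thing that could be looked for on a dual-core host, in line with the SIGCHLD theory above: a child that has exited but has not yet been waited for by its parent stays in the zombie state (ps shows it as <defunct>). A buildup of zombie children under the spamd master would be consistent with the master not reaping them, though it would not by itself explain why. A minimal Linux-only sketch that scans /proc/<pid>/stat for zombies with a given parent; the function name zombie_children and its proc_root parameter are hypothetical.

```python
import os


def zombie_children(parent_pid, proc_root="/proc"):
    """Return sorted pids of zombie processes whose parent is parent_pid (Linux /proc)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process vanished or stat is unreadable
        # The command name is in parentheses and may itself contain spaces or ')',
        # so split after the LAST ')': the next fields are state and ppid.
        fields = stat[stat.rfind(")") + 2:].split()
        if len(fields) < 2:
            continue
        state, ppid = fields[0], int(fields[1])
        if state == "Z" and ppid == parent_pid:
            zombies.append(int(entry))
    return sorted(zombies)
```

On a live system one would pass the master's pid, i.e. the number in brackets on the master's log lines (3909 in the dual-core excerpt).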
participants (2)
- Dave Howorth
- Per Jessen