On 2016-03-08 07:07, Andrei Borzenkov wrote:
08.03.2016 00:41, Carlos E. R. пишет:
How can I find what was lost?
You can't (i.e. - you could compare content of journal with content of syslogd stored messages ... unless you already filtered out something on syslogd side).
I have the "allmessages" file, unfiltered. Journal: Mar 08 09:52:42 Telcontar systemd-journal[316]: Forwarding to syslog missed 3 messages. Mar 08 09:52:42 Telcontar leafnode[32602]: connect from localhost (127.0.0.1) to localhost (127.0.0.1) (my fqdn: Telcontar.vali Mar 08 09:52:42 Telcontar leafnode[32601]: connect from localhost (127.0.0.1) to localhost (127.0.0.1) (my fqdn: Telcontar.vali Mar 08 09:55:01 Telcontar systemd[1]: Starting Session 1311 of user news. Mar 08 09:55:01 Telcontar systemd[1]: Started Session 1311 of user news. Mar 08 09:55:01 Telcontar systemd-journal[316]: Forwarding to syslog missed 6 messages. Mar 08 09:55:01 Telcontar systemd[1]: Starting Session 1312 of user wwwrun. Mar 08 09:55:01 Telcontar systemd[1]: Started Session 1312 of user wwwrun. Mar 08 09:55:01 Telcontar /USR/SBIN/CRON[32677]: (wwwrun) CMD ( php /srv/www/cacti/poller.php > /dev/null 2>&1) Mar 08 09:57:19 Telcontar freshclam[2731]: Received signal: wake up allmessages: <63>1 2016-03-08T09:52:42.509483+01:00 Telcontar leafnode 32601 - - >211 530 3 532 comp.os.linux.hardware group selected <63>1 2016-03-08T09:52:42.515981+01:00 Telcontar leafnode 32602 - - <GROUP comp.os.linux.announce <63>1 2016-03-08T09:52:42.516097+01:00 Telcontar leafnode 32602 - - marked group comp.os.linux.announce interesting <63>1 2016-03-08T09:52:42.516106+01:00 Telcontar leafnode 32602 - - markinterest: comp.os.linux.announce touched ctime <63>1 2016-03-08T09:52:42.516111+01:00 Telcontar leafnode 32602 - - >211 32 3 34 comp.os.linux.announce group selected <30>1 2016-03-08T09:55:01.871984+01:00 Telcontar systemd 1 - - Starting Session 1311 of user news. <30>1 2016-03-08T09:55:01.872380+01:00 Telcontar systemd 1 - - Started Session 1311 of user news. <30>1 2016-03-08T09:55:01.874449+01:00 Telcontar systemd 1 - - Starting Session 1312 of user wwwrun. <30>1 2016-03-08T09:55:01.874730+01:00 Telcontar systemd 1 - - Started Session 1312 of user wwwrun. <78>1 2016-03-08T09:55:01.875903+01:00 Telcontar 32677 - - (wwwrun) CMD ( php /srv/www/cacti/poller.php > /dev/null 2>&1) <22>1 2016-03-08T09:57:19.972468+01:00 Telcontar freshclam 2731 - - Received signal: wake up The "connect from localhost" messages are missing in syslog. In fact, it is the journal which is currently filtered, syslog has more messages, but some are missing, probably during the flurry of activity by leafnode. But not always: the messages lost at 09:55:01 happened at an idle moment, and logs are the same at both side.
Each Unix socket has fixed size buffer for sent data. If there is no space to store incoming message, normally sender would block. This is acceptable when you have many clients each sending messages to logging daemon independently; you can tolerate accidental sleep of some of them. But in case of systemd there is single process - journald - that is syslogd client and journald is not allowed to block indefinitely. So it is using non-blocking mode and simply discards messages if it cannot send them to syslogd (meaning - socket buffer is full).
But AFAIK no messages were lost before systemd started handling the log.
In case of rsyslog the solution is actually to use rsyslog journald support that pulls data itself.
Is that the default, or do I have to do something?
Workaround is to increase socket buffer size for journald - syslogd communication. E.g. current upstream systemd syslog.service sets
[Socket] ... ReceiveBuffer=8M
Is that accessible to admins (where), or is it a build option?
This should give more headroom to smooth bursts of activity.
I guess it should be. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)