In a situation like that, if I understand it correctly, you've got a tradeoff between the default journal storage and plaintext storage -- size vs. speed.  You also have aggregation options that, without knowing more details, I would imagine would resolve this.  This sounds like an infrastructure architecture problem and not necessarily a journald problem, from what I've read here.  Interesting stuff.
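
To be clear about what I mean by aggregation, here's a rough sketch and not a claim about your setup -- the collector hostname is a placeholder, and the values are illustrative.  One option that ships with systemd is to push entries off-box as they arrive, so nobody has to dump the whole journal back out later:

    # sender side: /etc/systemd/journal-upload.conf
    # (URL points at a placeholder host running systemd-journal-remote)
    [Upload]
    URL=https://logs.example.internal:19532

    # enable the shipper on each host
    systemctl enable --now systemd-journal-upload.service

ForwardToSyslog=yes plus an rsyslog relay gets you to roughly the same place if you'd rather aggregate plaintext.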

I wouldn't recommend using the default logging configuration at scale in any distribution.

Configuration management and some planning are due there.  Sure, it's slow.  I can't blame journald for that; with the defaults I would expect it to be slow.
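
To make "configuration management and planning" concrete, here's the rough shape of the journald.conf tuning I have in mind -- values are illustrative only, not a recommendation for any particular machine:

    # /etc/systemd/journald.conf -- illustrative values only
    [Journal]
    Storage=persistent        # keep the journal on disk instead of tmpfs
    Compress=yes
    SystemMaxUse=4G           # cap on-disk size so rotation stays bounded
    RateLimitIntervalSec=30s
    RateLimitBurst=10000
    ForwardToSyslog=yes       # also hand entries to a traditional syslog/aggregator

    # apply without a reboot
    systemctl restart systemd-journald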

Re: breaking everybody's setup by moving legacy logging tools to a secondary repo -- I would imagine a transition window would be needed.  SUSE dropped LILO eventually.  I'm sure out there, somewhere under various rocks and lurching in untold caves, are people still mad about it because their setup worked.

If it gets too much pushback, we could always just call it DevLogOps or something and let the problem solve itself.

-C

On Sun, Dec 6, 2020 at 6:48 AM Stefan Seyfried <stefan.seyfried@googlemail.com> wrote:
On 06.12.20 12:23, Chris Punches wrote:
> I will also add that if it's taking 8 hours to pull logs from a server,
> there is a problem elsewhere or perhaps the log aggregation model in that
> environment should be revisited.  That is not typical.

It's not "8 hours to pull the logs". "pulling the logs" would have been
a matter of seconds.

These were machines at the faster end of the spectrum that you could get
at that time. They were not particularly big, but had fast CPUs.
Something in the 256-512GB RAM, 244 cores class.
They had no persistent journal configured, so the journal was running from
tmpfs.
"journalctl -b > file", with the file also on tmpfs or very fast storage,
took 8 hours and more, producing a few GB (much less than 10GB IIRC) of
output.

We noticed because SUSE's supportconfig tool took days to finish. It
does a "journalctl -b" to collect the journal... ;-)

So we created a service request and were told by SUSE professional
support that it is unfixable, because the design of the journal
database just does not allow it to be read efficiently.

And this mirrors my previous experiences with journald: it works
well on its developer's notebook, but not very well on setups that differ
from that. Some years ago, a persistent journal on rotating rust (aka HDD)
was unbearably slow due to fragmentation (sometime around the openSUSE 12.x
timeline).
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
  public relations, for nature cannot be fooled." -- Richard Feynman