On 04/09/2022 20.14, Bernhard M. Wiedemann wrote:
Hi,
as you might know, pontifex2 aka download.o.o serves 4 major roles atm.
1. download.opensuse.org redirector for zypper repos
2. downloadcontent.opensuse.org fallback mirror
3. stage.opensuse.org rsync source server for major mirrors
4. repopusher to 6 mirrors

During August, my nagios monitoring logged at least 20+10+20+19+22+70+30+10+20+10+10 minutes of high packet loss on download.o.o. That is 241 minutes, or about 4 hours.
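
As a quick check of that sum, a minimal Python sketch (the variable names are only for illustration):

```python
# Packet-loss durations in minutes, as listed above for August
loss_minutes = [20, 10, 20, 19, 22, 70, 30, 10, 20, 10, 10]

total = sum(loss_minutes)           # 241
hours, minutes = divmod(total, 60)  # (4, 1)
print(f"{total} minutes = {hours}h {minutes:02d}m")
```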

The last download.o.o packet losses reported by my nagios were on 2022-09-13 and 2022-09-20, and each time lasted only 10 minutes.

I had created https://progress.opensuse.org/projects/opensuse-admin-wiki/wiki/Downloadcont... to relieve the host of task #2.

Initially I thought those few hundred GB/d would not make much of an impact. Yet it turned out that the distribution of the traffic had occasional really tall spikes of up to 200 MByte/s (see attachment). Usually those spikes occurred after a new Leap 15.3 update-sle version came out, when 900 users fetched that 120 MB primary.xml.gz file before it reached the mirrors.

https://github.com/openSUSE/open-build-service/issues/13094 could help there, but it probably won't be implemented for possibly invalid reasons. Some more testing/analysis around such partial zypper repodata transfers could help.

Another way to improve these peaks would be to use some delayed publishing similar to tumbleweed, so the mirrors take that load. See pontifex2:/home/mirror/bin/publish_factory

Or we just leave that as is, since downloadcontent2.o.o seems to be able to handle it fine within its traffic limits.

Meanwhile the rsync traffic still sometimes reaches 3 Gbit/s of our shared 4 Gbit/s uplink. I noticed that the excludes in repopush seem not to work. There are some largish repos that are updated often, but not requested much from mirrors. So we end up spending more traffic on rsyncing them out than would be needed to serve these files directly.
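
To get a feeling for the primary.xml.gz spikes and the rsync share of the uplink mentioned above, here is a rough back-of-the-envelope sketch in Python. The figures are the ones from this mail; the simplifications are mine: all 900 fetches are assumed to be complete downloads served from this host at a constant 200 MByte/s, and units are decimal (1 Gbit/s = 125 MByte/s).

```python
# Figures from the mail; the constant-rate model is an assumption.
USERS = 900            # clients fetching the file before mirrors have it
FILE_MB = 120          # size of primary.xml.gz in MB
PEAK_MB_PER_S = 200    # observed spike height in MByte/s

total_mb = USERS * FILE_MB                   # 108000 MB, i.e. 108 GB
seconds_at_peak = total_mb / PEAK_MB_PER_S   # 540.0 s, i.e. 9 minutes

# rsync traffic relative to the shared uplink
RSYNC_GBIT = 3
UPLINK_GBIT = 4
rsync_mb_per_s = RSYNC_GBIT * 1000 / 8       # 375.0 MByte/s
uplink_share = RSYNC_GBIT / UPLINK_GBIT      # 0.75

print(f"one metadata release: {total_mb / 1000:.0f} GB, "
      f"{seconds_at_peak / 60:.0f} min at peak rate")
print(f"rsync peak: {rsync_mb_per_s:.0f} MByte/s, "
      f"{uplink_share:.0%} of the uplink")
```

So under these assumptions a single metadata release accounts for roughly 108 GB, which would fit the picture of a modest daily total with short tall spikes.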

I took a random sample of repos that took time to rsync (sizes in MB):

du -sm
 49460  /srv/ftp/pub/opensuse/repositories/devel:/ARM:/Factory:/Contrib:/HEAD/images/
 14835  /srv/ftp/pub/opensuse/repositories/systemsmanagement:/Uyuni:/Master/images
 59546  /srv/ftp/pub/opensuse/repositories/devel:/gcc:/next/
118473  /srv/ftp/pub/opensuse/repositories/home:/Guillaume_G:/isp1760/images/
  7506  /srv/ftp/pub/opensuse/repositories/home:/johnny_law:/sle/images/

/var/log/apache2/download.opensuse.org/access_log shows only downloads for repo metadata in the 6h since log rotation.

So should we push less of /repositories/ out to fewer places?

The other improvement I have in mind is to find some well-connected host with 300-600 TB/month of traffic and a 3 TB SSD that gets /tumbleweed, /distribution and /updates as soon as possible, and to offer that as an rsync source to mirror operators. This should take load off task #3 and thus leave more disk I/O and network bandwidth to the other tasks.

Ciao
Bernhard M.
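
PS: a rough Python sketch of the numbers above, in case it helps the discussion. The sizes are the du -sm values from the sample (treated loosely as MB). The assumptions are mine: the push cost is an upper bound that pretends each tree is sent in full to all 6 repopush mirrors (rsync normally transfers only changes), and the monthly traffic is converted with a 30-day month and decimal units. The short dictionary keys are only labels for illustration.

```python
# du -sm sizes in MB for the sampled repos (labels shortened for readability)
sample_mb = {
    "devel:ARM:Factory:Contrib:HEAD/images": 49460,
    "systemsmanagement:Uyuni:Master/images": 14835,
    "devel:gcc:next": 59546,
    "home:Guillaume_G:isp1760/images": 118473,
    "home:johnny_law:sle/images": 7506,
}

total_mb = sum(sample_mb.values())   # 249820 MB, roughly 250 GB

# Assumption: upper bound, every tree pushed in full to every mirror
MIRRORS = 6
push_mb = total_mb * MIRRORS         # 1498920 MB, roughly 1.5 TB


def avg_gbit_per_s(tb_per_month, days=30):
    """Average rate in Gbit/s for a monthly volume given in decimal TB."""
    bits = tb_per_month * 1e12 * 8
    return bits / (days * 86400) / 1e9


for tb in (300, 600):
    print(f"{tb} TB/month is about {avg_gbit_per_s(tb):.2f} Gbit/s on average")
print(f"sample total: {total_mb} MB, full push to {MIRRORS} mirrors: {push_mb} MB")
```

Averaged over a month, 300-600 TB works out to a bit under 1 Gbit/s up to a bit under 2 Gbit/s sustained, so such a host would need headroom above that for peaks.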