Hi,

The rsync problem could be solved by taking a different approach.
Zypper fetches each package from the closest mirror, or from wherever it is available.

What if the mirrors did a similar thing?

Let's say there's a small mirror app that queries the official openSUSE servers and gets the latest metadata.
With this metadata, the mirror app could fetch the RPM packages from another mirror, like Zypper does.
This could even be P2P, or just a dumb app that queries an API to get the latest metadata and queries MirrorCache to find where each package is available.
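
A minimal sketch of that idea, assuming the metadata can be reduced to a
mapping from package path to checksum. The names plan_sync and pick_source,
the example URLs and the data layout are all hypothetical, not an existing
openSUSE or MirrorCache API:

```python
def plan_sync(upstream, local):
    """Return package paths that are missing locally or whose checksum differs.

    Both arguments map package path -> checksum (assumed layout).
    """
    return sorted(
        path for path, checksum in upstream.items()
        if local.get(path) != checksum
    )


def pick_source(path, candidates, fallback):
    """Pick the first mirror listed for the file, else the fallback origin.

    candidates maps package path -> list of mirror URLs and stands in for
    whatever a redirector lookup would return (hypothetical).
    """
    mirrors = candidates.get(path, [])
    return mirrors[0] if mirrors else fallback


# Made-up example data, for illustration only.
upstream = {
    "x86_64/a-1.0.rpm": "aaa",
    "x86_64/b-2.0.rpm": "bbb",
    "noarch/c-3.0.rpm": "ccc",
}
local = {"x86_64/a-1.0.rpm": "aaa", "x86_64/b-2.0.rpm": "old"}
candidates = {"x86_64/b-2.0.rpm": ["https://mirror.example.org"]}

todo = plan_sync(upstream, local)
sources = {
    path: pick_source(path, candidates, "https://origin.example.org")
    for path in todo
}
```

Only the small metadata comparison would have to hit the official servers;
the package downloads themselves could go to whichever mirror already has
the file.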

I believe SLES RMT behaves similarly.

Best,

Alexandre



From: Bernhard M. Wiedemann
Sent: Wednesday, September 7, 2022 6:52 PM
To: heroes@lists.opensuse.org
Subject: Re: improving download.o.o

I gathered some more data points:

/var/log/apache2/download.opensuse.org/2022/09/download.opensuse.org-20220904-access_log.xz

contained 200 GB worth of HTTP 200 responses, counted with
perl -ne 'm/ 200 (\d+) / and $s+=$1; END{print $s}'


/var/log/nginx/downloadcontent/2022/09/2022-09-04-access.log-20220905.xz
contained 720 GB worth of HTTP 200 and another 100 GB worth of 206
(Partial) responses for that day.
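
For reference, the same per-status byte count can be sketched in Python,
reading the .xz file directly. This assumes a combined-style access log in
which the status code and response size follow the quoted request line; the
function names and the sample lines are made up for illustration:

```python
import lzma
import re
from collections import Counter

# Status code and response size right after the quoted request line
# (assumed combined-style log format).
STATUS_SIZE = re.compile(r'" (\d{3}) (\d+) ')


def bytes_by_status(lines):
    """Sum response bytes per HTTP status code."""
    totals = Counter()
    for line in lines:
        match = STATUS_SIZE.search(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


def bytes_by_status_xz(path):
    """Same as above, reading an xz-compressed log without unpacking it."""
    with lzma.open(path, "rt", errors="replace") as log:
        return bytes_by_status(log)


# Made-up sample lines, not taken from the real logs.
sample = [
    '192.0.2.1 - - [04/Sep/2022:00:00:01 +0000] "GET /a.rpm HTTP/1.1" 200 1000 "-" "zypper"',
    '192.0.2.2 - - [04/Sep/2022:00:00:02 +0000] "GET /a.rpm HTTP/1.1" 206 250 "-" "zypper"',
    '192.0.2.3 - - [04/Sep/2022:00:00:03 +0000] "GET /b.rpm HTTP/1.1" 200 500 "-" "curl"',
]
totals = bytes_by_status(sample)
```

Calling bytes_by_status_xz() on one of the log files above would give the
200 and 206 totals in one pass.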

/proc/net/dev showed an average of 21 TB/d sent over the 6 d uptime,
equivalent to a continuous 2 GBit/s, though of course the traffic is not
steady.
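
The conversion behind that figure, as a small sketch (assuming decimal
units, i.e. 1 TB = 10^12 bytes; the function name is only illustrative):

```python
def tb_per_day_to_gbit_per_s(tb_per_day):
    """Convert decimal terabytes per day into gigabits per second."""
    bits_per_day = tb_per_day * 1e12 * 8
    seconds_per_day = 86_400
    return bits_per_day / seconds_per_day / 1e9


average = tb_per_day_to_gbit_per_s(21)  # roughly 1.94
```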

I used
grep -o "GET [^ ]* HTTP/1.* 200 [^ ]* " \
  2022-09-04-access.log-20220905 |
  sort|uniq -c|sort -n |
  tail -200 > most-requested-direct-downloads-size

to get the list of the most requested (unmirrored) files (attached) and
found that a caching squid frontend could provide nice bandwidth savings
for tumbleweed and update repos, though probably less than the 720 GB/d
served.
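
A rough way to put an upper bound on such savings is to assume an ideal
cache: every repeated 200 response for the same path is served from cache,
so only one copy per path has to come from the backend. The sketch below
does that under those assumptions (unlimited cache, files not changing
during the day, same log format as above); cache_savings and the sample
lines are hypothetical:

```python
import re
from collections import Counter

# Path and size of successful GET requests (assumed combined-style log).
REQUEST = re.compile(r'"GET (\S+) HTTP/\S+" 200 (\d+) ')


def cache_savings(lines):
    """Return (path, hits, bytes_saved) tuples, largest saving first.

    bytes_saved is an upper bound: all bytes served for a path minus one
    full copy, as if every repeat request were a cache hit.
    """
    hits = Counter()
    served = Counter()
    largest = {}
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue
        path, size = match.group(1), int(match.group(2))
        hits[path] += 1
        served[path] += size
        largest[path] = max(size, largest.get(path, 0))
    ranking = [(p, hits[p], served[p] - largest[p]) for p in hits]
    return sorted(ranking, key=lambda row: row[2], reverse=True)


# Made-up sample lines, not taken from the real logs.
sample = [
    '192.0.2.1 - - [t] "GET /a.rpm HTTP/1.1" 200 1000 "-" "zypper"',
    '192.0.2.2 - - [t] "GET /a.rpm HTTP/1.1" 200 1000 "-" "zypper"',
    '192.0.2.3 - - [t] "GET /b.rpm HTTP/1.1" 200 500 "-" "curl"',
    '192.0.2.4 - - [t] "GET /a.rpm HTTP/2.0" 200 1000 "-" "zypper"',
]
ranking = cache_savings(sample)
```

A real cache would save less than this, since it has limited space and
objects expire.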

I still think that the main problem is the rsync traffic.
You can use tcpdump -nr with the /tmp/dump*.pcap files and look for
rsync port 873, which made up 89% of the captured traffic.

Apart from the practical tcpdump approach, there is also the math:
dividing 3000 MBit/s among 70 mirrors leaves only 43 MBit/s for each
of them.
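
Spelled out as a sketch, together with what such a share means for sync
time. The 100 GB volume is a made-up example figure, not a measurement,
and the function names are only illustrative:

```python
def per_mirror_mbit(total_mbit, mirrors):
    """Even share of the uplink if all mirrors sync at the same time."""
    return total_mbit / mirrors


def hours_to_transfer(gigabytes, mbit_per_s):
    """Hours needed to move a volume at a given rate (decimal units)."""
    megabits = gigabytes * 8000
    return megabits / mbit_per_s / 3600


share = per_mirror_mbit(3000, 70)      # roughly 42.9 MBit/s
hours = hours_to_transfer(100, share)  # hypothetical 100 GB of changes
```

With these example numbers, a mirror would need a bit over five hours to
pull 100 GB.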


I think the best approach would be to find or create a few
well-connected rsync mirrors that get /tumbleweed and /update pushed
first, and from which the other mirrors can then sync.
This could increase the latency of updates, because it takes 2 copies to
reach most mirrors, but OTOH the copies should happen much faster,
because more total bandwidth will be available.
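
A back-of-the-envelope model of that trade-off, as a sketch only. It
assumes bandwidth is shared evenly, that the receiving mirrors are never
the bottleneck, and that the second hop starts only after the first has
finished. The number of first-tier mirrors, their uplink and the update
size are invented example values:

```python
def direct_seconds(size_gb, origin_mbit, mirrors):
    """All mirrors pull from the origin at once, sharing its uplink."""
    return size_gb * 8000 / (origin_mbit / mirrors)


def tiered_seconds(size_gb, origin_mbit, mirrors, tier1, tier1_mbit):
    """Origin pushes to tier1 mirrors first, then the rest pull from them."""
    first_hop = size_gb * 8000 / (origin_mbit / tier1)
    others = mirrors - tier1
    second_hop = size_gb * 8000 / (tier1 * tier1_mbit / others)
    return first_hop + second_hop


# Hypothetical values: 50 GB update, 5 first-tier mirrors with 10 GBit/s each.
direct = direct_seconds(50, 3000, 70)
tiered = tiered_seconds(50, 3000, 70, tier1=5, tier1_mbit=10_000)
```

With these invented numbers the two-hop variant finishes in about 20
minutes instead of about 2.6 hours, despite the extra copy.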


Ciao
Bernhard M.