
Hi,

The rsync problem could be solved if something else is implemented. Zypper fetches packages from the closest place or from wherever they are available. What if the mirrors did a similar thing?

Let's say there's a small mirror app that queries the official openSUSE servers and gets the latest metadata. With this metadata, the mirror app could fetch the RPM packages from another mirror, like Zypper does. This could even be P2P, or just a dumb app that queries an API to get the latest metadata and queries MirrorCache to find out where each package is available.

I believe SLES RMT has a similar behavior.

Best,
Alexandre

________________________________
From: Bernhard M. Wiedemann
Sent: Wednesday, September 7, 2022 6:52 PM
To: heroes@lists.opensuse.org
Subject: Re: improving download.o.o

I gathered some more data points:

/var/log/apache2/download.opensuse.org/2022/09/download.opensuse.org-20220904-access_log.xz
contained 200 GB worth of HTTP 200 responses - counted with

    perl -ne 'm/ 200 (\d+) / and $s+=$1; END{print $s}'

/var/log/nginx/downloadcontent/2022/09/2022-09-04-access.log-20220905.xz
contained 720 GB worth of HTTP 200 and another 100 GB worth of 206 (Partial) responses for that day.

/proc/net/dev showed an average of 21 TB/d sent over the 6d uptime - the equivalent of a continuous 2 GBit/s - but of course it is not steady.

I used

    grep -o "GET [^ ]* HTTP/1.* 200 [^ ]* " \
        2022-09-04-access.log-20220905 | sort|uniq -c|sort -n | tail -200 > most-requested-direct-downloads-size

to get the list of most requested (unmirrored) files (attached) and found that a caching squid frontend could provide nice bandwidth savings for tumbleweed and update repos - but probably less than those 720 GB/d served.

I still think that the main problem is the rsync traffic. You can use tcpdump -nr with the /tmp/dump*.pcap files and look for rsync port 873, which made up 89% of the traffic.
Apart from the practical tcpdump approach, there is also the math: dividing 3000 MBit/s by 70 mirrors leaves only 43 MBit/s for each of them.

I think the best approach would be to find or create a few well-connected rsync mirrors that get /tumbleweed and /update pushed to them first, so that the other mirrors can sync from there. This could increase the latency of updates, because it takes 2 copies to reach most mirrors, but OTOH the copies should happen much faster, because more total bandwidth will be available.

Ciao
Bernhard M.
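
The bandwidth figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes), and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the bandwidth figures from this message.
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes, 1 GBit = 10**9 bits.
SECONDS_PER_DAY = 86_400


def tb_per_day_to_gbit_s(tb_per_day: float) -> float:
    """Convert terabytes per day into an average rate in gigabits per second."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def per_mirror_share(total_mbit_s: float, mirrors: int) -> float:
    """Even split of the total uplink between mirrors that sync at the same time."""
    return total_mbit_s / mirrors


print(round(tb_per_day_to_gbit_s(21), 2))   # 1.94, i.e. roughly 2 GBit/s
print(round(per_mirror_share(3000, 70)))    # 43 MBit/s per mirror
```

The even split is the worst case where all 70 mirrors pull at once; with fewer first-tier mirrors pulling directly, each of them would get a proportionally larger share.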

On 07/09/2022 19.22, Alexandre Vicenzi wrote:
> The rsync problem could be solved if something else is implemented. Zypper fetches packages from the closest place or where it is available.
> What if the mirrors did a similar thing?
For some mirrors that might be possible, but in general these mirror servers run various OSes, are operated by various people with little time, and their operators might not even want custom software on them.

One example of a P2P mirror is my IPFS-based one at /ipns/opensuse.zq1.de/, but I found that the DHT does not work well at that scale with go-ipfs.
https://lizards.opensuse.org/2019/04/03/experimental-opensuse-mirror-via-ipf...

Is there something better suited for this use-case?

Ciao
Bernhard M.
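
For reference, the "mirror app" idea discussed in this thread could look roughly like the sketch below. It only covers the planning step (which files to fetch, and from where) and does no network I/O. Everything in it is an assumption made for illustration: the `plan_sync` and `locate` names, the shape of the metadata (path mapped to checksum), and the example.org mirror URLs. It does not reflect the actual MirrorCache or RMT interfaces.

```python
from typing import Callable, Dict, List, Tuple


def plan_sync(
    upstream: Dict[str, str],
    local: Dict[str, str],
    locate: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """Return (path, source_url) pairs for files that are missing or stale locally.

    upstream: path -> checksum, as published in the latest metadata (hypothetical shape)
    local:    path -> checksum of what this mirror already has
    locate:   hypothetical lookup returning mirror URLs for a path, preferred first
    """
    plan = []
    for path, checksum in sorted(upstream.items()):
        if local.get(path) == checksum:
            continue  # already up to date, nothing to fetch
        sources = locate(path)
        if sources:
            plan.append((path, sources[0]))  # take the preferred source
    return plan


# Made-up example data, not real repository contents.
upstream = {"repo/foo-1.0.rpm": "aaa", "repo/bar-2.0.rpm": "bbb"}
local = {"repo/foo-1.0.rpm": "aaa"}


def locate(path: str) -> List[str]:
    # Stand-in for a query to a mirror redirector; the hosts are placeholders.
    return [
        f"https://mirror1.example.org/{path}",
        f"https://mirror2.example.org/{path}",
    ]


for path, url in plan_sync(upstream, local, locate):
    print(path, "<-", url)
```

Keeping the planning step separate from the download step means the same logic would work whether the files are then fetched over HTTP, rsync, or something P2P.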