Bernhard M. Wiedemann wrote:
Hi,
as you might know, pontifex2 aka download.o.o serves 4 major roles atm.
1. download.opensuse.org redirector for zypper repos
2. downloadcontent.opensuse.org fallback mirror
3. stage.opensuse.org rsync source server for major mirrors
4. repopusher to 6 mirrors
Six only because another five are waiting to be fixed by the "mirror admin" ...
During August, my nagios monitoring has logged at least 20+10+20+19+22+70+30+10+20+10+10 minutes of high packet loss on download.o.o. That is 241 minutes, or about 4 hours.
When I captured a tcpdump during such a troublesome moment, I found that 89% of it was rsync traffic and the total traffic sent was between 2.5 and 3 GBit/s on average, with 60 different mirror IPs. We only have a 4GBit/s link there shared for all of the Nuremberg SUSE Maxtorhof stuff.
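The figures quoted above can be checked with a few lines of Python. This is only a rough sketch: it assumes the 89% rsync share applies to traffic volume across the whole observed 2.5 to 3 GBit/s range, and the names `total_loss` and `rsync_load` are made up for illustration.

```python
# Minutes of high packet loss logged during August (figures from the message above).
loss_minutes = [20, 10, 20, 19, 22, 70, 30, 10, 20, 10, 10]

# Assumed inputs taken from the tcpdump observation: rsync share of the captured
# traffic, observed total throughput range, and shared link capacity (Gbit/s).
RSYNC_SHARE = 0.89
OBSERVED_GBITS = (2.5, 3.0)
LINK_GBITS = 4.0


def total_loss(minutes):
    """Return (total minutes, total hours) of logged packet loss."""
    total = sum(minutes)
    return total, total / 60


def rsync_load(share, observed, capacity):
    """Return the rsync throughput range (Gbit/s) and its fraction of the link."""
    low, high = (share * rate for rate in observed)
    return (low, high), (low / capacity, high / capacity)


if __name__ == "__main__":
    total, hours = total_loss(loss_minutes)
    print(f"packet loss: {total} min (~{hours:.1f} h)")

    (low, high), (frac_low, frac_high) = rsync_load(
        RSYNC_SHARE, OBSERVED_GBITS, LINK_GBITS
    )
    print(f"rsync: {low:.2f}-{high:.2f} Gbit/s, "
          f"{frac_low:.0%}-{frac_high:.0%} of the shared link")
```

Under those assumptions, rsync alone would take up more than half of the 4 GBit/s link, before counting any of the other traffic sharing it.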
That indicates that roles 3 and 4 above saturated the link, which contributed to the high packet loss and makes for a very poor experience for download.o.o users.
Yeah, that is likely true.
To solve this, we should try to separate these roles so that they conflict less with each other. The first thing we should try is to move out and split stage.o.o.
No. The first thing we should do is identify and discuss the issue. Second, I disagree with your up-front conclusion. IMHO, the issue is too much data and too little bandwidth. So I suggest we first look at:
a) why is there too much data?
b) why is there too little bandwidth?
Ideally, the data should be transferred only once to each continent via repopusher and then other mirrors pull from these primary mirrors.
Ideally, we would have sufficient bandwidth. If that is not forthcoming, we reduce traffic to accommodate our users as best we can. I personally think the repository push is largely useless unless we can get it out to the mirror sites in time.
As this is a major change to our download infra, I don't want to decide on that alone, so I'd appreciate your feedback if and how we should change things.
Again, we should identify the issue first of all. As we know, it is insufficient bandwidth. If we cannot get more bandwidth, we have to prioritise what we send where and when. I say skip the repositories push - it is persistently way behind, and despite my recent efforts to optimise it, it's not getting any better.
--
Per Jessen, Zürich (19.1°C)
Member, openSUSE Heroes