On Thu, Jul 22, 2010 at 01:38:50PM +0200, Carsten Otto wrote:
On Thu, Jul 22, 2010 at 01:23:01PM +0200, Keld Simonsen wrote:
Personally I would like to explore the use of BitTorrent seeds hosted by mirrors - maybe that would scale well and spread the bandwidth load fairly evenly across the mirror infrastructure. Maybe gwdg.de can tell us more, and whether there could be some generalized way to do this in a distributed fashion. There is also the question of performance - I have had bad experiences with large numbers of HTTP connections, and therefore I redirected all .iso downloads to FTP.
Seeding via BitTorrent is something we at ftp.halifax.rwth-aachen.de have been doing for a long time now. With every big release that also advertised BitTorrent - OpenSUSE and Ubuntu come to mind - we were able to hit the limits of our line (1 GBit/sec in the past, 1.7 GBit/sec more recently, and 2 GBit/sec with OpenSUSE 11.3). For OpenSUSE 11.3 we also added a few extra machines, each seeding a single ISO file. This resulted in peaks of about 3 GBit/sec via BitTorrent alone.
With enough RAM, which should not be a problem for a proper mirror, the CD/DVD files stay in the kernel's page cache and the random accesses caused by BitTorrent do no harm at all. By serving the very same file over HTTP, FTP and BitTorrent you get BitTorrent "for free" (disregarding CPU usage), since all three protocols share a single cached copy.
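As a sketch of what that layout can look like (the paths are examples, not our exact setup):

    # One physical copy under /srv/mirror (example path); HTTP, FTP and
    # BitTorrent all read the same file, hence the same page-cache pages:
    ISO=/srv/mirror/distribution/11.3/iso/openSUSE-11.3-DVD-x86_64.iso
    #   apache  : DocumentRoot /srv/mirror -> serves $ISO over HTTP
    #   vsftpd  : anon_root=/srv/mirror    -> serves $ISO over FTP
    #   rtorrent: seeds $ISO in place, no separate copy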
However, if the mirror can only fit one or two DVDs into RAM, handling normal HTTP traffic for other files at the same time can really hurt performance: the disk subsystem gets thrashed with random reads. In fact, we needed to pull the image into the page cache (using "cat") before starting rtorrent, because otherwise it started seeding the uncached file right away and became unresponsive as a result.
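A minimal sketch of that warm-up step (file names are examples):

    # Read the ISO once so it lands in the page cache; the output is
    # thrown away, we only want the read as a side effect.
    cat /srv/mirror/iso/openSUSE-11.3-DVD-x86_64.iso > /dev/null

    # Only then hand the corresponding torrent to rtorrent.
    rtorrent /srv/mirror/torrents/openSUSE-11.3-DVD-x86_64.iso.torrent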
We have had very good experiences with rtorrent. You may need to spawn several instances for different files if you hit CPU limits, since rtorrent is not multi-threaded yet.
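Roughly like this, one instance per ISO (flags as of rtorrent 0.8.x - double-check them against your version; the paths are examples):

    # Each rtorrent gets its own session directory and screen session,
    # so no single-threaded client becomes the bottleneck.
    for iso in openSUSE-11.3-DVD-i586 openSUSE-11.3-DVD-x86_64; do
        mkdir -p /var/lib/rtorrent/$iso
        screen -dmS "seed-$iso" rtorrent -n \
            -s /var/lib/rtorrent/$iso \
            -o directory=/srv/mirror/iso \
            /srv/mirror/torrents/$iso.iso.torrent
    done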
Sounds interesting! So this depends on having all ISOs in RAM, to keep the random disk accesses from cluttering up the disks... Would it be possible to e.g. use bigger block sizes for BT, so that the random access would not hurt performance so much? I was thinking of block sizes of maybe 1 MB, without having investigated what BT currently does, or whether it is easy to change block sizes with BT.
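As far as I understand it (worth verifying), the piece size is fixed when the .torrent file is created, so this would have to be set on the torrent-generation side - e.g. with mktorrent, which takes the piece length as a power of two:

    # Create a torrent with 2^20 = 1 MiB pieces (mktorrent's default
    # is 2^18 = 256 KiB); tracker URL and file name are examples.
    mktorrent -l 20 -a http://tracker.example.org/announce \
        openSUSE-11.3-DVD-x86_64.iso

Note that clients still request 16 KiB blocks within a piece, so bigger pieces mainly change the hashing granularity and the size of the .torrent file, not the wire-level request size.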
My server only has 1.5 TB of RAM, and I think what you suggest would need much more. Anyway, RAM is cheap these days. What would be the recommended amount of RAM? And would it scale? I was thinking about making most of the mirror data available via BT, in an automated and distributed way, and then keeping all data in RAM seems futile - I have about 5 TB of data on my mirror server. I would think bigger block sizes could be a better way to scale.
Anyway - again without consulting the net - I think a wiki article on mirroring software and practices could be useful, e.g. on Wikipedia, which could be a neutral place to gather the technology and experience of distribution and mirror maintainers on this subject.
best regards
Keld