About a month ago I sent out a proposal titled "Tumbleweed snapshot repositories" [1]. Since then I have spent some time building a basic prototype, and it could use some interested folks to play around with it.
For those not interested in reading my updates on the implementation and considerations, the following are the relevant links.
- https://github.com/boombatower/tumbleweed-cli/wiki/Installation-and-usage
- https://github.com/boombatower/tumbleweed-cli
- https://github.com/boombatower/tumbleweed-snapshot
As of the time of this writing there are only two snapshots available.
- 20161201
- 20161129
The snapshot script checks for newly published snapshots on its own and should create the corresponding snapshot without manual intervention.
Some open items from the proposal that have been decided in the prototype are:
- directory structure: placement of snapshot identifier in URL
- location for code: modify OBS / kiwi, etc.
- mirror load: concerns about extra size and load on mirrors
The snapshot identifier is placed at the root level of the URL for a number of reasons.
1) does not require several URL patterns
2) works for all repo types
3) easily provides an entirely separate file tree for a separate mirror network
Given the different structures of the main openSUSE Tumbleweed product repo vs. a home/devel repo, this avoids unnecessary complications such as:
/repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed/20161129/
/repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed_snapshot/20161129/
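To make the root-level placement concrete, here is a minimal sketch of how a snapshot URL could be assembled. The function name snapshot_url and the host snapshot.example.com are hypothetical and only serve as illustration; the point is that the same pattern works for the main product repo and for a home repo.

```shell
#!/bin/sh
# Sketch: place the snapshot identifier at the root of the URL path.
# BASE_URL is a hypothetical host, not the prototype's real address.
BASE_URL="http://snapshot.example.com"

# snapshot_url SNAPSHOT REPO_PATH
# Prints BASE_URL/SNAPSHOT/REPO_PATH, regardless of the repo layout.
snapshot_url() {
  local snapshot="$1"
  local repo_path="${2#/}"   # drop a leading slash, if any
  printf '%s/%s/%s\n' "$BASE_URL" "$snapshot" "$repo_path"
}

# Same pattern for the product repo and for a home repo.
snapshot_url 20161129 tumbleweed/repo/oss/
snapshot_url 20161129 /repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed/
```

Because the identifier always comes first, everything after it is simply the path as it exists on the regular download server.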
This structure also pairs well with the code strategy, since it ended up making sense to keep the code out of the existing publishing workflows. As such, the code does not have the context to understand the directory structure, and with this method it does not need to. Since the snapshots are essentially backups of the published repositories, it really does not make sense to mix them into existing workflows. Additionally, in order to save space and implement the symlink strategy, the script needs to compare the current snapshot against a previous snapshot, which really does not fit into the publishing workflow.
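As a rough illustration of comparing a snapshot against its predecessor to save space, the sketch below replaces unchanged files with symlinks. This is an assumption about how such a step could look, not the prototype's actual code: the function name dedupe_snapshot is hypothetical, files are matched by relative path and byte-for-byte content, and absolute symlinks are used for brevity.

```shell
#!/bin/sh
# dedupe_snapshot PREVIOUS_DIR CURRENT_DIR
# Replaces every regular file in CURRENT_DIR that is byte-identical to the
# file at the same relative path in PREVIOUS_DIR with a symlink to it.
# Simplifications: absolute link targets, no support for newlines in names.
dedupe_snapshot() {
  local previous current rel
  previous="$(cd "$1" && pwd)" || return 1
  current="$(cd "$2" && pwd)" || return 1
  (cd "$current" && find . -type f) | while IFS= read -r rel; do
    rel="${rel#./}"
    if [ -f "$previous/$rel" ] && cmp -s "$previous/$rel" "$current/$rel"; then
      # Unchanged since the previous snapshot: keep only one copy on disk.
      ln -sf "$previous/$rel" "$current/$rel"
    fi
  done
}

# Example (paths are placeholders):
#   dedupe_snapshot /srv/snapshot/20161129 /srv/snapshot/20161201
```

Files that changed or were added stay as regular files, so each snapshot directory still presents a complete tree.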
As mentioned, there are concerns about mirror load in terms of the extra space required to store the snapshots. Given that a separate and complete directory tree is available, it can be sync'd using a completely separate rsync module or similar. A stub is provided for the current snapshot which redirects to the standard mirror infrastructure to avoid serving the files directly. Once the snapshot is no longer current, it automatically falls back to serving the snapshotted files. This approach achieves the design goals while utilizing the existing infrastructure.
- 20161201 is released and redirects to standard mirrors
- 20161202 is released and redirects to standard mirrors
- 20161201 serves snapshotted content
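The redirect-or-serve behaviour can be sketched as a small decision function. The name resolve_snapshot and both host names are hypothetical, and the real prototype may implement this in the web server configuration rather than in a script.

```shell
#!/bin/sh
# Hypothetical host names, for illustration only.
MIRROR_BASE="http://download.example.org"
SNAPSHOT_BASE="http://snapshot.example.org"

# resolve_snapshot REQUESTED CURRENT REPO_PATH
# Prints the action a snapshot server could take for a request:
# redirect to the standard mirrors while REQUESTED is the current
# snapshot, otherwise serve the stored copy.
resolve_snapshot() {
  local requested="$1"
  local current="$2"
  local repo_path="${3#/}"
  if [ "$requested" = "$current" ]; then
    printf 'redirect %s/%s\n' "$MIRROR_BASE" "$repo_path"
  else
    printf 'serve %s/%s/%s\n' "$SNAPSHOT_BASE" "$requested" "$repo_path"
  fi
}

# While 20161201 is current it redirects; once 20161202 is out it is served.
resolve_snapshot 20161201 20161201 tumbleweed/repo/oss/
resolve_snapshot 20161201 20161202 tumbleweed/repo/oss/
```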
The end user URLs can thus always include the snapshot identifier without suffering from reduced performance (aside from the redirect).
The script watches the openSUSE:Factory/snapshot repo for updates to the _product:openSUSE-release package [2]. Once an update is detected, the sync'd repo tree is snapshotted (against the previous snapshot) and the new snapshot is redirected to the standard mirrors. By using this as the trigger, the snapshot is taken before the rsync tree is updated, so it contains the newest packages that were published before the new snapshot was published. This strategy also works for devel/home repos, which may begin rebuilding as a result of the new snapshot or updates from maintainers. The rsync tree is continually sync'd in between snapshots to ensure the latest versions are available when the snapshot is triggered. Obviously, this process could run on the same infrastructure as OBS to avoid the need to sync the tree. Having the trigger work completely outside of OBS itself means that third-party repositories can also be snapshotted without those parties having to set anything up.
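The trigger check itself can be reduced to "has the observed version changed since last time". The sketch below assumes the version of the release package has already been fetched by some other means and is passed in as an argument; the function name check_trigger and the state file are hypothetical.

```shell
#!/bin/sh
# check_trigger STATE_FILE LATEST_VERSION
# Returns 0 (and records LATEST_VERSION) when it differs from the last
# version seen, meaning a new snapshot should be taken; returns 1 otherwise.
# A missing state file counts as "changed", so the first run triggers.
check_trigger() {
  local state_file="$1"
  local latest="$2"
  local last=""
  if [ -f "$state_file" ]; then
    last="$(cat "$state_file")"
  fi
  if [ "$latest" != "$last" ]; then
    printf '%s\n' "$latest" > "$state_file"
    return 0
  fi
  return 1
}

# Example (placeholder path and version):
#   if check_trigger /var/lib/snapshot/last 20161202; then
#     echo "new snapshot detected"
#   fi
```

Run from a periodic job, a check like this keeps the trigger entirely outside OBS, in line with the approach described above.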
The Tumbleweed update repository is not snapshotted since it is intended to override snapshot packages. Obviously, in rare instances this could cause issues just as it can in normal operation. The update repo could be snapshotted and simply not utilized by default but available in the case users had issues and needed to utilize it.
The secondary goal of "locking" the repository or indicating when a repo is in flux is not currently achieved for the latest repo, but obviously works naturally for the snapshot repos. Based on some of the ideas in the proposal and additional thinking I have done on the subject it seems achievable, but focusing on the basic snapshot functionality first seems prudent.
As a side note, the tumbleweed/iso/ directory is currently snapshotted, as it can be handy to have older installation media for the times when they break. Given the limited resources of the server hosting the prototype, I may disable it since it adds a couple of tens of GB to each snapshot.
For those interested in experimenting inside a controlled environment, I'd recommend a docker container. The official opensuse:tumbleweed image is infrequently updated and as such will typically require a sizable update to get into the neighborhood of recent snapshots. Alternatively, my boombatower/opensuse:tumbleweed image is rebuilt once weekly and as such should provide a recent starting point.
docker run -it boombatower/opensuse:tumbleweed /bin/bash
Add --rm if you would like it to automatically disappear after exiting.
I will be updating one of my machines using this prototype and should notice issues, but feel free to file issues on the GitHub repos or respond to this mail.
[1] https://lists.opensuse.org/opensuse-factory/2016-10/msg00591.html
[2] https://build.opensuse.org/package/binaries/openSUSE:Factory/_product:openSUSE-release?repository=snapshot