[opensuse-factory] Tumbleweed snapshot repositories prototype

3 Dec 2016

About a month ago I sent out a proposal titled "Tumbleweed snapshot
repositories" [1]. Since then I have spent some time building a basic
prototype, and it could use some interested folks to play around with it.

For those not interested in reading my updates on the implementation and
considerations, the following are the relevant links.

- https://github.com/boombatower/tumbleweed-cli/wiki/Installation-and-usage
- https://github.com/boombatower/tumbleweed-cli
- https://github.com/boombatower/tumbleweed-snapshot

As of the time of this writing there are only two snapshots available.

- 20161201
- 20161129

The snapshot script checks for the publishing of a new snapshot and should
create new snapshots automatically.

Some open items from the proposal which have been decided in the prototype are:

- directory structure: placement of snapshot identifier in URL
- location for code: modify OBS / kiwi, etc.
- mirror load: concerns about extra size and load on mirrors

The snapshot identifier is placed at the root level of the URL for a number of
reasons.

1) does not require several URL patterns
2) works for all repo types
3) easily provides an entirely separate file tree for separate mirror network

For example:

  /tumbleweed/repo/oss/

Becomes:

  /20161129/tumbleweed/repo/oss/

Given the different structures of the main openSUSE Tumbleweed product repo vs
a home/devel repo, this avoids unnecessary complications.

  /repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed/

Becomes:

  /20161129/repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed/

Instead of:

  /repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed/20161129/
  /repositories/home:/boombatower:/snapshot/openSUSE_Tumbleweed_snapshot/20161129/
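
The mapping above can be sketched in a few lines. This is only an
illustration of the URL scheme, not code from the prototype; the function
name snapshot_path and the 8-digit check on the identifier are my own
assumptions.

```python
import re


def snapshot_path(path: str, snapshot: str) -> str:
    """Place the snapshot identifier at the root level of a repo path.

    Hypothetical helper: assumes identifiers look like YYYYMMDD.
    """
    if not re.fullmatch(r"\d{8}", snapshot):
        raise ValueError("snapshot identifier should be 8 digits: %r" % snapshot)
    # The same rule works for every repo type since only the root changes.
    return "/" + snapshot + "/" + path.lstrip("/")


print(snapshot_path("/tumbleweed/repo/oss/", "20161129"))
```

Because the identifier is only ever prepended, the function needs no
knowledge of whether the path belongs to the main product repo or to a
home/devel repo.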

This structure also pairs well with the code strategy, since it ended up making
sense to keep the code out of the existing publishing workflows. As such, the
code does not have the context to understand the directory structure, and with
this method it does not need to. Since the snapshots are essentially backups of
the published repositories, it really does not make sense to mix them into the
existing workflows. Additionally, in order to save space and implement the
symlink strategy, the script needs to compare the current snapshot against a
previous snapshot, which really does not fit in the publishing workflow.
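
To make the symlink strategy concrete, here is a minimal sketch of one way
to compare a snapshot against a previous one. It is not the prototype's
actual script; the function name link_unchanged and the choice of a
byte-for-byte comparison are assumptions for illustration.

```python
import filecmp
import os


def link_unchanged(current: str, previous: str) -> int:
    """Replace files in `current` that are identical in `previous` with
    relative symlinks. Returns the number of files replaced.

    Hypothetical helper, not taken from the tumbleweed-snapshot code.
    """
    replaced = 0
    for root, _dirs, files in os.walk(current):
        for name in files:
            cur_file = os.path.join(root, name)
            rel = os.path.relpath(cur_file, current)
            prev_file = os.path.join(previous, rel)
            # Skip files that are already links or have no previous copy.
            if os.path.islink(cur_file) or not os.path.isfile(prev_file):
                continue
            # Byte-for-byte comparison (an assumption; checksums from repo
            # metadata would be another option).
            if not filecmp.cmp(cur_file, prev_file, shallow=False):
                continue
            # Link to the real file so symlink chains do not grow over time.
            target = os.path.realpath(prev_file)
            link_dir = os.path.realpath(os.path.dirname(cur_file))
            os.remove(cur_file)
            os.symlink(os.path.relpath(target, link_dir), cur_file)
            replaced += 1
    return replaced
```

Relative links keep the tree usable when it is sync'd to another host under
a different base directory.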

As mentioned, there are concerns about mirror load in terms of the extra space
required to store the snapshots. Given that a separate and complete directory
tree is available, it can be sync'd using a completely separate rsync module or
similar. A stub is provided for the current snapshot which redirects to the
standard mirror infrastructure to avoid serving the files directly. Once the
snapshot is no longer current, it automatically falls back to serving the
snapshotted files. This approach achieves the design goals while utilizing the
existing infrastructure.

For example:

- 20161201 is released and redirects to standard mirrors

- 20161202 is released and redirects to standard mirrors
- 20161201 serves snapshotted content

The end user URLs can thus always include the snapshot identifier without
suffering from reduced performance (aside from the redirect).
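
The redirect-or-serve decision can be sketched as follows. This is a
simplified model rather than the prototype's implementation (which may well
be done in web server configuration); the function name resolve and the
mirror host mirror.example.org are placeholders I made up.

```python
from typing import Tuple


def resolve(request_path: str, current_snapshot: str,
            mirror_base: str) -> Tuple[str, str]:
    """Decide how a request for a snapshot URL is answered.

    Hypothetical helper: returns ("redirect", url) for the current snapshot
    and ("serve", path) for older ones.
    """
    snapshot, _, rest = request_path.lstrip("/").partition("/")
    if snapshot == current_snapshot:
        # Current snapshot: hand off to the standard mirrors.
        return ("redirect", mirror_base.rstrip("/") + "/" + rest)
    # Older snapshot: serve the snapshotted files directly.
    return ("serve", "/" + snapshot + "/" + rest)


path = "/20161201/tumbleweed/repo/oss/"
print(resolve(path, "20161201", "http://mirror.example.org"))
print(resolve(path, "20161202", "http://mirror.example.org"))
```

The second call models the state after 20161202 is released: the same user
URL keeps working, it just stops being redirected.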

The script looks at the openSUSE:Factory/snapshot repo for updates to the
_product:openSUSE-release package [2]. Once an update is detected, the sync'd
repo tree is snapshotted (against the previous snapshot) and the new snapshot
is redirected to the standard mirrors. By using this as the trigger, the
snapshot is triggered before the rsync tree is updated, so the snapshot
contains the newest published packages before the new snapshot was published.
This strategy also works for devel/home repos, which may begin rebuilding as a
result of the new snapshot or updates from maintainers. The rsync tree is
continually sync'd in between snapshots to ensure the latest versions are
available when the snapshot is triggered. Obviously, this process could run on
the same infrastructure as OBS to avoid the need to sync the tree. Having the
trigger work completely outside of OBS itself means that third-party
repositories can also be snapshotted without those parties having to set
things up.
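
The trigger check could look roughly like the sketch below. It only covers
the detection step and is not the prototype's code. It assumes the binaries
list contains a file named like openSUSE-release-<YYYYMMDD>-...; both that
naming pattern and the function names are assumptions on my part.

```python
import re
from typing import Iterable, Optional

# Assumed file name pattern for the release package binary.
RELEASE_RE = re.compile(r"^openSUSE-release-(\d{8})-")


def published_snapshot(binaries: Iterable[str]) -> Optional[str]:
    """Return the snapshot identifier found in a list of binary names."""
    for name in binaries:
        match = RELEASE_RE.match(name)
        if match:
            return match.group(1)
    return None


def should_snapshot(binaries: Iterable[str], last_seen: Optional[str]) -> bool:
    """True when the release package differs from the last one seen."""
    version = published_snapshot(binaries)
    return version is not None and version != last_seen
```

Since the check only needs a list of file names, it can run entirely
outside of OBS, which is what allows third-party repositories to be covered
without any setup on their side.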

The Tumbleweed update repository is not snapshotted since it is intended to
override snapshot packages. Obviously, in rare instances this could cause
issues, just as it can in normal operation. The update repo could be
snapshotted and simply not utilized by default, but remain available in case
users have issues and need to utilize it.

The secondary goal of "locking" the repository or indicating when a repo is in
flux is not currently achieved for the latest repo, but obviously works
naturally for the snapshot repos. Based on some of the ideas in the proposal
and additional thinking I have done on the subject, it seems achievable, but
focusing on the basic snapshot functionality first seems prudent.

As a side note, the tumbleweed/iso/ directory is currently snapshotted, as it
can be handy to have older installation media for the times when they break.
Given the limited resources of the server hosting the prototype, I may disable
it since it adds a couple tens of GB to each snapshot.

For those interested in experimenting inside a controlled environment, I'd
recommend a docker container. The official opensuse:tumbleweed image is
infrequently updated and as such will typically require a sizable update to
get into the neighborhood of recent snapshots. Alternatively, my
boombatower/opensuse:tumbleweed image is rebuilt once weekly and as such
should provide a recent starting point.

  docker run -it boombatower/opensuse:tumbleweed /bin/bash

Add --rm if you would like it to automatically disappear after exiting.

I will be updating one of my machines using this prototype and should notice
issues, but feel free to file issues on the GitHub repos or respond to this
mail.

[1] https://lists.opensuse.org/opensuse-factory/2016-10/msg00591.html
[2] https://build.opensuse.org/package/binaries/openSUSE:Factory/_product:openSUSE-release?repository=snapshot

Have fun!

-- 
Jimmy

Jimmy Berry