Robert Schweikert wrote:
2.) The staging approach: I can only speak from experience, so this might sound a little lame, sorry. I have seen two implementations of the staging model in action at companies that produce large software suites. In both cases I consider the approach a failure.
The problem in both cases is that the number of staging trees/branches/projects keeps growing, and ever more manpower is consumed just managing that ever-increasing number of staging projects.
Meanwhile the original problem of "how do we deal with unknown adverse interactions between updates" remains unresolved. The "solution" taken in one case was to have intermediate staging trees where "known risky updates" were tested together. Yes, staging trees upon staging trees. But this only solves the problem superficially: the target tree keeps moving ahead, so the staging tree is by definition always out of date, unless the target tree is frozen until a particular staging tree is merged. Anyway, it is a maze that potentially requires a lot of people.
The other problem with the staging model is that the knowledge about "potentially risky interactions" is an implicit set of interactions that the staging tree managers happen to know. It is not written down anywhere, which makes it difficult for other people to learn. We have this problem today, and from my point of view it will not be resolved by adding more staging trees.
The staging model will not catch adverse interactions reliably. The reason is that the staging tree is by definition always out of date, unless the target tree is frozen and, after one staging tree is accepted, all other staging trees get rebuilt. That is not conducive to parallel development.
What I like in your little umm rant ;) is the notion of *interactions* that need to be tested. What I'd like to see added to this whole discussion is 'cadenced flow' and 'integration decision'. When you add these, the resulting staged flow will actually make both bug fixes and package or subsystem upgrades available fast, in stable quality, continuously (bold claim). Here's how:

Let's assume there is some stable code base, call it Factory. The goal is to get updates in there, reliably and regularly, to get it to the next level of being a stable code base.

For "leaf packages" that is simple: build the package, test its functionality, then release it.

For, for lack of a better word, "intermediate packages" (say libpng), you start like above, testing the functionality, until from that perspective it is good to release. But then you also need to get it integrated with "the rest of the world" (maybe 50-100 packages for libpng, 44 on my system).

Then there are the "multi-scope packages", like NetworkManager or the bluetooth stack, which affect several whole integration scopes, desktops in this case. They have interactions in two stages (first one desktop, then all the other desktops).

There are the "transversal packages", affecting almost everything, like the toolchain.

And then there are packages affecting few other packages that nonetheless have a lot of interactions that need to be tested (kernel, xorg, ...).

Now suppose there is a cascade of staging projects, which potentially 'release', say, every week or every other week (that's the "cadence"). They build a tree structure, something like this:

            ...       gcc
              \         \
               \         \
    --------------------------------------------------------> Factory
               /       /       /        \
              /       /       /          \
          kernel    xorg    KDE         GNOME

The number of nodes from the root (Factory) to a branch corresponds to the interactions that need to be tested for what goes into that branch. This gives growing rings of scope for interaction testing and integration success.

Successful builds and automatic tests are necessary, sometimes even sufficient, for interaction testing and integration success. They propagate automatically, to give a 'tentative' next build. That, however, does not affect the 'last known good' build --- that last known good state remains available, too.

Thus at each branch, every week a decision can be made: is the combination of the 'new' stuff good enough already to *pull* (!!) it in together? Or do we --- for the combination! --- have to stick with what we had so far, the last known good version?

Btw, 'remote' breaks of the last known good source version, because of, say, some toolchain upgrade, clearly indicate that said root cause, the toolchain upgrade, needs some love, too.

Integration manager, set priorities... what do you want in this week's Factory? Maybe the new toolchain will not be there yet... A GNOME release that needs a new NM or bluetooth may need a number of cycles to get to a shape where it can be merged with KDE.

This will also lead to races. For example a new gcc might race a desktop integration, i.e. the new gcc works with the last known good version of the desktops. Then the new desktop integration will need to do its homework and pull in the new gcc, and until then, the new gcc will just build against the last known good desktop.

Also a new GNOME will at least have to ensure that integration issues are resolved with the current stable KDE (last known good), and hopefully the KDE guys are good citizens and are willing to spend the time to make sure this works. They will. They need the GNOME guys to do the same a few weeks later...
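To make the 'tentative' vs. 'last known good' propagation and the weekly integration decision a bit more concrete, here is a minimal Python sketch of one node in such a cascade. StagingNode, receive(), weekly_decision() and combination_is_good are names I made up purely for illustration; this is not an existing OBS feature, just the decision logic spelled out:

    class StagingNode:
        """One staging/integration project in the cascade (illustrative only)."""

        def __init__(self, name):
            self.name = name
            self.last_known_good = {}   # package -> version, the stable combination
            self.tentative = {}         # package -> version, candidate for this cycle

        def receive(self, submissions):
            # New package versions propagate in automatically, but only into
            # the tentative candidate; the last known good state is never touched.
            merged = dict(self.last_known_good)
            merged.update(submissions)
            self.tentative = merged

        def weekly_decision(self, combination_is_good):
            # The cadenced integration decision: pull the new combination in
            # together, or stick with the last known good one.
            if combination_is_good(self.tentative):
                self.last_known_good = dict(self.tentative)
            # else: keep last_known_good; the not-yet-ready pieces simply
            # race again at the next cycle.
            return self.last_known_good

        def release_to(self, parent):
            # At the cadence point the parent node (ultimately Factory)
            # only ever sees what survived the decision.
            parent.receive(self.last_known_good)

    # e.g. the GNOME integration project releasing into Factory once a week:
    factory = StagingNode("Factory")
    gnome = StagingNode("GNOME")
    gnome.receive({"NetworkManager": "new", "gimp": "new"})
    gnome.weekly_decision(lambda combo: True)   # pretend the interaction tests passed
    gnome.release_to(factory)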
Sometimes for several weeks the integration master needs to pull some old versions (the last known good combination), just because the new combination is not ready yet.

Now how can small leaf package updates skip such a major barrier? E.g. a new gimp? At each junction it is clear what needs to be tested. If the new leaf gtk+ application, gimp, can also be integrated with the 'last known good version', the one that is still in Factory, then it can be integrated into that, and thus moves on, ahead of the rest of GNOME, at the next cycle.

So if we tilt the above tree and look at it sideways, it almost looks like git integration:

    new stuff          -----------------------------------------*--------
                      /               \                         /
                     /                 \ pull new gimp         /  merge success
                    /                   \                     /
    last known good ---------------------*------------------------------- R.I.P.

Such a system of 'staging' or integration projects gives a clear flow of both updates and upgrades into a well integrated and tested Factory, which is 'released' at a weekly cadence. The cadence also scopes the size of the integration projects: they should be small enough to allow something like a weekly, or at most bi-weekly, lock-step.

Beta users of some integration point or branch can use it a few days ahead of the release (zypper rr + zypper dup gets them sane again).

A new kernel can be available for pull for a longer while, until it has had enough love and testing to actually be pulled in as The Kernel; how to manage kernel beta testing is an exercise of its own once we have a stable Factory.

Branch projects can also be used by those happy with a partial integration (a new KDE even if it breaks GNOME, or vice versa), or for experimenting early with completely new feature sets (systemd, ...).

I believe this model overcomes some of the issues Robert has brought up about the traditional 'staging' model. It gives clear responsibilities and a clear cadence. No model can solve the problem of *who* is going to do the integration work, but this staging model at least clearly scopes and cadences what needs to be done and when, to keep the flow going.

my .02 EUR

S.

--
Susanne Oberhauser                      SUSE LINUX Products GmbH
+49-911-74053-574                       Maxfeldstraße 5
Processes and Infrastructure            90409 Nürnberg
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org