Robert Schweikert wrote:
2.) The staging approach: I can only speak from experience, so this might sound a little lame, sorry. I have seen two implementations of the staging model in action at companies that produce large software suites. In both cases I consider the approach a failure.
The problem in both cases is that the number of staging trees/branches/projects keeps growing, and ever more manpower is consumed just managing that ever-increasing number of staging projects.
Meanwhile the original problem of "how do we deal with unknown adverse interactions between updates" remains unresolved. The "solution" taken in one case was to have intermediate staging trees where "known risky updates" were tested together. Yes, staging trees upon staging trees. But this only solves the problem superficially: the target tree keeps moving ahead, so the staging tree is by definition always out of date, unless the target tree is frozen until a particular staging tree is merged. Anyway, it is a maze that potentially requires a lot of people.
The other problem with the staging model is that the knowledge about "potentially risky interactions" is an implicit set of interactions that the staging tree managers happen to know. It is not written down anywhere, which makes it difficult for other people to learn. We have this problem today, and from my point of view it will not be resolved by adding more staging trees.
The staging model will not catch adverse interactions reliably. The reason is that the staging tree is by definition always out of date, unless the target tree is frozen and, after one staging tree is accepted, all other staging trees get rebuilt. That is not conducive to parallel development.
What I like in your little umm rant ;) is the notion of *interactions* that need to be tested. What I'd like to see added to this whole discussion is 'cadenced flow' and 'integration decision'. When you add these, the resulting staged flow will actually make both bug fixes and package or subsystem upgrades available fast, in stable quality, continuously (bold claim). Here's how:

Let's assume there is some stable code base, call it Factory. The goal is to get updates in there, reliably and regularly, to get it to the next level of being a stable code base.

For "leaf packages" that is simple: build the package, test its functionality, then release it.

For, for lack of a better word, "intermediate packages" (say libpng), you start like above, testing the functionality, until from that perspective it is good to release. But then you also need to get it integrated with "the rest of the world" (maybe 50-100 packages for libpng, 44 on my system).

Then there are the "multi-scope packages", like NetworkManager or the bluetooth stack, which affect several whole integration scopes, desktops in this case. They have interactions in two stages (first one desktop, then all the other desktops).

There are the "transversal packages", affecting almost everything, like the toolchain.

And then there are packages affecting few other packages that nonetheless have a lot of interactions that need to be tested (kernel, xorg, ...).

Now suppose there is a cascade of staging projects, which potentially 'release', say, every week or every other week (that's the "cadence"). They build a tree structure, something like this:

            ...       gcc
              \         \
               \         \
    --------------------------------------------------------> Factory
               /       /       /        \
              /       /       /          \
          kernel    xorg    KDE         GNOME

The number of nodes from the root (Factory) to a branch corresponds to the interactions that need to be tested for what goes into that branch. This gives growing rings of scope for interaction testing and integration success.

Successful builds and automatic tests are necessary, sometimes even sufficient, for interaction testing and integration success. They propagate automatically, to give a 'tentative' next build. That, however, does not affect the 'last known good' build --- that last known good state remains available, too.

Thus at each branch, every week a decision can be made: is the combination of the 'new' stuff good enough already to *pull* (!!) it in together? Or do we --- for the combination! --- have to stick with what we had so far, the last known good version?

Btw, 'remote' breaks of the last known good source version, because of, say, some toolchain upgrade, clearly indicate that said root cause, the toolchain upgrade, needs some love, too.

Integration manager, set priorities... what do you want in this week's Factory? Maybe the new toolchain will not be there yet... A GNOME release that needs a new NM or bluetooth may need a number of cycles to get to a shape where it can be merged with KDE.

This will also lead to races. For example a new gcc might race a desktop integration, i.e. the new gcc works with the last known good version of the desktops. Then the new desktop integration will need to do its homework and pull in the new gcc, and until then, the new gcc will just build against the last known good desktop.

Also a new GNOME will at least have to ensure that integration issues are resolved with the current stable KDE (last known good), and hopefully the KDE guys are good citizens and are willing to spend the time to make sure this works. They will. They need the GNOME guys to do the same a few weeks later...
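To make the 'tentative' vs. 'last known good' propagation and the weekly integration decision a bit more concrete, here is a minimal Python sketch of one node in such a cascade. StagingNode, receive(), weekly_decision() and combination_is_good are names I made up purely for illustration; this is not an existing OBS feature, just the decision logic spelled out:

    class StagingNode:
        """One staging/integration project in the cascade (illustrative only)."""

        def __init__(self, name):
            self.name = name
            self.last_known_good = {}   # package -> version, the stable combination
            self.tentative = {}         # package -> version, candidate for this cycle

        def receive(self, submissions):
            # New package versions propagate in automatically, but only into
            # the tentative candidate; the last known good state is never touched.
            merged = dict(self.last_known_good)
            merged.update(submissions)
            self.tentative = merged

        def weekly_decision(self, combination_is_good):
            # The cadenced integration decision: pull the new combination in
            # together, or stick with the last known good one.
            if combination_is_good(self.tentative):
                self.last_known_good = dict(self.tentative)
            # else: keep last_known_good; the not-yet-ready pieces simply
            # race again at the next cycle.
            return self.last_known_good

        def release_to(self, parent):
            # At the cadence point the parent node (ultimately Factory)
            # only ever sees what survived the decision.
            parent.receive(self.last_known_good)

    # e.g. the GNOME integration project releasing into Factory once a week:
    factory = StagingNode("Factory")
    gnome = StagingNode("GNOME")
    gnome.receive({"NetworkManager": "new", "gimp": "new"})
    gnome.weekly_decision(lambda combo: True)   # pretend the interaction tests passed
    gnome.release_to(factory)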
Sometimes for several weeks the integration master needs to pull some old versions (the last known good combination), just because the new combination is not ready yet.

Now how can small leaf package updates skip such a major barrier? E.g. a new gimp? At each junction it is clear what needs to be tested. If the new leaf gtk+ application, gimp, can also be integrated with the 'last known good version', the one that is still in Factory, then it can be integrated into that, and thus moves on, ahead of the rest of GNOME, at the next cycle.

So if we tilt the above tree and look at it sideways, it almost looks like git integration:

    new stuff          -----------------------------------------*--------
                      /               \                         /
                     /                 \ pull new gimp         /  merge success
                    /                   \                     /
    last known good ---------------------*------------------------------- R.I.P.

Such a system of 'staging' or integration projects gives a clear flow of both updates and upgrades into a well integrated and tested Factory, which is 'released' at a weekly cadence. The cadence also scopes the size of the integration projects: they should be small enough to allow something like a weekly, or at most bi-weekly, lock-step.

Beta users of some integration point or branch can use it a few days ahead of the release (zypper rr + zypper dup gets them sane again).

A new kernel can be available for pull for a longer while, until it has had enough love and testing to actually be pulled in as The Kernel; how to manage kernel beta testing is an exercise of its own once we have a stable Factory.

Branch projects can also be used by those happy with a partial integration (a new KDE even if it breaks GNOME, or vice versa), or for experimenting early with completely new feature sets (systemd, ...).

I believe this model overcomes some of the issues Robert has brought up about the traditional 'staging' model. It gives clear responsibilities and a clear cadence. No model can solve the problem of *who* is going to do the integration work, but this staging model at least clearly scopes and cadences what needs to be done and when, to keep the flow going.

my .02 EUR

S.

--
Susanne Oberhauser                      SUSE LINUX Products GmbH
+49-911-74053-574                       Maxfeldstraße 5
Processes and Infrastructure            90409 Nürnberg
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org