Ooh a thread about development expediency and quality... I wonder how
much hate I'm going to get for sharing my feelings on this topic :)
On 25 March 2016 at 08:31, Xen wrote:
<a lot of stuff about quality and development expediency>
You make some very interesting points which sound very reasonable, but
I disagree with most of them.
To try and keep this succinct, I'd summarise my points as follows:
1) The availability of rollback functionality for users' systems in
the field has no real-world impact on the development pace of
release-based distributions
2) Software Quality in release-based distributions benefits from
longer release schedules
3) In the absence of software release schedules (i.e. in a rolling
release), automated testing can provide proactive assurance that all
key use cases are not broken by rapid development processes
4) Making that automated testing a gate which must be passed before
any software is released slows development just enough to ensure
quality does not suffer
5) Using that same automated testing gate also improves software
quality in release-based distributions
6) System rollback with a rolling release is an extra safety net which
helps ensure users don't suffer from any deficiencies in the automated
testing
7) System rollback with ANY distribution is, in my mind, essential
because there are thousands of ways a user or 3rd party software can
ruin a system.
I'll expand on these individually
1) Look at openSUSE. We've had snapper/btrfs in the distribution for
years. We've had it as a default since 13.2. And yet the trend for
openSUSE releases has been to *extend* the release cycle, from the
previous 8 months to the current 12 months. Just because we have
snapper by default doesn't mean we speed up software development.
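For anyone curious what that default actually looks like, here's a rough sketch (assuming a stock openSUSE 13.2+ install with btrfs on /; output will vary by system):

```shell
# List snapper configurations; a default install with btrfs on /
# ships a "root" config out of the box
sudo snapper list-configs

# Show the snapshots snapper has taken automatically
# (zypper/YaST create pre/post snapshot pairs around every change)
sudo snapper -c root list
```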
2) I think this can be accepted without too much argument. The longer
something is developed, the more testing, the more time and effort we
expend on polishing it. The balancing act here is ensuring that when
you actually release, the final product is still up-to-date enough
that it is interesting and useful to the people who want to use it. This
was one of the key motivators for the direction Leap is taking (More
stable/unchanging as a general goal, and using an Enterprise codebase
to achieve that) while we have Tumbleweed as the counter balance
without a release schedule.
3) openQA really is magic. Every single build of Tumbleweed gets
tested with over 100 different scenarios before it is released.
Software RAID, encrypted LVM, dual booting with Windows 8,
filesystems, KDE, GNOME, LVM with RAID 1, memtest, minimal X, split
/usr, textmode, UEFI, UEFI with Secure Boot, UEFI with USB booting,
updating from 12.x and 13.x, network installs, live CDs, and more are
tested automatically, with image and log based automatic assessment of
the results. i.e. when Tumbleweed ships, openQA knows that every screen
it cares about *looks* the way we want it to look for a user, and
every command it typed *acts* the way we want it to act. That is
broader coverage of our distributions' functionality than most
corporate manual QA departments could manage with several weeks of
human testing... and openQA does it every day, sometimes twice a day.
4) Tumbleweed uses openQA as an integrated part of the software
development process. Even before any new package hits any
distribution, incoming submit requests are 'staged': the Build Service
makes 'what-if' DVDs that contain the changeset from the submit
request, and then openQA does brief testing to ensure the OS is still
valid. If it fails, the package isn't allowed anywhere near any of our
distribution repos. Only once it is accepted does full system
validation kick off, with the breadth I described in 3).
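As a rough illustration of the submission side (package and project names here are hypothetical; the staging, 'what-if' image build, and openQA gating all happen server-side after the request is created):

```shell
# Branch a package from Factory into your home project
osc branch openSUSE:Factory somepackage

# Check out the branch, make your change, and commit it
osc checkout home:myuser:branches:openSUSE:Factory/somepackage
cd home:myuser:branches:openSUSE:Factory/somepackage
# ...edit the spec file / sources...
osc commit -m "Fix the frobnicator"

# Submit the change back towards Factory; from here it is staged
# and must survive openQA before it can be accepted
osc submitrequest -m "Fix the frobnicator"
```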
5) For Tumbleweed, such a process as 4) is necessary in order for the
rolling release to be viable, but it's proven itself so effective at
ensuring quality it's also used not only by Leap but also by the SUSE
Linux Enterprise development teams. Even with a 'traditional'
release schedule which provides time for manual QA, it's beneficial
to have a constant picture of hundreds of different installation,
configuration, and production scenarios. openQA can keep track of that
picture for every single development build: not only Milestones like
Alphas and Betas, which undergo manual testing, but all the
intermediate builds that occur as things rapidly change in each
distribution's OBS projects.
6) So, yes, in one sense you're right. Tumbleweed moves fast and
relies only on magical automated testing, so system rollback is a
pretty good idea for Tumbleweed users in case something slips past the
magical automated quality-testing robot that is openQA... that said, as
an avid Tumbleweed user, I have to admit that in the last 2 years the
only time I've had to use snapper is when *I* have screwed up my
machine doing stuff that *I* should not have done... so I am more
dangerous than the rapid rolling development model.
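For completeness, the rollback workflow itself is short (assuming the default snapper/btrfs setup with bootable snapshots; exact steps can vary between releases):

```shell
# See which snapshots exist; zypper/YaST leave pre/post pairs
# around every change they make
sudo snapper list

# After booting a read-only snapshot from the GRUB menu
# ("Bootable snapshots"), make it the new default and reboot
sudo snapper rollback
reboot
```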
7) End of the day, the discussion of quality is actually mostly
irrelevant when it comes to a discussion about snapper
- There are ~2500 packages in SUSE Linux Enterprise
- There are ~7500 packages in openSUSE Leap and Tumbleweed
- There are *tens of thousands* of packages in OBS. These have no
testing. These are not integrated with our distributions. Many of them
exist in order to be developed for future versions of SLE, Leap and
Tumbleweed. There should be no expectation of 'quality' at all. They
build, they are published, and people use them. That is risky, and yet
people do it every day.
- There are *millions* of other open source third party packages.
These also have no testing, they are not integrated with our
distributions. They might not even go through the most basic of checks
which OBS does as part of a build. Lots of software languages now have
their own package managers and repositories which effectively can
'sideload' software onto your machine, bypassing your system package
manager (npm, gems, etc.). People don't care; they want to get
something done, and they use them anyway. That is even more risky,
and yet people do it every day.
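You can see the 'sideloading' for yourself: the language package managers happily write files the system package manager knows nothing about (the package name and install path below are just typical examples and may differ on your system):

```shell
# Install something with npm, completely outside RPM's control
sudo npm install -g left-pad

# Ask RPM who owns the resulting files -- it has no idea;
# it reports the path is not owned by any package
rpm -qf /usr/lib/node_modules/left-pad
```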
- Offerings like CoreOS/Atomic/containerisation all try to offer
solutions to this, but the reality is they are far, far away from being
a comprehensive fix. Tools like Machinery
(http://machinery-project.org/) can identify unpackaged files and
changes to packaged files, and, speaking from experience, the
situation out there in the real world is a messy, ugly place full of
local hacks, forgotten changes, rogue software, and general mess.
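A quick sketch of using Machinery to find that mess (the hostname is a placeholder; this assumes the machinery package is installed and you have SSH access to the target):

```shell
# Inspect a system for files no package claims, and for packaged
# config files that were changed after installation
machinery inspect myserver.example.com \
  --scope=unmanaged-files,changed-config-files

# Browse what was found
machinery show myserver.example.com
```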
- In addition to the thousands of OBS and 3rd party packages out there
doing god knows what to the machine, at the end of the day, users are
human. And humans are fallible. People screw up.
Even a 100% perfect quality Linux distribution can be easily ruined by
one 3rd party programme or one wrong command typed by one mistaken
user. Good backups are great in disasters, but mistakes happen every
day. You need system rollback regardless of how good the software
quality is. The real world demands it.
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org