Hi, and thank you, Mr. Brown ;-). I would first like to respond to your short list of points, since those are easiest to address in the context of what I said before and of the argument I made.

On 25-3-2016 at 09:41, Richard Brown wrote:
1) The availability of rollback functionality for users' systems-in-the-field has no real-world impact on the development pace of release-based distributions
2) Software quality in release-based distributions benefits from longer release schedules
3) In the absence of software release schedules (i.e. in a rolling release), automated testing can provide proactive assurance that all key use cases are not broken by rapid development processes
4) Tying that automated testing as a gate that must be passed before any software releases slows development to ensure quality does not suffer
5) Using that same automated testing gate improves software quality in release-based distributions also
6) System rollback with a rolling release is an extra safety net which helps ensure users don't suffer from any deficiencies in the automated testing
7) System rollback with ANY distribution is, in my mind, essential because there are thousands of ways a user or 3rd-party software can ruin a system.
First, of course, at this point you equate robustness with the robustness of the distribution as a whole, as an integrated thing, but I'm not sure you're also talking about the software packages individually, since normally you would not be greatly responsible for those.

1. I had not ventured that rollback had any impact on pace. In fact, I had said that a sufficiently fine-grained and usable system (I do not think Snapper is really usable for that) would increase the general pleasure people could have in using Linux, which might enhance the user experience of the environment, which in turn could generally make people more productive and happy. I was making no allusions to the actual PACE of any distribution's development; you would really need to specify what you mean by that and how it relates to quality at all.

2. Sure, of course.

3. I can't say automated testing is a bad thing. Although I hate writing unit tests myself in Java, due to the strange structure they must have, I do know the joy of seeing them run and succeed ;-). I have also used them successfully in simple Bash scripts, where things can break so easily that everything seems to be made of broken porcelain already. This form of automation is amazing, of course, and it seems like a really great system; I did not know it was so extensive before. Nevertheless, it is clear from actual people's experiences that many things go wrong in software packages - maybe not between them, but certainly inside them. I mean, you just have to look at this list. I see reports of things not working in Chrome and in Firefox, and as a current Windows user I never have to deal with that.

Your allusions are to the quality of the distribution, and that's fine. But I consider Linux itself very broken, because I simply know that the moment I install it again (and I might do so shortly, just to be able to use Calligra :p) my life will become more painful. My current version of Windows is a pain, but it is not nearly going to be as painful as Linux, no matter what version I try or what distribution I use. I am going to be struggling with commands whose syntax I don't know. I am going to try to repeat tasks I have done before but since forgotten. I am going to be confronted with a great lack of menu-based interfaces on the command line -- the way "make menuconfig" was always excellent, but there is not much like it today. My favourite is the program IPTRAF; it just works. I use a firewall called Vuurmuur somewhere; it works okay too.

I always focus on the user experience of something. The moment I write some script or program, I will go to great lengths to make it as easy for people as possible - and also for myself, of course. That may take a lot more time, but in the end that is really insignificant compared with how much benefit it gives you. I once came across some open source package where the maintainer or developer said: there is no documentation, /because this is free software/. /And because we are developing this in our free time, we have no time to write documentation./ That was such outstanding nonsense it hurt the eyes to hear it ;-). There is no requirement whatsoever to force all your attention into development, and of course openSUSE has pretty good documentation, I guess.

You can say all you want about how good your system is and come up with metrics to prove it, but my experience is still the same. I also think that openSUSE is more robust than, say, Kubuntu or even Ubuntu, probably.
openSUSE has a more robust feel to it from the get-go; that is a given fact to me. I am not saying that means everything is better, but as a small example:
- there are hardly any third-party Ubuntu repositories (Launchpad) that are actually worth something
- there is a great number of usable and helpful third-party repositories for openSUSE that allow you to upgrade something to a newer version.
Just a simple thing, but clearly indicative of how things stand.

Nevertheless, we must differentiate between at least three things:
- a distribution is mostly not the software itself
- there are a million million things that can go wrong in any piece of software that perhaps *could* be tested as clearly and robustly as openSUSE is being tested, but experience just dictates that a lot of errors, bugs and difficulties make it through in Linux that are almost always absent from Windows
- when I talked about robustness I meant mostly the distribution-level things you talked about, which clearly inform the developer of a great many things that can go wrong, but equally I was talking about crashing software and all of that.

Now, granted, a full snapshot rollback would normally indicate something wrong with the package system or some system configuration. Nevertheless, although I feel all of this feedback is great for diagnosing problems, it is no excuse for, or replacement of, a full understanding of what you are doing, and of a sound architectural design that ensures you make software that is less bug-prone to begin with.

I was once reading a bit of a mailing list about the development of Plasma. The messages I saw were talking about hugely complex software. You know, you can usually directly perceive not just how complex something is, but also how complicated. I like that there are two words for it. Complex means many components can be working together while it is still elegant. Complicated means the opposite: a lack of elegance. When something is utterly simple, it can also be utterly complex, because the building blocks are so well defined. The better your building blocks are - elegant, simple, well designed, with clear boundaries and clear concepts - the bigger the systems you can build with them become without losing that elegance. However, when something is complicated, it requires effort to understand. A truly complex-yet-simple system is effortless to understand; a complicated system always requires hard work. And what those Plasma coders showed me was complicated, not complex. A bug fixed on the left introduced a bug on the right, that sort of thing.

And I would be very happy if my system never broke thanks to your testing, I guess. But happiness is not really the right word; happiness in my case results from being around what I like. The system might not break - great, wonderful. But here is the third part:
- a system that is robust due to automated testing feedback is not the same as a system that is robust because of great design principles. The first is a mechanism and a process. The second is... a choice. You can say the first is a choice as well, the choice to not break anything. But the latter is the choice to make something that /cannot/ break.

So what I'm saying here in response to your point 3 is: yes, it is good. But it seems to be about reactively fixing mistakes rather than proactively designing something good. It is basically bug-chasing. It is not leading.
It is following. Debugging is always like that, but you would prefer to write software that has fewer bugs, so you won't need to work so hard before it compiles ;-) (for instance).

What I mean by that - and I'm getting a bit tired - relates to rollback systems. You're saying that it doesn't matter, because your systems are robust /anyway/. It is as if you have prevented yourself from ever making a mistake, because you'll be caught by your garters - I mean something else ;-) - when you do. What I myself was alluding to was a change in incentive. Perhaps you might say incentive is irrelevant, because the incentive stays the same: we are always committed to delivering unbroken systems, and we have put the measures in place to ensure that. Yet this mechanism cannot replace what the heart does - that is all I am saying now. And the reason, of course, is that robustness also extends into user-friendliness, usability. Perhaps this is at odds, because rollback should indeed refer to a distribution and its package system (for instance), and now I'm suddenly talking about the quality of individual packages. And of course you can break stuff. But if the system is perfect, you won't need to roll back. And it just doesn't feel that way. This is nice in principle, but reality is different.

4. Well, you are right about that, or I assume you are. Just one experience with QA systems. When I was working in a factory, they produced air filters for air-conditioning systems. My direct boss or supervisor wanted me to work really fast. But next door was the QA team, and they had other ideas, and sometimes sent stuff back. It was like trying to serve two bosses at once - God and the Devil, in that sense. And I was torn: I wanted to do high-quality work, but for my direct boss quality was only measured by the number of items they did not complain about. If it passes, it is good, even if I didn't feel that way. Even more, I hated being hated on by the QA team. I did not consider having my work sent back a good way to ensure good products. I wanted to make something good right from the get-go, not as the result of shoddy work being thrown back at me.

So yes, it slows things down. You are obviously and probably very right. But it is the wrong way to go about things, for me. I don't create something shitty and then throw it at the door until it is not shitty anymore. I create a thing of beauty right away, and I don't depend on outside factors to judge my work. If you are working with a reactive system like this, the end result is only as good as the system of tests you run, and absolutely no better than that. That's not creative and it doesn't lead to better designs. It just leads to more fixes of poor designs. I like to live at the cause of my experience, not at the result of it. At the start of it, not at the end. As the creator, not as the respondent.

5. Sure, but still not for the right reasons, and still as a mechanical thing instead of something that requires or fosters a vision of what a good software product is going to be. It is like the claim that natural selection is purely the result of random mutations. I don't think that's true; I think that in an important way, creation or evolution follows a plan. But regardless: if you don't give direction to what you do out of a desire to create that perfect system, and you are content with just fixing the errors that pop up from doing half a man's job, you will never create something outstanding.

6. That's fine. But like I say, it is still done without enough of a mind as to what you are doing.
Because with this thing, you can do it while drunk and half asleep. I know how it works. It is bliss in a way, but you just habitually run a test suite because you are too tired to think (for instance), and instead of really knowing what you're doing, it doesn't really matter, because if you happen upon the correct solution by happenstance, that is okay too. And once it passes the test you go: okay, fine, I'm done. And you may create stuff that works according to the tests. But the tests were created by someone with an idea of what the system needs to do. Those tests embody that person's mindset. So the system will do exactly what that person wants, and no more. So where is the creativity? Where is the innovation? And where is real software quality? I just don't think this is an environment that really favours or fosters thinking about real quality. Just like with my boss: the job was to deliver as shoddy a piece of work as possible while still passing the check, because THIS MAXIMIZED MY PRODUCTION. He didn't care about quality; that was not his job. He cared only about output.

So you say the development PACE is still very high and only somewhat slowed down. Okay. But I was not talking about pace. Pace is quantity. I was talking about quality. And I think, and pretty much know for sure, that this is the same with openSUSE or any other distribution that does this well. The goal is to maximize throughput with as poor a product as possible that still passes the tests. And that is what I was saying: features (Y(f)) but not quality (Y(r)). The energy you spend on quality is the minimum that will pass the test, because the total effort is M = Y(f) + Y(r); for a fixed M, minimizing Y(r) maximizes Y(f), and hence your throughput (a tiny numeric sketch of this follows below). In the other email I said that you could also see benefit as the product of quantity and quality. What you do is find the lowest quality that will pass the tests, which means you can direct more effort at quantity and thus speed up the process of development. Expediency, as we have called it here.

But I was alluding to two things:
- expediency is naturally favoured by an enjoyable work environment
- quality, however, has to compete with quantity, and rollback systems tend to favour quantity.

I know you, Mr. Brown, may feel as though you are on fire. Your systems are in place, everything runs like a train, oiled like a good machine. Let's keep going, man! And this favours OUTPUT. This favours PACE and, in that sense, yes, expediency, if we have to use the word that way. It is a more complex term, of course. You set a certain quality level, use the minimum amount of energy to reach it, and then spend everything else maximizing the output at that quality level you need. This is exactly what I was saying, except that I was tying rollback into it. I was tying in the idea that quality could 'even be lower' given a rollback solution that is very effective, fast and easy to use. You set a certain bar that you want to reach and you are content when you reach it, no matter whether you expend maximum effort in maintaining it. It can also mean, indirectly, that the person setting the bar is content with less, in a certain way. After all, it keeps the pace going, right? Why worry about details so much? We are on the GO! ;-). You cannot instill creativity in the process if reaching QA goals is the goal. There will be no growth. No identification of issues. No dreams about betterment. No creative appraisal.
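To put a number on that, here is a tiny illustrative sketch - my own toy model in Python, not anything openSUSE actually measures - of the effort split I mean:

# Toy model of the effort split: a fixed budget M is divided between
# feature work y_f and quality work y_r, so M = y_f + y_r.
# The numbers are purely illustrative, not real project data.

M = 100.0            # total effort available per cycle (arbitrary units)
QUALITY_GATE = 30.0  # minimum quality effort that still passes the QA gate

def split_effort(total, quality_effort):
    """Return (feature_effort, quality_effort) for a fixed total budget."""
    if quality_effort > total:
        raise ValueError("cannot spend more on quality than the whole budget")
    return total - quality_effort, quality_effort

# Spending only the minimum on quality maximizes feature throughput ...
y_f, y_r = split_effort(M, QUALITY_GATE)
print("minimum to pass: features =", y_f, "quality =", y_r)   # features = 70.0

# ... while every extra unit of quality effort comes straight out of features.
y_f, y_r = split_effort(M, QUALITY_GATE + 20.0)
print("extra quality:   features =", y_f, "quality =", y_r)   # features = 50.0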
If QA is the goal and the metric, then why would you need anything else, right? No need to be critical etc.; that's the job of QA, not you. Your job is just to produce.

7) System rollback with ANY distribution is, in my mind, essential because there are thousands of ways a user or 3rd-party software can ruin a system.

That is rather Linux-centred, of course. And while you call it essential, I have never had it! And I still don't really want to have it. It doesn't feel right to me. For example - and this is not really an example but just what happens to me - when I roll back my system to a previous state, I lose track of what my state is. If I had made changes since the last snapshot, I don't remember them, or remember to do them again, because in my mind they are already done. With a backup it's not so bad, because I make it consciously.

I really hate it when the state of some software product I'm working on becomes inconsistent in my mind. I don't know what's what anymore. It might take weeks of going through the source before I know where I'm at. In the meantime I don't feel happy. I feel uncertain about the product. It doesn't feel like my own anymore. My most important project ended up in that state, partly due to Git I think, partly due to a mistake I made with an overlay filesystem. In Kubuntu you have aufs; it is a risky thing. I was writing changes and didn't realize they were being written to the overlay and not to the filesystem the source was coming from. Stuff like that can really mess up the coherent state of something, at least in your mind. That project is still in a bad state. I had lost text I had written which was connected to a girl I liked - really, stuff like that. Due to Git, I think, I lost some work. I had done the work while sitting next to her, talking about it. Losing the text means losing the connection. Man. And now some work is lost that was important to me, and I can't redo it, because I did it with her next to me. You know, stuff like that. Sure, if you can pick a file out of a snapshot, you might be very happy. But then it's STILL messed up.

I'm fine with backups. I'm not fine with snapshots. I just really, really don't like them. They mess me up. The example was that I reverted Windows to some previous state. I hadn't realized yet that I could access files in the snapshot directly. The snapshot restore messed up some folders, because Windows does a bad job at it. Then I had a manual job getting that fixed again, and afterwards I still didn't know if everything was alright. This uncertainty about the state of a system or project doesn't happen when I make a real, conscious backup. It does happen to me with forms of snapshotting. Call it weird, call it strange; it has that effect on me. A total system snapshot is too big to comprehend. Well, I don't know. After that SINGLE unimportant system restore, I seriously wanted to install Windows again. It had fucked it up for me. I didn't enjoy it anymore.

I don't even like loading saved games in a computer game these days. I prefer doing the level over from scratch. It feels better, even if it takes longer. I remember how, for example, the experience of Zelda 1 on the NES got ruined when you played it in an emulator, because of the save states. Same thing - a save state is pretty much the same thing. Using save states you can defeat any encounter, because if you mess up, you just go back 2 seconds and try again! No longer any long spell of concentration required. Hence, no longer any concentration.
Just rewind a million times if you have to; you only have to perform in 5-second chunks anyway. But in the old days, on the NES itself, you had to play flawlessly for like half an hour. That was the real stuff.
I'll expand on these individually:
1) Look at openSUSE. We've had snapper/btrfs in the distribution for years. We've had it as a default since 13.2. And yet the trend for openSUSE releases has been to *extend* the release cycle, from the previous 8 months to the current 12 months. Just because we have snapper by default doesn't mean we speed up software development.
2) I think this can be accepted without too much argument. The longer something is developed, the more testing, and the more time and effort we expend on polishing it. The balancing act here is ensuring that, when you actually release, the final product is still up-to-date enough that it is interesting and useful to the people who want to use it. This was one of the key motivators for the direction Leap is taking (more stable/unchanging as a general goal, and using an Enterprise codebase to achieve that), while we have Tumbleweed as the counterbalance without a release schedule.
3) openQA really is magic. Every single build of Tumbleweed gets tested with over 100 different scenarios before it is released. Software RAID, encrypted LVM, dual-booting with Windows 8, filesystems, KDE, GNOME, LVM with RAID 1, memtest, minimal X, split /usr, textmode, UEFI, UEFI with Secure Boot, UEFI with USB booting, updating from 12.x and 13.x, network installs, live CDs, and more are tested automatically, with image- and log-based automatic assessment of the results. i.e. when Tumbleweed ships, openQA knows that every screen it cares about *looks* the way we want it to look for a user, and every command it typed *acts* the way we want it to act. That is broader coverage of the functionality of our Linux distributions than most corporate manual QA departments can manage with several weeks of human testing... and openQA does it every day... sometimes twice a day.
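To make the shape of those checks a little more concrete: the real tests are Perl modules driven by os-autoinst, so the Python below is only a toy imitation of the two kinds of assertion involved (the image paths and the command are made-up examples, and the function names merely echo the real test API):

# Toy imitation only: openQA's real test modules are Perl, run by os-autoinst.
# This sketches the two kinds of check: "does the screen look right?" and
# "does the typed command act right?". Paths and commands are invented.
import hashlib
import subprocess

def assert_screen(screenshot_path, reference_path):
    """Crude stand-in for screen matching: compare a captured screenshot
    against a stored reference image (a 'needle')."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    if digest(screenshot_path) != digest(reference_path):
        raise AssertionError("screen does not match needle " + reference_path)

def assert_script_run(command, expected_in_output):
    """Crude stand-in for command checking: run a command and require a
    given string in its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0 or expected_in_output not in result.stdout:
        raise AssertionError("command did not behave as expected: " + command)

if __name__ == "__main__":
    # Hypothetical scenario: the boot menu looks right and / is on btrfs.
    assert_screen("current_boot.png", "needles/grub_menu.png")
    assert_script_run("findmnt -n -o FSTYPE /", "btrfs")
    print("scenario passed")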
4) Tumbleweed uses openQA as an integrated part of the software development process. Even before any new package hits any distribution, incoming submit requests are 'staged': the Build Service makes 'what-if' DVDs that contain the changeset from the submit request, and then openQA does brief testing to ensure the OS is still valid. If it fails, the package isn't allowed anywhere near any of our distribution repos. Only when it is accepted does full system validation kick off, with the breadth I described in 3).
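Purely as an illustration of that gate - this is not the actual Build Service or openQA code, and the scenario names are invented - the acceptance logic amounts to something like this:

# Toy illustration of the staging gate: a submit request's 'what-if' build is
# only allowed into the distribution repos if every staging scenario passes.
# The real logic lives in the Open Build Service and openQA, not in a script.
from typing import Callable, Dict

def gate_submit_request(request_id: int,
                        scenarios: Dict[str, Callable[[], bool]]) -> bool:
    failures = [name for name, check in scenarios.items() if not check()]
    if failures:
        print("SR#%d rejected, failed: %s" % (request_id, ", ".join(failures)))
        return False
    print("SR#%d accepted, full system validation can start" % request_id)
    return True

# Invented scenario checks standing in for the brief openQA staging tests.
example_scenarios = {
    "installer_boots": lambda: True,
    "textmode_install": lambda: True,
    "gnome_login": lambda: True,
}

if __name__ == "__main__":
    gate_submit_request(123456, example_scenarios)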
5) For Tumbleweed, a process like 4) is necessary for the rolling release to be viable at all, but it has proven itself so effective at ensuring quality that it is used not only by Leap but also by the SUSE Linux Enterprise development teams. Even with a 'traditional' release schedule which provides time for manual QA, it's beneficial to have a constant picture of hundreds of different installation, configuration, and production scenarios. openQA can keep track of that picture for every single development build - not only milestones like Alphas and Betas, which undergo manual testing, but all the intermediate builds that occur as things change rapidly in each distribution's OBS Projects.
6) So, yes, in one sense you're right. Tumbleweed moves fast and relies only on magical automated testing, so system rollback is a pretty good idea for Tumbleweed users in case something slips past the magical automated quality-testing robot that is openQA... That said, as an avid Tumbleweed user, I have to admit that in the last 2 years the only time I've had to use snapper is when *I* have screwed up my machine doing stuff that *I* should not have done... so I am more dangerous than a rapid rolling development model.
7) At the end of the day, the discussion of quality is actually mostly irrelevant when it comes to a discussion about snapper:
- There are ~2500 packages in SUSE Linux Enterprise
- There are ~7500 packages in openSUSE Leap and Tumbleweed
- There are *tens of thousands* of packages in OBS. These have no testing. These are not integrated with our distributions. Many of them exist in order to be developed for future versions of SLE, Leap and Tumbleweed. There should be no expectation of 'quality' at all. They build, they are published, and people use them. That is risky, and yet people do it every day.
- There are *millions* of other open source third-party packages. These also have no testing; they are not integrated with our distributions. They might not even go through the most basic of checks which OBS does as part of a build. Lots of software languages now have their own package managers and repositories which can effectively 'sideload' software onto your machine, bypassing your system package manager (npm, gems, etc). People don't care, want to get something done, and use them anyway. That is even more risky, and yet people do it every day.
- Offerings like CoreOS/Atomic/containerisation all try to offer solutions to this, but the reality is they are far, far away from being a comprehensive fix. Tools like Machinery http://machinery-project.org/ can identify unpackaged files and changes to files from packages (a toy sketch of that kind of check follows after this list), and, speaking from experience, the situation out there in the real world is a messy, ugly place full of local hacks, forgotten changes, rogue software, and mess.
- In addition to the thousands of OBS and 3rd-party packages out there doing god knows what to the machine, at the end of the day, users are human. And humans are fallible. People screw up.
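As a rough illustration of that kind of inspection - this is not Machinery's actual code, just a toy sketch assuming an RPM-based system where 'rpm -qf' is available - you can walk a directory and flag every file that no installed package owns:

# Toy sketch: list files under a directory that no RPM package claims to own,
# roughly the kind of 'unpackaged files' check a tool like Machinery performs.
# Assumes an RPM-based system; 'rpm -qf FILE' exits non-zero for unowned files.
import os
import subprocess
import sys

def is_unpackaged(path):
    result = subprocess.run(["rpm", "-qf", path], capture_output=True, text=True)
    return result.returncode != 0

def find_unpackaged(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_unpackaged(path):
                yield path

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/local"
    for path in find_unpackaged(root):
        print("unpackaged:", path)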
Even a 100% perfect quality Linux distribution can be easily ruined by one 3rd party programme or one wrong command typed by one mistaken user. Good backups are great in disasters, but mistakes happen every day. You need system rollback regardless of how good the software quality is. The real world demands it.