On 9/2/16 11:36 PM, Chris Murphy wrote:
On Fri, Sep 2, 2016 at 8:17 PM, Jeff Mahoney <jeffm@suse.com> wrote:
Qgroups are enabled to allow snapper to make better decisions about which snapshots to remove automatically. To do this, we only need the qgroups tracking data. We do *nothing* with automatically adding or enforcing limits using them. In a later post, Chris claims that snapper doesn't do this and in fact uses FIEMAP and INO_LOOKUP.
No, that was in response to Andrei about what 'btrfs filesystem du' is using, not snapper.
The implicit question is why snapper can't use the same ioctl 'fi du' is using, instead of qgroups.
Arvin provided the answer in a separate post. What you're not getting is that qgroups are the standard file system mechanism for tracking extent usage at the subvolume level. In my opinion, this information should be tracked on every file system, since it's information *every* user wants. We just don't have good tools to expose it, and ideally it would be reported by standard tools like 'df'. The biggest current use, for us, is snapper, but it's not the only planned use.
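To make the "tracking only" part concrete, the whole dependency boils down to something like this (the mount point here is just a placeholder, so treat it as a sketch):

    # enable the accounting; we never create or enforce any limits
    btrfs quota enable /

    # per-subvolume numbers: 'rfer' is the data a subvolume references,
    # 'excl' is what would actually be freed if only that subvolume went away
    btrfs qgroup show /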
He's incorrect[1,2]. The feature request for this is for SLES and not public, but since there is a portion of the community that seems to believe we're just tossing this out there without any testing and with a kind of "oh neat, a shiny toy" approach, it might be helpful to know that the feature was first requested 4 years ago and was rejected for 5 SLE releases (service packs and GA releases) before we finally enabled it for SLE12 SP2 (and, as a result, Leap and Tumbleweed). There is always pressure to release a feature, but if it's not ready, we don't do it.
What makes this challenging to accept with 100% confidence is snapper's long-standing defaults instigating enospc problems. Yes, enospc is a file system bug, not a snapper bug. But it unquestionably was adding fuel to the fire in the form of an *extremely* aggressive snapshot creation and retention behavior, amounting to hundreds of snapshots. But instead of being recognized as needing toning down, a.) the user was passively blamed by being told they can just change snapper's configuration; b.) the total lack of a use case for retaining so many OS states, for so long, was never acknowledged; and now c.) quotas are being enabled silently. It just makes it seem like doubling down and papering over previous excesses instead of a simple mea culpa.
You're making a lot of assumptions here that are personal opinions and treating them as concrete fact. The way snapper handles snapshot creation may not be what *you* want to see, but it serves a purpose.

The thing is that on a per-snapshot basis, having lots of tightly coupled snapshots doesn't use a ton of disk space. You could have a thousand snapshots taken a second apart and it wouldn't use much more disk space than two taken a thousand seconds apart, unless there was a huge churn in the middle that wasn't reflected in the final state. That's the main benefit of btrfs: snapshots are cheap, both in the time spent creating them and in the disk space used to store them. It's the divergence between snapshots that takes up the disk space, so you could instead have a single snapshot taken 6 months ago and have more or less the same amount of space pinned on the file system.

So, no, I view your blaming snapper's snapshot policy as a bit of a red herring. It entirely depends on the workload, and your typical root file system doesn't have a very active write/delete workload outside of software updates and /tmp (which we don't snapshot). I'm willing to admit that a Tumbleweed system running a nightly automated 'zypper dup' would be pretty much the pathological case for this, though. That's why it's a default policy and not a hard-coded policy.

So, moving beyond your assertion that snapper is "extremely aggressive" and on to "not acknowledging the total lack of use case for so many OS states being retained": we have as a primary feature for SLE12, which I believe is also available on Tumbleweed, the ability to select any snapshot and boot from it, with the option of rolling your entire system back to that state (logs, databases, etc. excepted). This might not be a feature *YOU* personally want, but it is one that our customers want and use regularly. Taking snapshots as part of system change transactions is a pretty obvious place to do that. Similarly, time-based snapshots are pretty standard as well.

And lastly, your assertion that we're "doubling down" and "papering over" mistakes rings pretty hollow in the context I've outlined above.
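As an aside, the sharing behavior is easy to see for yourself. A quick sketch, with made-up paths and assuming quotas are already enabled:

    # two snapshots of the same subvolume, taken a second apart
    btrfs subvolume snapshot / /.snapshots/demo-1
    sleep 1
    btrfs subvolume snapshot / /.snapshots/demo-2

    # the 'excl' column for both snapshots stays tiny, because nearly every
    # extent is shared; only data rewritten or deleted in the origin after
    # the snapshot was taken shows up as exclusive space
    btrfs qgroup show /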
Further making this challenging to accept at face value is that the most recent btrfs-progs available upstream, 4.7.1, has very conservative language when it comes to quotas:
PERFORMANCE IMPLICATIONS
When the quotas are turned on, they affect all extent processing, taking a performance hit. It is not recommended to turn on qgroups unless the user intends to actually use them.

STABILITY STATUS
The qgroup implementation has turned out to be quite difficult as it affects the core of the filesystem operation. The users have hit various corner cases over time, eg. wrong accounting or system instability. The situation is gradually improving but currently (4.7) there are still issues found and fixed.
I don't see how these notes get recognized if (open)suse is using quota code substantially similar to upstream. Only if your code base is substantially different can you convincingly say you have no performance or stability concerns.
We have some concerns, but not in the vein of stability. Any quota system has performance overhead; that's the nature of tracking more information, which takes more time and space than not tracking it. In the qgroup case, there does exist a significant performance issue when the number of backreferences to a single extent grows too large -- they're tracked as a list, and there's an O(n^2) algorithm in the middle of it. Fortunately, that isn't a case many users hit on live systems, and I'm working on a fix for it.

So, maybe I haven't convinced you, but I've stated our testing regimen and our belief that it's stable. Ultimately, I'm the one who has to respond to bug reports against them and deal with any fallout, and I sleep pretty well.
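For the curious, the backref-heavy case is easy enough to construct by hand. Roughly something like this, with invented paths, since every reflink copy adds another reference to the same data extents:

    # one big file, then lots of reflink copies of it: every copy adds
    # another backreference to the same data extents
    dd if=/dev/urandom of=/mnt/scratch/big bs=1M count=1024
    for i in $(seq 1 1000); do
        cp --reflink=always /mnt/scratch/big /mnt/scratch/copy.$i
    done

    # with quotas enabled, the accounting work at commit time grows with
    # the number of references, so this gets slower and slower
    time btrfs filesystem sync /mnt/scratch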
Therefore it brings me right back to: why can't snapper do things differently? a.) tone down the frequency of snapshots, and/or the retention time/quantity; b.) use the ioctls being used by 'btrfs fi du' instead of enabling quotas? Why the hard dependency on enabling quotas? It really strikes me as snapper being fundamentally flawed that it can't clean up after itself any better than it does unless quotas are enabled for accounting.
Again, Arvin explained the reasoning for using qgroups. What you seem to be missing is that this *is* snapper cleaning up after itself, using information the file system can provide efficiently. Unused disk space is wasted disk space, and if you say you want your file system to keep 30% head space above whatever the snapshots consume, we need the allocation information to know how to honor that guarantee. Blindly removing snapshots until we're back under the limit is pretty frustrating for the user when you can't tell them ahead of time which snapshots are getting tossed.
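Concretely, that's the knob snapper exposes. From memory, the relevant bits of /etc/snapper/configs/root look roughly like this, so take the exact names and values as illustrative rather than gospel:

    # let snapper group this config's snapshots under a qgroup it manages
    QGROUP="1/0"

    # allow snapshots to use at most this fraction of the file system;
    # 0.7 here leaves roughly 30% head space for everything else
    SPACE_LIMIT="0.7"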
The issue the user ran into was not *at all* caused by qgroups.
I'll take your word for it.
However, there are other opinions on the issue of qgroups aside from the one user with a mysterious enospc problem. http://www.spinics.net/lists/linux-btrfs/msg58385.html
Yep, that's an opinion alright. Well spotted.

-Jeff

--
Jeff Mahoney
SUSE Labs