On 9/1/16 11:57 AM, Chris Murphy wrote:
> Hi,
>
> Could someone urgently check a default installation of the current Leap beta?
>
>   btrfs qgroup show /
>
> It seems at least one user has found in Tumbleweed that Btrfs quotas are enabled by default, and is experiencing bogus ENOSPC as a result (actually, since they're the result of quotas, they aren't entirely bogus).
>
> I don't know how this happened, and I'm actually very curious how it happened. But the priority is to find out if Leap has them enabled by default and, if so, what's doing that so it can be reverted. It's bad to do this in Tumbleweed, but it's a blocker bug for Leap. It can't ship if quotas are enabled.
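For anyone who wants to check their own install: the command quoted above lists the qgroups of a mounted filesystem, and quota tracking can be turned off entirely if you'd rather not run with it. A minimal sketch, assuming the filesystem in question is the one mounted at /:

  # show qgroups; this fails if quota tracking is not enabled
  btrfs qgroup show /

  # turn quota tracking off (snapper then simply has no qgroup data to use)
  btrfs quota disable /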
Just an update here, for those following along at home. I responded to this issue this morning on the btrfs list. Since Chris hasn't relayed that or updated his "qgroups are unstable, the sky is falling" posts, I couldn't let it go unaddressed.

Qgroups are enabled to allow snapper to make better decisions about which snapshots to remove automatically. To do this, we only need the qgroup tracking data. We do *nothing* that automatically adds or enforces limits using them. In a later post, Chris claims that snapper doesn't do this and in fact uses FIEMAP and INO_LOOKUP. He's incorrect [1,2].

The feature request for this is for SLES and not public, but since there is a portion of the community that seems to believe we're just tossing this out there without any testing, with a kind of "oh neat, a shiny toy" approach -- it might be helpful to know that the feature was first requested 4 years ago and was rejected for 5 SLE releases (service packs and GA releases) before we finally enabled it for SLE12 SP2 (and, as a result, Leap and Tumbleweed). There is always pressure to release a feature, but if it's not ready, we don't do it. We still don't allow RAID5/6 or device replace in SLES because we don't trust them, among several other features. Stability is something we take seriously.

That said, the system will function just fine without qgroups enabled. Snapper just won't be able to use that information.

The issue the user ran into was not *at all* caused by qgroups. The only involvement was a harmless WARN_ON. That needs to be fixed, but as I said, it's harmless. That it appeared at all was a side effect of the root cause, which is that his file system is throwing ENOSPC when it shouldn't. That's an issue we'll start investigating on Tuesday, after we return from the long weekend in the US.

Without the root cause, I can't yet say what other releases might show similar problems or how widespread it might be. Anecdotal reports in the btrfs list thread seem to indicate that, even with a similar workload on the same release, it's not necessarily a common, reproducible scenario.

What follows are links to my posts to the linux-btrfs list, with the important bits included below.

The analysis of the original issue:
http://www.spinics.net/lists/linux-btrfs/msg58410.html

"""
Ok, so I think this is a race that can happen when one thread is starting a transaction and another thread is committing a transaction that involves creating a snapshot. We reserve blocks at the top of start_transaction and that reservation stays with the root. In:

  btrfs_commit_transaction ->
    create_pending_snapshots ->
      create_pending_snapshot ->
        qgroup_account_snapshot ->
          commit_fs_roots

we clear that reservation from the root via btrfs_qgroup_free_meta_all, potentially while start_transaction is waiting to join a new transaction. Or not. It can happen asynchronously, which is the point of having the reservation prior to that.

So the thing is that this error can only occur if start_transaction fails after this race occurs. That, combined with your report that you were seeing ENOSPC instead of EDQUOT, leads me to believe that this is just a side effect of whatever is causing you to hit ENOSPC. I expect that you'll see it again -- you just won't see the WARN_ON anymore since quotas are disabled. I suspect it's probably the btrfs_block_rsv_add call immediately after the reservation, but there's no way to tell without tracing.
""" A followup from Ronan showed this to be the case -- the problem still occurs, but he doesn't see the WARN_ON anymore. And here's an outline of how we test qgroups and why we believe them to be stable: http://www.spinics.net/lists/linux-btrfs/msg58411.html """ We, like every other group of file system developers, run xfstests pretty religiously. Since qgroups are becoming a bigger part of the btrfs experience for our products, we test them specifically. Yes, there are xfstests /just/ for qgroups, but we also make it a point to run the entire xfstests suite with and without qgroups enabled. Since the requirement for snapper was to have accurate space tracking, that's what we've focused on. I obviously can't open up the SLES bugzilla to the world, so you're going to have to take my word on this. For our 4.4-based kernel there are currently 3 qgroup related bugs. The first is a report about how annoying it is to see old qgroup items for removed subvolumes. The second is an accounting bug that is old and the developer just hasn't gotten around to closing it yet. The third is a real issue, where users can hit the qgroup limit and are then stuck, similar to how it used to be when you'd hit ENOSPC and couldn't remove files or subvolumes. My gut feeling is that it's the same kind of problem: Removing files involves allocating blocks to CoW the metadata and when you've hit your quota limit, you can't allocate the blocks. I expect the solution will be similar to the ENOSPC issue except that rather than keeping a pool around, we can just CoW knowing full well the intention is to release space. My team is working on that today and I expect a fix shortly. """ -Jeff [1] https://github.com/openSUSE/snapper [2] https://github.com/openSUSE/snapper/commit/4d94edfd6189a6035011a85c55f2771e6... -- this is the one that sticks out the most, but it's not the only qgroups-related commit -- Jeff Mahoney SUSE Labs