On 9/2/16 11:36 PM, Chris Murphy wrote:
On Fri, Sep 2, 2016 at 8:17 PM, Jeff Mahoney <jeffm@suse.com> wrote:
Qgroups are enabled to allow snapper to make better decisions about which snapshots to remove automatically. To do this, we only need the qgroups tracking data. We do *nothing* with automatically adding or enforcing limits using them. In a later post, Chris claims that snapper doesn't do this and in fact uses FIEMAP and INO_LOOKUP.
No, that was in response to Andrei about what 'btrfs filesystem du' is using, not snapper.
The implicit question is why snapper can't use the same ioctl 'fi du' is using, instead of qgroups.
Arvin provided the answer in a separate post. What you're not getting is that qgroups are the standard file system mechanism for tracking extent usage at the subvolume level. In my opinion, this information should be tracked on every file system, since it's information *every* user wants. We just don't have good tools to expose it, and ideally it would be reported by standard tools like 'df'. The biggest current use, for us, is snapper, but it's not the only planned use.
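To make the "tracking only" part concrete, the whole dependency boils down to something like this (the mount point here is just a placeholder, so treat it as a sketch):

    # enable the accounting; we never create or enforce any limits
    btrfs quota enable /

    # per-subvolume numbers: 'rfer' is the data a subvolume references,
    # 'excl' is what would actually be freed if only that subvolume went away
    btrfs qgroup show /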
He's incorrect[1,2]. The feature request for this is for SLES and not public, but since there is a portion of the community that seems to believe we're just tossing this out there without any testing and with a kind of "oh neat, a shiny toy" approach, it might be helpful to know that the feature was first requested 4 years ago and was rejected for 5 SLE releases (service packs and GA releases) before we finally enabled it for SLE12 SP2 (and, as a result, Leap and Tumbleweed). There is always pressure to release a feature, but if it's not ready, we don't do it.
What makes this challenging to accept with 100% confidence is snapper's long-standing defaults instigating enospc problems. Yes, enospc is a file system bug, not a snapper bug. But it unquestionably was adding fuel to the fire in the form of an *extremely* aggressive snapshot creation and retention behavior, amounting to hundreds of snapshots. But instead of being recognized as needing toning down, a.) the user was passively blamed by being told they can just change snapper's configuration; b.) the total lack of a use case for retaining so many OS states, for so long, was never acknowledged; and now c.) quotas are being enabled silently. It just makes it seem like doubling down and papering over previous excesses instead of a simple mea culpa.
You're making a lot of assumptions here that are personal opinions and treating them as concrete fact. The way snapper handles snapshot creation may not be what *you* want to see, but it serves a purpose.

The thing is that on a per-snapshot basis, having lots of tightly coupled snapshots doesn't use a ton of disk space. You could have a thousand snapshots taken a second apart and it wouldn't use much more disk space than two taken a thousand seconds apart, unless there was a huge churn in the middle that wasn't reflected in the final state. That's the main benefit of btrfs: snapshots are cheap, both in the time spent creating them and in the disk space used to store them. It's the divergence between snapshots that takes up the disk space, so you could instead have a single snapshot taken 6 months ago and have more or less the same amount of space pinned on the file system.

So, no, I view your blaming snapper's snapshot policy as a bit of a red herring. It entirely depends on the workload, and your typical root file system doesn't have a very active write/delete workload outside of software updates and /tmp (which we don't snapshot). I'm willing to admit that a Tumbleweed system running a nightly automated 'zypper dup' would be pretty much the pathological case for this, though. That's why it's a default policy and not a hard-coded policy.

So, moving beyond your assertion that snapper is "extremely aggressive" and on to "not acknowledging the total lack of use case for so many OS states being retained": we have as a primary feature for SLE12, which I believe is also available on Tumbleweed, the ability to select any snapshot and boot from it, with the option of rolling your entire system back to that state (logs, databases, etc. excepted). This might not be a feature *YOU* personally want, but it is one that our customers want and use regularly. Taking snapshots as part of system change transactions is a pretty obvious place to do that. Similarly, time-based snapshots are pretty standard as well.

And lastly, your assertion that we're "doubling down" and "papering over" mistakes rings pretty hollow in the context I've outlined above.
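As an aside, the sharing behavior is easy to see for yourself. A quick sketch, with made-up paths and assuming quotas are already enabled:

    # two snapshots of the same subvolume, taken a second apart
    btrfs subvolume snapshot / /.snapshots/demo-1
    sleep 1
    btrfs subvolume snapshot / /.snapshots/demo-2

    # the 'excl' column for both snapshots stays tiny, because nearly every
    # extent is shared; only data rewritten or deleted in the origin after
    # the snapshot was taken shows up as exclusive space
    btrfs qgroup show /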
Further making this challenging to accept at face value is that the most recent btrfs-progs available upstream, 4.7.1, has very conservative language when it comes to quotas:
PERFORMANCE IMPLICATIONS
When the quotas are turned on, they affect all extent processing, taking a performance hit. It is not recommended to turn on qgroups unless the user intends to actually use them.

STABILITY STATUS
The qgroup implementation has turned out to be quite difficult as it affects the core of the filesystem operation. The users have hit various corner cases over time, eg. wrong accounting or system instability. The situation is gradually improving but currently (4.7) there are still issues found and fixed.
I don't see how these notes get recognized if (open)suse is using quota code substantially similar to upstream. Only if your code base is substantially different can you convincingly say you have no performance or stability concerns.
We have some concerns, but not in the vein of stability. Any quota system has performance overhead; that's the nature of tracking more information, which takes more time and space than not tracking it. In the qgroup case, there does exist a significant performance issue when the number of backreferences to a single extent grows too large -- they're tracked as a list, and there's an O(n^2) algorithm in the middle of it. Fortunately, that isn't a case many users hit on live systems, and I'm working on a fix for it.

So, maybe I haven't convinced you, but I've stated our testing regimen and our belief that it's stable. Ultimately, I'm the one who has to respond to bug reports against them and deal with any fallout, and I sleep pretty well.
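For the curious, the backref-heavy case is easy enough to construct by hand. Roughly something like this, with invented paths, since every reflink copy adds another reference to the same data extents:

    # one big file, then lots of reflink copies of it: every copy adds
    # another backreference to the same data extents
    dd if=/dev/urandom of=/mnt/scratch/big bs=1M count=1024
    for i in $(seq 1 1000); do
        cp --reflink=always /mnt/scratch/big /mnt/scratch/copy.$i
    done

    # with quotas enabled, the accounting work at commit time grows with
    # the number of references, so this gets slower and slower
    time btrfs filesystem sync /mnt/scratch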
Therefore it brings me right back to: why can't snapper do things differently? a.) tone down the frequency of snapshots, and/or the retention time/quantity; b.) use the ioctls being used by 'btrfs fi du' instead of enabling quotas? Why the hard dependency on enabling quotas? It really strikes me as snapper being fundamentally flawed that it can't clean up after itself any better than it does unless quotas are enabled for accounting.
Again, Arvin explained the reasoning for using qgroups. What you seem to be missing is that this *is* snapper cleaning up after itself, using information the file system can provide efficiently. Unused disk space is wasted disk space, and if you say you want your file system to keep 30% head space above whatever the snapshots consume, we need the allocation information to know how to honor that guarantee. Blindly removing snapshots until we're back under the limit is pretty frustrating for the user when you can't tell them ahead of time which snapshots are getting tossed.
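Concretely, that's the knob snapper exposes. From memory, the relevant bits of /etc/snapper/configs/root look roughly like this, so take the exact names and values as illustrative rather than gospel:

    # let snapper group this config's snapshots under a qgroup it manages
    QGROUP="1/0"

    # allow snapshots to use at most this fraction of the file system;
    # 0.7 here leaves roughly 30% head space for everything else
    SPACE_LIMIT="0.7"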
The issue the user ran into was not *at all* caused by qgroups.
I'll take your word for it.
However, there are other opinions on the issue of qgroups aside from the one user with a mysterious enospc problem. http://www.spinics.net/lists/linux-btrfs/msg58385.html
Yep, that's an opinion alright. Well spotted.

-Jeff

--
Jeff Mahoney
SUSE Labs