[Bug 1017461] New: btrfs balance renders system unresponsive and eventually even kills WiFi
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461
Bug ID: 1017461
Summary: btrfs balance renders system unresponsive and eventually even kills WiFi
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.2
Hardware: Other
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: suse@bugs.jan.ritzerfeld.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---
I installed openSUSE Leap 42.2 with btrfs as root. Now, performing a btrfs balance or a snapper cleanup takes "ages" while there is little or no disk activity, but btrfs or btrfs-transaction constantly hogs one CPU. There is plenty of unallocated space (28 out of 40 GiB). The system becomes very unresponsive and even loses its WiFi connection until the next reboot. Thus, btrfsmaintenance will nearly kill my system every week!
After disabling btrfs quota, everything works fine! Thus, enabling the experimental btrfs quota feature for snapper was a really, really bad idea. IMHO this is a critical bug if it happens to other users.
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c2
Friedhelm Stappert
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c3
--- Comment #3 from Richard Weinberger
Peter B
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c4
Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c5
--- Comment #5 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c6
--- Comment #6 from Richard Weinberger
Hi guys!
Yes, this is a **very** serious problem. I have already reported that my system is unresponsive every time btrfs maintenance starts, and I am using Tumbleweed. I posted to the mailing list, but received just one answer:
https://lists.opensuse.org/opensuse-factory/2016-09/msg00130.html
Indeed, when I **disabled** quotas here, the freezes stopped. Thanks for that workaround! Actually, a btrfs developer (Chris Murphy) has already warned that the quota feature is not stable in btrfs and must not be enabled by default on production systems:
https://lists.opensuse.org/opensuse-factory/2016-09/msg00032.html
However, some openSUSE developers contradicted Chris, especially Richard Brown:
https://lists.opensuse.org/opensuse-factory/2016-09/msg00085.html
Hence, nobody took the advice and quotas were enabled by default in Leap 42.2.
Maybe now with this bug, which I can confirm is happening on **all** my machines with quotas enabled (HP Workstation, Dell laptop, and a MacBook), this problem can be revisited. Furthermore, disabling quotas also fixes it on all of them.
Hmmm, I fear quotas are enabled because of snapper(8). It seems to use them for some clean-up policies.
Jan Ritzerfeld
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c7
--- Comment #7 from Ronan Chagas
Tomáš Chvátal
Ludwig Nussel
Ludwig Nussel
Libor Pechacek
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c9
--- Comment #9 from Goldwyn Rodrigues
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c10
--- Comment #10 from Richard Weinberger
I have been trying to recreate this issue (especially the trace in comment #3) but have not succeeded so far.
Well, I don't expect this to be reproducible within a few minutes. It happened here in my build server after an uptime of more than two weeks.
Richard: Does btrfs check report your filesystem is healthy?
The check upon boot reports it as healthy. Since it is my rootfs I cannot run the check directly.
Ronan: Are you getting these backtraces in the kernel log as well?
btrfs balance is a relatively I/O intensive operation because it has to move around chunks. However, if the tree is balanced frequently, then each balance should not take as much time.
The reporter here seems to observe the opposite. ;-)
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c11
--- Comment #11 from Jan Ritzerfeld
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c12
--- Comment #12 from Jeff Mahoney
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c13
--- Comment #13 from Jan Ritzerfeld
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c14
--- Comment #14 from Jeff Mahoney
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c15
--- Comment #15 from Richard Weinberger
How many subvolumes does the affected file system have?
As with Jan, a 42.2 default installation. The only difference is that I'm using snapper on the / and /home subvolumes.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c16
--- Comment #16 from Jeff Mahoney
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c17
--- Comment #17 from Richard Weinberger
Sorry, I should've been more clear: "Subvolumes" in this context includes all snapshots.
In my case:
spankyham:~ # btrfs subvolume list -a / | wc -l
87
If you need more info, just ask. :-)
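The count above lumps regular subvolumes and snapper snapshots together. A minimal Python sketch for telling them apart by parsing the listing instead of piping it to `wc -l` follows. It assumes the `ID <n> gen <n> top level <n> path <path>` line layout of `btrfs subvolume list -a` and the default openSUSE layout where snapper snapshots live under `.snapshots/<number>/snapshot`; the function names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed line layout, e.g. "ID 257 gen 4711 top level 5 path <FS_TREE>/@/home".
LINE_RE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


class Subvolume(NamedTuple):
    subvol_id: int
    gen: int
    top_level: int
    path: str


def parse_subvolume_list(text: str) -> list[Subvolume]:
    """Parse `btrfs subvolume list -a` output, skipping lines that do not match."""
    subvols = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            sid, gen, top, path = match.groups()
            subvols.append(Subvolume(int(sid), int(gen), int(top), path))
    return subvols


def count_snapper_snapshots(subvols: list[Subvolume]) -> int:
    """Count subvolumes that look like snapper snapshots (layout heuristic)."""
    return sum(
        1 for s in subvols
        if "/.snapshots/" in s.path and s.path.endswith("/snapshot")
    )


if __name__ == "__main__":
    sample = """\
ID 257 gen 120 top level 5 path <FS_TREE>/@
ID 258 gen 130 top level 257 path <FS_TREE>/@/home
ID 266 gen 140 top level 257 path <FS_TREE>/@/.snapshots
ID 267 gen 150 top level 266 path <FS_TREE>/@/.snapshots/1/snapshot
ID 268 gen 160 top level 266 path <FS_TREE>/@/.snapshots/2/snapshot
"""
    subvols = parse_subvolume_list(sample)
    print(len(subvols), count_snapper_snapshots(subvols))  # prints "5 2"
```

The `.snapshots` container subvolume itself is deliberately not counted, since it is not a snapshot.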
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c18
--- Comment #18 from Jan Ritzerfeld
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c19
--- Comment #19 from Jeff Mahoney
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c20
nicholas cunliffe
It happened here in my build server after an uptime of more than two weeks.
I seem to remember reading that build directories (along with VMs and DBs) are one of the situations in which disabling COW/snapshots is advisable?
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c21
--- Comment #21 from Richard Weinberger
It happened here in my build server after an uptime of more than two weeks.
I seem to remember reading that build directories (along with VMs and DBs) are one of the situations in which disabling COW/snapshots is advisable?
Let's wait for what the btrfs developers say; there is a lot of hearsay on this topic. I expect btrfs to work with any workload. Sure, disabling COW could improve performance, but it shouldn't be mandatory for every non-trivial load.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c22
--- Comment #22 from Friedhelm Stappert
It happened here in my build server after an uptime of more than two weeks.
FYI, for me it happens about once a day (e.g. right now). Maybe snapper is cleaning up old snapshots (as mentioned in comment #6).
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c23
--- Comment #23 from Goldwyn Rodrigues
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c24
--- Comment #24 from Richard Weinberger
This is primarily caused by the qgroup accounting correction patches (btrfs: qgroup: Fix qgroup data leaking by using subtree tracing), which call btrfs_qgroup_trace_subtree() twice: once for the src tree and once for the dest tree. This function is CPU intensive, which causes the system to stall. We would have to investigate other ways to do this correctly.
What do you suggest as workaround until the root cause is fixed? Can I disable quotas? I'm not sure whether this will harm snapper.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c25
--- Comment #25 from Goldwyn Rodrigues
What do you suggest as workaround until the root cause is fixed? Can I disable quotas? I'm not sure whether this will harm snapper.
If you don't have a need for quotas, I'd suggest disabling quotas until we find a working solution to fix this. Thanks for understanding.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c26
--- Comment #26 from Richard Weinberger
(In reply to Richard Weinberger from comment #24)
What do you suggest as workaround until the root cause is fixed? Can I disable quotas? I'm not sure whether this will harm snapper.
If you don't have a need for quotas, I'd suggest disabling quotas until we find a working solution to fix this. Thanks for understanding.
This was not my question. The question was whether it will harm snapper. Both snapper and quotas are enabled by default on 42.2, _I_ don't need quotas, but my fear is that some openSUSE component (i.e. snapper) will fail badly when I disable quotas.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c27
--- Comment #27 from Ronan Chagas
This was not my question. The question was whether it will harm snapper. Both snapper and quotas are enabled by default on 42.2, _I_ don't need quotas, but my fear is that some openSUSE component (i.e. snapper) will fail badly when I disable quotas.
Hi Richard, I have been using Leap 42.2 without quotas for a very long time. It was a 42.1 system that was updated. I have never seen any problems at all related to snapper. IIRC, the only feature you will miss in snapper is the ability to auto-clean snapshots. Please, someone correct me if I am wrong.
(In reply to Goldwyn Rodrigues from comment #9)
Ronan: Are you getting these backtraces in the kernel log as well?
Hi Goldwyn, sorry, I was mostly offline the last couple of days. Yes, I am seeing those backtraces in the kernel log when quotas are enabled. After disabling them, they seem to be gone.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c28
--- Comment #28 from Goldwyn Rodrigues
This was not my question. The question was whether it will harm snapper. Both snapper and quotas are enabled by default on 42.2, _I_ don't need quotas, but my fear is that some openSUSE component (i.e. snapper) will fail badly when I disable quotas.
No, I don't think it will affect snapper or any other component.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c29
--- Comment #29 from Jan Ritzerfeld
[...] Can I disable quotas? I'm not sure whether this will harm snapper.
It actually will if you used quotas before:
# snapper cleanup number
quota not working (preparing quota failed)
# snapper get-config | grep QGROUP
QGROUP | 1/0
This fixes it:
# snapper set-config QGROUP=
However, I do not know how to re-enable it! Maybe you need the original value of QGROUP.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c30
--- Comment #30 from Richard Weinberger
[...] Can I disable quotas? I'm not sure whether this will harm snapper.
It actually will if you used quotas before:
# snapper cleanup number
quota not working (preparing quota failed)
# snapper get-config | grep QGROUP
QGROUP | 1/0
This fixes it:
# snapper set-config QGROUP=
However, I do not know how to re-enable it! Maybe you need the original value of QGROUP.
Yeah, same here. I didn't enable quotas in snapper; this seems to be a default setting... Well done.</sarcasm>
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c31
--- Comment #31 from Jan Ritzerfeld
[...] This fixes it: # snapper set-config QGROUP=
Well, no. It only worked here because snapper seems to cache some of its config; changes made directly in the config file take some time to apply. So, man snapper is correct: the LIMIT variables must not use ranges without quotas:
# snapper set-config QGROUP= NUMBER_LIMIT=10 NUMBER_LIMIT_IMPORTANT=10
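The rule above (no ranges in the LIMIT variables once QGROUP is empty) can be checked mechanically. Below is a minimal Python sketch that reads the text of a snapper config such as `/etc/snapper/configs/root` and lists the offending keys. It assumes the file consists of simple shell-style `KEY="value"` lines, and which keys accept ranges (`NUMBER_LIMIT*` and `TIMELINE_LIMIT_*`) is an assumption; the function names are hypothetical.

```python
import re
import shlex

# A value such as "2-10" is treated as a range; "10" is a plain limit.
RANGE_RE = re.compile(r"^\d+-\d+$")


def parse_snapper_config(text: str) -> dict[str, str]:
    """Parse shell-style KEY="value" lines, skipping comments and blank lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split strips the quotes; "" yields [""] and an empty raw yields [].
        parts = shlex.split(raw)
        config[key.strip()] = parts[0] if parts else ""
    return config


def range_limits_without_qgroup(config: dict[str, str]) -> list[str]:
    """Return limit keys that still hold a range although QGROUP is empty."""
    if config.get("QGROUP", ""):
        return []
    return sorted(
        key for key, value in config.items()
        if (key.startswith("NUMBER_LIMIT") or key.startswith("TIMELINE_LIMIT_"))
        and RANGE_RE.match(value)
    )


if __name__ == "__main__":
    sample = 'QGROUP=""\nNUMBER_LIMIT="2-10"\nNUMBER_LIMIT_IMPORTANT="10"\nSPACE_LIMIT="0.5"\n'
    print(range_limits_without_qgroup(parse_snapper_config(sample)))  # ['NUMBER_LIMIT']
```

Because snapper appears to cache its config, this only inspects the file; changing values is still best done with `snapper set-config` as shown above.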
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c32
--- Comment #32 from Jeff Mahoney
Libor Pechacek
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c35
--- Comment #35 from Richard Weinberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c36
t neo
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c37
Frederic Crozat
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c38
--- Comment #38 from t neo
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c39
Eric Schirra
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c40
--- Comment #40 from Ronan Chagas
I have the same freeze problem, with and without QGROUP=. Normally on Sundays (coincidence?).
In top I see btrfs balance or btrfs-transacti at 100%. This blocks any input.
When this happens while the screensaver is on, no login is possible.
The problems last as long as btrfs runs, circa 1 h.
Just to confirm, did you disable the quotas in BTRFS? You can check this by running the command:
btrfs qgroup show /
All my problems related to this bug went away after I disabled quotas. As I pointed out in my comment #4, btrfs devs warned some time ago that quota is an unstable feature and we should avoid using it. However, it seems that you will lose a YaST feature if you disable quotas (something related to auto-cleaning snapshots IIRC).
I think this is not only high. This is a critical bug.
I totally agree. If this bug is so hard to fix and depends on upstream, we should really start thinking about disabling quotas by default, at least in Leap.
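Checking by hand whether quotas are on is easy; for scripting it across several machines, a minimal Python sketch could wrap `btrfs qgroup show` and interpret the result. It assumes that btrfs-progs mentions "quotas not enabled" when quotas are off (the exact wording may vary between versions), and it returns `None` when the answer is unclear, e.g. when not run as root. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def quotas_enabled_from_output(stdout: str, stderr: str, returncode: int) -> Optional[bool]:
    """Interpret `btrfs qgroup show` output: True/False, or None if unclear.

    Assumes the "quotas not enabled" wording; verify against your btrfs-progs.
    """
    if "quotas not enabled" in (stdout + stderr).lower():
        return False
    if returncode == 0:
        # A successful listing (possibly with a rescan warning) means quotas are on.
        return True
    return None  # some other error, e.g. missing permissions


def quotas_enabled(mountpoint: str = "/") -> Optional[bool]:
    """Run `btrfs qgroup show <mountpoint>` (usually needs root) and interpret it."""
    result = subprocess.run(
        ["btrfs", "qgroup", "show", mountpoint],
        capture_output=True, text=True, check=False,
    )
    return quotas_enabled_from_output(result.stdout, result.stderr, result.returncode)
```

Keeping the parsing separate from the `subprocess` call makes the interpretation testable without a btrfs filesystem.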
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c41
--- Comment #41 from Eric Schirra
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c42
--- Comment #42 from Eric Schirra
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c43
--- Comment #43 from Eric Schirra
Christopher Brodt
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c44
--- Comment #44 from Christopher Brodt
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c45
--- Comment #45 from Ronan Chagas
I've had a similar problem since installing Tumbleweed in November. However, whenever I run `sudo btrfs quota disable /` my system becomes unresponsive and I have to force a reboot after 10 or 15 minutes. What does that command do exactly? Does it just need time to run?
Hi Christopher, this command finished here within seconds. Are you sure that no other btrfs maintenance command is running when you try to disable quotas? Furthermore, how many snapshots do you have?
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c46
--- Comment #46 from Christopher Brodt
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c47
--- Comment #47 from Ronan Chagas
I've got 22 snapshots. I'm not aware of any other maintenance commands running, but I did notice this when viewing the qgroups:
cbrodt@cbrodt-traitify2 ~: sudo btrfs qgroup show /
WARNING: rescan is running, qgroup data may be incorrect
That message is always there, so maybe that's blocking it?
Can you please post the output of `btrfs quota rescan -s /`?
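To find out whether that warning reflects a rescan that is actually still in progress, the status can be polled before trying `btrfs quota disable` again. A minimal Python sketch follows; the status wording it matches ("rescan operation running (current key N)" and "no rescan operation in progress") is assumed from btrfs-progs and should be checked against the installed version, and the function names are hypothetical. Whether waiting for the rescan avoids the hang reported above is not established in this thread.

```python
import re
import subprocess
import time
from typing import Optional

# Assumed btrfs-progs wording; verify against the output of your version.
RUNNING_RE = re.compile(r"rescan operation running \(current key (\d+)\)")
IDLE_TEXT = "no rescan operation in progress"


def parse_rescan_status(output: str) -> Optional[int]:
    """Return the current rescan key while a rescan runs, or None when idle."""
    match = RUNNING_RE.search(output)
    if match:
        return int(match.group(1))
    if IDLE_TEXT in output:
        return None
    raise ValueError(f"unrecognised rescan status output: {output!r}")


def wait_for_rescan(mountpoint: str = "/", poll_seconds: float = 30.0,
                    max_polls: int = 120) -> bool:
    """Poll `btrfs quota rescan -s` (needs root) until no rescan is reported.

    Returns True once idle, False if still running after max_polls polls.
    """
    for _ in range(max_polls):
        result = subprocess.run(
            ["btrfs", "quota", "rescan", "-s", mountpoint],
            capture_output=True, text=True, check=True,
        )
        if parse_rescan_status(result.stdout) is None:
            return True
        time.sleep(poll_seconds)
    return False
```

An increasing key between polls would suggest the rescan is progressing rather than stuck.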
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c48
--- Comment #48 from Christopher Brodt
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c56
Santiago Castro
Thomas Rother
Michal Nowak
Oliver Kurz
Ulrich Hobelmann
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c57
Antoine Saroufim
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c61
--- Comment #61 from Libor Pechacek
So, this issue should now be fixed by the following upstream commit?
It's only part of the fix. The soft lockups are prevented by d8422ba334f (btrfs: backref: Fix soft lockup in __merge_refs function).
(In reply to Jan Ritzerfeld from comment #60)
I updated the kernel and re-enabled quotas (not that easy).
Issuing `snapper setup-quota' not easy?
And even metadata balancing still
1. takes 15 minutes while completely hogging 1 CPU on a laptop running on battery (a recipe for disaster), and
2. frequently delays starting shell commands, causes severe WiFi packet loss, and locks up the system for several seconds.
Same here with Tumbleweed.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c62
--- Comment #62 from Jan Ritzerfeld
(In reply to Richard Weinberger from comment #59)
So, this issue should now be fixed by the following upstream commit?
It's only part of the fix. The soft lockups are prevented by d8422ba334f (btrfs: backref: Fix soft lockup in __merge_refs function).
Hmm, is this commit included in openSUSE-SU-2017:0907-1?
(In reply to Jan Ritzerfeld from comment #60)
I updated the kernel and re-enabled quotas (not that easy).
Issuing `snapper setup-quota' not easy? [...]
Sure, but that doesn't work, because "snapper cleanup number" then says "quota not working (preparing quota failed)". I had to manually assign the correct qgroup to the snapshot subvolumes that had already been taken without a qgroup. snapper only did this automatically for the first snapshot without a qgroup. Took me an hour to figure that out...
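The manual step is not spelled out above. A minimal Python sketch of what it might look like is given below: it builds `btrfs qgroup assign 0/<subvolid> <parent> <mountpoint>` commands for snapshot subvolumes that are not yet in snapper's qgroup. It assumes the parent qgroup is `1/0` (the QGROUP value shown in comment #29) and that the subvolume IDs are known, e.g. from `btrfs subvolume list`; the function name and the IDs are hypothetical. The qgroup numbers may need a quota rescan afterwards.

```python
import shlex


def qgroup_assign_commands(snapshot_ids: list[int], already_assigned: set[int],
                           parent_qgroup: str = "1/0",
                           mountpoint: str = "/") -> list[list[str]]:
    """Build `btrfs qgroup assign` commands for IDs not yet in the parent qgroup.

    Each subvolume's own qgroup is 0/<subvolume id>; commands are in ascending ID order.
    """
    missing = sorted(set(snapshot_ids) - set(already_assigned))
    return [
        ["btrfs", "qgroup", "assign", f"0/{sid}", parent_qgroup, mountpoint]
        for sid in missing
    ]


if __name__ == "__main__":
    # Hypothetical subvolume IDs; review the printed commands before running them as root.
    for cmd in qgroup_assign_commands([267, 268, 269], {267}):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps a human in the loop, which seems prudent given how fiddly this turned out to be.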
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c63
--- Comment #63 from Libor Pechacek
Hmm, is this commit included in openSUSE-SU-2017:0907-1?
AFAICT yes: http://kernel.suse.com/cgit/kernel/log/?h=rpm-4.4.57-18.3&ofs=50 Also feel free to inspect the package changelog (rpm -q --changelog kernel-default-4.4.57-18.3.1), which should contain a record named "btrfs: backref: Fix soft lockup in __merge_refs function" and a reference to this Bugzilla.
Sure, but that doesn't work because a "snapper cleanup number" then says "quota not working (preparing quota failed)".
I see. I didn't know about these dark corners. Is that perhaps something for a bug report?
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c64
--- Comment #64 from Jan Ritzerfeld
[...] AFAICT yes: http://kernel.suse.com/cgit/kernel/log/?h=rpm-4.4.57-18.3&ofs=50
Many thanks for your help!
Also feel free to inspect the package changelog (rpm -q --changelog kernel-default-4.4.57-18.3.1), which should contain a record named "btrfs: backref: Fix soft lockup in __merge_refs function" and a reference to this Bugzilla.
That's what I thought I did. And yes, it is included. I didn't find it because the changelog is not ordered by date. First entry date is 2017-02-19 and last 2009-03-04. However, the record you mentioned is dated 2017-03-27 and found in line 36776?!
Sure, but that doesn't work because a "snapper cleanup number" then says "quota not working (preparing quota failed)".
I see. I didn't know about these dark corners.
Neither did I! I had already noticed in comment #29 that I was not able to re-enable them. :)
Is that perhaps something for a bug report?
Maybe https://github.com/openSUSE/snapper/issues/257? Because of that issue, at least the exception message "preparing quota failed" was added in https://github.com/openSUSE/snapper/issues/259.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c65
--- Comment #65 from Richard Weinberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c66
--- Comment #66 from Richard Weinberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c67
--- Comment #67 from Richard Weinberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c68
--- Comment #68 from Richard Weinberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c69
--- Comment #69 from Edmund Nadolski
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c70
--- Comment #70 from Richard Weinberger
As mentioned, fb235dc06 is not expected to be a complete fix. However looking at the stacks you may be encountering a regression. Could you run with fb235dc06 reverted?
Sure. Will take 2-3 days.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c71
--- Comment #71 from Richard Weinberger
(In reply to Edmund Nadolski from comment #69)
As mentioned, fb235dc06 is not expected to be a complete fix. However looking at the stacks you may be encountering a regression. Could you run with fb235dc06 reverted?
Sure. Will take 2-3 days.
With that commit reverted I don't see the lockup anymore, although, as expected, btrfs-balance still consumes a lot of CPU. The system has an uptime of 36 h and a typical workload.
Edmund Nadolski
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c73
--- Comment #73 from Edmund Nadolski
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c77
Koen De Jaeger
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c80
Sven Heithecker
Dmitry Roshchin
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c84
Gerald Weber
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c85
--- Comment #85 from Ronan Chagas
I am experiencing a similar problem on a freshly installed Leap 42.3.
The btrfs-transacti process makes the system completely unresponsive for about 10 to 15 min. It has already happened 3 times since the installation 3 days ago, that is, typically once a day. I correlate this with the automatic software update, which apparently triggers snapper and then btrfs. I have changed the software check to run only once a month to see if it eases the problem, but I would welcome any other workaround, as this is very disruptive.
The machine is a Dell Inspiron 5448 with a Samsung SSD 850 EVO 1TB disk.
I am happy to provide more system info or do some tests if it is of any help.
Hi Gerald, the only workaround I know so far is to disable quotas in btrfs. I don't know if that is acceptable to you, but on all my Tumbleweed machines the problem went away after this.
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c87
Harald Achitz
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c88
Oliver Kurz
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c97
Andre Guenther
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c98
--- Comment #98 from Edmund Nadolski
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c99
Oliver Kurz
The patches listed in comment #94 have been merged into upstream 4.14. Otherwise one of the kernels mentioned in the previous comment has them.
It should be safe to disable quotas as far as btrfs itself is concerned.
I would not recommend disabling quotas (in case you mean btrfs qgroups), as IIUC they are implicitly used to prevent snapshots from filling up the hard disk, by cleaning them up when they reach *their* quota. To me it seems the issue is not really resolved, even though I think the patches provided in the kernel by enadolski@suse.com should help. I guess one has to look at it on a whole-system level. Would it make sense to lower the I/O priority of background jobs?
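One way to try the idea of lowering the priority of background jobs is to start the maintenance command under `ionice -c 3` (idle I/O class) and `nice -n 19`. The minimal Python sketch below only builds and runs such a wrapped command; the function names are hypothetical, and the `-musage=5` balance filter in the usage example is just an illustration. Whether this helps here is uncertain: the idle class is only honoured by some I/O schedulers (e.g. CFQ and BFQ), and much of the work seen in top (btrfs-transaction) runs in kernel threads that do not inherit the caller's priority.

```python
import shlex
import subprocess


def low_priority_argv(argv: list[str], nice_level: int = 19) -> list[str]:
    """Wrap a command in `ionice -c 3` (idle I/O class) and `nice -n <level>`."""
    if not 0 <= nice_level <= 19:
        raise ValueError("nice_level must be between 0 and 19")
    return ["ionice", "-c", "3", "nice", "-n", str(nice_level), *argv]


def run_low_priority(argv: list[str], nice_level: int = 19) -> int:
    """Run the wrapped command and return its exit status."""
    wrapped = low_priority_argv(argv, nice_level)
    print("running:", shlex.join(wrapped))
    return subprocess.run(wrapped, check=False).returncode
```

Usage, as root: `run_low_priority(["btrfs", "balance", "start", "-musage=5", "/"])`.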
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c100
--- Comment #100 from Harald Achitz
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c101
Edmund Nadolski
To me it seems the issue is not really resolved, even though I think the patches provided in the kernel by enadolski@suse.com should help. I guess one has to look at it on a whole-system level.
I am restoring the previous status, as I am not clear on the justification for re-opening: considering that the indicated patches evidently were not even run, it is not shown that a problem still exists. These patches have demonstrated a 50% improvement in btrfs backref performance, so if further symptoms are observed there may well be other causes (not necessarily even in the fs; as you mention, the whole system would need to be looked at). In that case the best way forward is to open a new BZ including all relevant info, so that it can be properly investigated (and without potential obfuscation from the previous issue).
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c102
--- Comment #102 from Jeff Mahoney
Oliver Kurz
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c103
--- Comment #103 from Oliver Kurz
[…] I am restoring the previous status, as I am not clear on the justification for re-opening: considering that the indicated patches evidently were not even run, it is not shown that a problem still exists.
These patches have demonstrated a 50% improvement in btrfs backref performance, so if further symptoms are observed there may well be other causes (not necessarily even in the fs; as you mention, the whole system would need to be looked at). In that case the best way forward is to open a new BZ including all relevant info, so that it can be properly investigated (and without potential obfuscation from the previous issue).
Errr, I am not sure what your intention is. I am pretty sure that I am running a kernel with the patches you mentioned; I checked with `rpm -q --changelog kernel-default`. As I stated, I think your contributions improved the situation. OK, I don't want to annoy you, so I created another bug for the "btrfs maintenance scripts review": https://bugzilla.opensuse.org/show_bug.cgi?id=1063638
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c104
--- Comment #104 from Flex Liu
I have disabled btrfs quotas on the x121e, which I have not reinstalled, and this system has worked without issues since then. So it seems that btrfs quotas are a serious problem for systems you boot only from time to time. But now the question is: how do I clean up the snapshots by hand? Or turn off the snapshots? I do not need them on this machine. I mean, this is a notebook I mostly use to listen to music or to connect to an HDMI TV display to watch something; it has different requirements than a server or production workstation. btrfs with all these features is obviously not the optimal default for such a system.
Snapper is a management tool in openSUSE; it will help you remove the snapshots.
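Besides letting `snapper cleanup number` work with plain limits (as in comment #31), old snapshots can be removed by hand with `snapper delete`. A minimal Python sketch that picks the oldest snapshot numbers while keeping the newest few is shown below. It assumes higher numbers are newer and that the numbers have been taken from `snapper list`; it does not know which snapshot is the default or currently mounted one, so the list should be reviewed before deleting. The function names are hypothetical.

```python
def snapshots_to_delete(numbers: list[int], keep: int = 5) -> list[int]:
    """Return the oldest snapshot numbers, keeping the newest `keep` ones.

    Snapshot 0 ("current", the running system) is never selected.
    """
    if keep < 0:
        raise ValueError("keep must not be negative")
    candidates = sorted(n for n in set(numbers) if n != 0)
    # candidates[:-0] would be empty, so keep == 0 needs its own branch.
    return candidates[:-keep] if keep > 0 else candidates


def delete_command(numbers: list[int]) -> list[str]:
    """Build a `snapper delete` invocation for the given snapshot numbers."""
    return ["snapper", "delete", *(str(n) for n in numbers)]
```

For example, with snapshots 0 to 7 and `keep=5`, snapshots 1 and 2 would be selected, giving `snapper delete 1 2`.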
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c105
--- Comment #105 from Flex Liu
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c114
Oliver Schmidt
http://bugzilla.opensuse.org/show_bug.cgi?id=1017461#c115
--- Comment #115 from Oliver Kurz