On Fri, Jan 22, 2016 at 3:33 PM, Christian Boltz <opensuse@cboltz.de> wrote:
Hello,
I just had an interesting[tm] problem - updating to the latest tumbleweed failed at random places.
It turned out that my btrfs / partition was full. Not according to df -h (which reported some GB free), but "btrfs fi show" showed 100% usage. (Sorry for not including the exact output - I didn't save it.)
I'd say this shouldn't happen. But more information is needed to understand what's happening and give an explanation. There are two related issues that come up in these cases.

1. Near the time the volume becomes close to full, large files are being written. This means unallocated space is allocated as data chunks, and those chunks can't be used for metadata. One or more files are deleted, and then smaller files are written, such as application or system updates, which are metadata-heavy changes. But the existing metadata chunks don't have enough space for these changes. Now the file system is considered full even though there's unused space in data chunks.

2. Btrfs is a copy-on-write file system, which means even file deletion requires space to write the change, since there's no overwriting. So it's even possible to get into a rare but particularly annoying situation where it's not possible to delete files. The workaround for this is to add a small device, delete files, balance with -dusage=15 (other values will work OK too; this is a suggestion that should go pretty fast but also free up a lot of space), then remove the device from the Btrfs volume.

For space-related problems, it's best to include in the post the output of:

    df
    btrfs filesystem show
    btrfs filesystem df
    btrfs filesystem usage

It is a bit tedious. The idea is that df should be reliable; lots of discussions have happened on the Btrfs list about it, and the behavior has changed a few times based on those conversations. And btrfs fi usage is meant to be used to get more information than the (normal) df command. The 'fi df' and 'fi show' subcommands are mainly used now for troubleshooting and explaining behavior that doesn't meet expectations from df and 'fi usage'. So usually for most users 'fi usage' should be enough.
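The workaround from point 2 (add a small device, delete files, balance, remove the device) could be sketched roughly as follows. This is only an illustration: the function name print_plan, the mount point and the spare device /dev/sdX1 are hypothetical placeholders, and the script deliberately prints the commands instead of running them, so they can be reviewed and run by hand as root.

```shell
#!/bin/sh
# Sketch only: prints the workaround steps for a Btrfs volume that is too
# full to delete files. Nothing is executed; the device name is hypothetical.

print_plan() {
    mnt="$1"     # mount point of the full Btrfs volume
    spare="$2"   # small spare block device (placeholder, e.g. a USB stick)
    cat <<EOF
btrfs device add $spare $mnt
# delete the files or snapshots you no longer need here
btrfs balance start -dusage=15 $mnt
btrfs device remove $spare $mnt
EOF
}

# Example: plan for the root file system with a placeholder spare device.
print_plan / /dev/sdX1
```

The -dusage=15 filter restricts the balance to data chunks that are less than 15% used, which is why it tends to finish quickly. The device removal at the end migrates any chunks on the spare device back to the original one, so it needs the balance to have freed enough space first.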
I moved 15 GB of libvirt images to a different partition and deleted some old snapshots, but neither helped.
After some searching (and temporarily breaking my /, which I could luckily repair with snapper rollback), I found out [1] that I should run btrfs balance start / -dlimit=3 which freed quite a bit of space.
That's normal right now. The kernel code only deletes chunks when they become completely empty.

Another run freed even more space, so I decided to run it without the limit: btrfs balance start /
After that, I'm down to
    # btrfs fi show
    Label: none  uuid: 9f4a918d-fcd4-45d3-a1dc-7b887300eabc
        Total devices 1 FS bytes used 22.91GiB
        devid    1 size 50.00GiB used 25.31GiB path /dev/mapper/cboltz-root
Even if you re-add the 15 GB libvirt images in the calculation, this still means the rebalance freed about 10 GB = 20% of the partition size.
So the good news is that I could solve the problem. The bad news is that it happened at all.
I'd say the behavior you describe is becoming less common and is suboptimal when it happens, and Btrfs is being improved.
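For reference, the figures quoted above can be checked with a small script. In the devid line of 'fi show', "used" is the space allocated to chunks, while "FS bytes used" is what the data and metadata actually occupy. The sketch below assumes a single device, GiB units throughout, and that "100% usage" meant all 50 GiB were allocated before the balance; the function names are made up for this example, and the 15 GB of moved images is treated as roughly 15 GiB.

```shell
#!/bin/sh
# Sketch: how much of the device is allocated to chunks, and how much the
# balance gave back. Sample text is the 'fi show' output quoted in this thread.

sample='Label: none  uuid: 9f4a918d-fcd4-45d3-a1dc-7b887300eabc
    Total devices 1 FS bytes used 22.91GiB
    devid    1 size 50.00GiB used 25.31GiB path /dev/mapper/cboltz-root'

allocated_percent() {
    # Reads 'btrfs fi show' text on stdin; assumes one device and GiB units.
    awk '$1 == "devid" {
        size = $4; used = $6
        sub(/GiB$/, "", size); sub(/GiB$/, "", used)
        printf "%.1f\n", 100 * used / size
    }'
}

freed_by_balance() {
    # $1 = allocated before, $2 = allocated after, $3 = data moved away (GiB)
    awk -v before="$1" -v after="$2" -v moved="$3" \
        'BEGIN { printf "%.2f\n", before - after - moved }'
}

printf '%s\n' "$sample" | allocated_percent   # allocated share of the device, in percent
freed_by_balance 50.00 25.31 15               # GiB returned by the balance itself
```

Under those assumptions, about half of the device is allocated after the balance, and just under 10 GiB of the drop is due to the balance rather than the moved images, which matches the "about 10 GB = 20%" estimate.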
Should there be a cronjob that does the rebalancing or other btrfs maintenance regularly?
If something does this, it's really just masking the problem. I think the developers would prefer for there to be some minor problems like this and get user reports so they can try to fix and fine-tune the behavior rather than having it masked. It's been a challenging problem to solve. So I can see why the maintenance script is provided but not enabled by default.

--
Chris Murphy