On Sun, Jan 24, 2016 at 2:40 PM, Uzair Shamim <uzashamim@gmail.com> wrote:
On 01/24/2016 03:22 PM, Chris Murphy wrote:
On Sat, Jan 23, 2016 at 1:35 AM, Thomas Langkamp <thomas.lassdiesonnerein@gmx.de> wrote:
df showed enough free space, and I did not know about "btrfs fi show".
Ideally you shouldn't have to. But Btrfs still has some rough edges, and it's really quite different. The regular df command doesn't have the granularity to distinguish between the types of free space Btrfs can have: space usable only for metadata, space usable only for data, and space usable for either.
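To make the three kinds of free space concrete, here is a small Python sketch (not Btrfs code). It takes per-type allocation totals, the sort of figures `btrfs fi df` and `btrfs fi show` report, and computes each kind of free space separately alongside one lumped total. It assumes a single device with the single profile (no DUP or RAID), and the numbers are made up. This is not how df actually computes its output; it only shows why one number can't express the split.

```python
from dataclasses import dataclass

GiB = 1024 ** 3

@dataclass
class Usage:
    """Per-type allocation figures for one Btrfs device (illustrative model)."""
    device_size: int  # raw size of the single device
    data_total: int   # space allocated to data chunks
    data_used: int    # file contents stored in those chunks
    meta_total: int   # space allocated to metadata chunks
    meta_used: int    # metadata stored in those chunks

    def free_data_only(self) -> int:
        # Room inside data chunks: usable only for file contents.
        return self.data_total - self.data_used

    def free_meta_only(self) -> int:
        # Room inside metadata chunks: usable only for metadata.
        return self.meta_total - self.meta_used

    def unallocated(self) -> int:
        # Not yet part of any chunk: can become either kind.
        return self.device_size - self.data_total - self.meta_total

    def lumped_free(self) -> int:
        # One df-style number that hides which kind of space it is.
        return self.free_data_only() + self.free_meta_only() + self.unallocated()

# Hypothetical 20GiB device where data chunks have claimed everything else.
u = Usage(device_size=20 * GiB, data_total=19 * GiB, data_used=12 * GiB,
          meta_total=1 * GiB, meta_used=1 * GiB)
print(u.lumped_free() // GiB)     # 7: looks like plenty of room
print(u.free_meta_only() // GiB)  # 0: but nothing left for metadata
print(u.unallocated() // GiB)     # 0: and no way to add a metadata chunk
```

In this state, operations that need new metadata can fail with ENOSPC even though the lumped figure still shows 7GiB free.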
To understand, I read https://btrfs.wiki.kernel.org/index.php/Balance_Filters and the FAQ. However, I do not understand much. What is all this "metadata", and what exactly do balance filters do? I do understand what btrfs balance tries to fix ("file system full" errors), but not how or why the problem exists.
Can someone explain in less technical terms?
Basically, there are two kinds of allocations in Btrfs: extents and chunks. A chunk is kinda like an uberextent or block group. Chunks are contiguously allocated from free space. Metadata (the file system itself) only gets written to metadata chunks, and data (the contents of files) only gets written to data chunks.
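A toy Python model of the allocation rule just described: extents of each kind only go into chunks of the same kind, and a new chunk is carved out of unallocated space only when no existing chunk of that kind has room. The chunk sizes (1024MiB data, 256MiB metadata) and the rule that an extent never spans two chunks are assumptions chosen for illustration, not a description of the kernel's allocator.

```python
import errno
from dataclasses import dataclass, field

# Hypothetical chunk sizes in MiB, chosen only for illustration.
DATA_CHUNK = 1024
META_CHUNK = 256

@dataclass
class Chunk:
    kind: str      # "data" or "metadata"
    size: int
    used: int = 0

@dataclass
class Volume:
    unallocated: int                          # MiB not yet part of any chunk
    chunks: list[Chunk] = field(default_factory=list)

    def allocate_chunk(self, kind: str) -> Chunk:
        # Chunks are carved out of unallocated space, one kind at a time.
        size = DATA_CHUNK if kind == "data" else META_CHUNK
        if self.unallocated < size:
            raise OSError(errno.ENOSPC, f"no unallocated space for a {kind} chunk")
        self.unallocated -= size
        self.chunks.append(Chunk(kind, size))
        return self.chunks[-1]

    def write_extent(self, kind: str, size: int) -> None:
        limit = DATA_CHUNK if kind == "data" else META_CHUNK
        if size > limit:
            raise ValueError("this sketch assumes an extent fits in one chunk")
        # An extent only ever lands in a chunk of its own kind.
        for chunk in self.chunks:
            if chunk.kind == kind and chunk.size - chunk.used >= size:
                chunk.used += size
                return
        # No chunk of this kind has room: allocate a new one.
        self.allocate_chunk(kind).used += size

vol = Volume(unallocated=4096)
vol.write_extent("metadata", 16)   # first metadata chunk
vol.write_extent("data", 600)      # first data chunk
vol.write_extent("data", 600)      # doesn't fit in the first: second data chunk
print([(c.kind, c.used) for c in vol.chunks])
print(vol.unallocated)             # 4096 - 256 - 1024 - 1024 = 1792
```

Note that the second data chunk is allocated even though the first one still has free space: that leftover room can never be used for metadata.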
As someone who has always been curious about btrfs, thank you for this post; it was very informative! This actually makes sense to me :)
There is a "mixed block group", which is a chunk used for both metadata and data. It's available only at mkfs time with the -M flag. This is automatic for file systems of 1GiB and smaller, and it's recommended for 5GiB and smaller. I'm not sure why there aren't patches yet to make that automatic too. There's some discussion that anything about the size of a USB stick (16GiB) may be better suited to -M mixed block groups. It tends to be smaller file systems that end up in situations where there's space in data chunks but no space left for metadata chunks, and mixed block groups avoid this problem.

The downside of always doing this is that it's less efficient. When mixed, the nodesize must be the same as the sector size (this is a logical sector in Btrfs terms, not a physical sector, and right now it's fixed to the page size, so on x86 that's 4096 bytes). That means small files are less likely to be stored inline with their metadata. Example:

    item 75 key (4450 INODE_ITEM 0) itemoff 6425 itemsize 160
        inode generation 189 transid 189 size 3964 nbytes 3964
        block group 0 mode 100700 links 1 uid 1000 gid 1000
        rdev 0 flags 0x0
    item 76 key (4450 INODE_REF 907) itemoff 6402 itemsize 23
        inode ref index 22 namelen 13 name: uefi-grub.cfg
    item 77 key (4450 XATTR_ITEM 3817753667) itemoff 6319 itemsize 83
        location key (0 UNKNOWN.0 0) type XATTR
        namelen 16 datalen 37 name: security.selinux
        data unconfined_u:object_r:unlabeled_t:s0
    item 78 key (4450 EXTENT_DATA 0) itemoff 2334 itemsize 3985
        inline extent data size 3964 ram 3964 compress 0

This 16KiB node actually contains references for 79 items, four of which are for a small file (3964 bytes, just under 4KiB) whose data is stored inline in this node along with its metadata (you can see the SELinux label context and the filename). Had the nodesize been 4KiB, this file's data extent would be stored elsewhere rather than inline.
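The 4KiB cutoff can be checked with a little arithmetic. The Python sketch below uses on-disk structure sizes as I understand the format (a 101-byte tree block header, a 25-byte item header, and 21 bytes of file extent fields before the inline data); treat these as assumptions. It also requires the file to fit in one sector and ignores other kernel conditions such as the max_inline mount option and compression, so it is a rough model, not the kernel's actual check.

```python
# Simplified model of the btrfs inline extent size limit.
# The struct sizes below are assumptions based on the on-disk format.
HEADER_SIZE = 101        # tree block header at the start of every node
ITEM_HEADER_SIZE = 25    # per-item header in a leaf: 17-byte key + offset + size
EXTENT_ITEM_PREFIX = 21  # file extent fields stored before the inline data

def max_inline_data(nodesize: int) -> int:
    """Largest inline payload one leaf item can hold at this nodesize."""
    max_item_size = nodesize - HEADER_SIZE - ITEM_HEADER_SIZE
    return max_item_size - EXTENT_ITEM_PREFIX

def fits_inline(file_size: int, nodesize: int, sectorsize: int = 4096) -> bool:
    """Rough check only: ignores max_inline, compression and other conditions."""
    return file_size <= sectorsize and file_size <= max_inline_data(nodesize)

for nodesize in (4096, 16384):
    print(nodesize, max_inline_data(nodesize), fits_inline(3964, nodesize))
# 4096 3949 False
# 16384 16237 True
```

The 21-byte prefix also matches the dump: itemsize 3985 is the 3964 bytes of file data plus 21.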
While /var/lib/libvirt/images is on btrfs, it's put in a subvolume by default. Doesn't this mean it won't be part of the regular snapshots? From what I understand, things in subvolumes aren't included in the snapshots users take with snapper.
Ahh, I don't have an openSUSE install handy at the moment. Oops. Yes, if the images are contained in a subvolume other than the top level of the file system, then top level snapshots done by snapper do not recursively snapshot the contents of nested subvolumes. But what still happens is that VM images cause data chunks to be allocated faster than metadata chunks as the images grow. So it's possible to fill up a volume with a high data-to-metadata chunk ratio that isn't released even if most of the VMs are deleted. You'd have a lot of nearly empty data chunks but could still have completely full metadata chunks, and end up with ENOSPC.

--
Chris Murphy

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
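A self-contained toy model of this scenario (hypothetical chunk sizes in MiB, one device, no DUP or RAID): VM images fill the volume with data chunks, deleting most of the image data frees extents but not chunks, and the next metadata chunk allocation fails with ENOSPC. The `balance_data` method is a loose, assumed imitation of the usage filter described on the Balance_Filters wiki page (as in `btrfs balance start -dusage=50`). It rewrites the live data from mostly empty data chunks into fewer chunks and returns the rest to unallocated space. Real balance relocates data before releasing chunks; this sketch frees first to keep the code short.

```python
import errno
from dataclasses import dataclass, field

# Hypothetical chunk sizes in MiB, chosen only for illustration.
DATA_CHUNK = 1024
META_CHUNK = 256

@dataclass(eq=False)  # compare chunks by identity, not by field values
class Chunk:
    kind: str      # "data" or "metadata"
    size: int
    used: int = 0

@dataclass
class Volume:
    unallocated: int                          # MiB not yet in any chunk
    chunks: list[Chunk] = field(default_factory=list)

    def _new_chunk(self, kind: str) -> Chunk:
        size = DATA_CHUNK if kind == "data" else META_CHUNK
        if self.unallocated < size:
            raise OSError(errno.ENOSPC, f"no unallocated space for a {kind} chunk")
        self.unallocated -= size
        self.chunks.append(Chunk(kind, size))
        return self.chunks[-1]

    def write(self, kind: str, size: int) -> None:
        # Reuse a chunk of the same kind if one has room, else allocate a new one.
        for chunk in self.chunks:
            if chunk.kind == kind and chunk.size - chunk.used >= size:
                chunk.used += size
                return
        self._new_chunk(kind).used += size

    def balance_data(self, usage_pct: int) -> None:
        # Loosely like -dusage=N: repack data chunks whose usage is below N percent.
        victims = [c for c in self.chunks
                   if c.kind == "data" and c.used * 100 < c.size * usage_pct]
        for chunk in victims:
            self.chunks.remove(chunk)
            self.unallocated += chunk.size
        for chunk in victims:
            if chunk.used:
                self.write("data", chunk.used)

vol = Volume(unallocated=256 + 3 * DATA_CHUNK)
vol.write("metadata", 256)          # metadata chunk is now full
for _ in range(3):
    vol.write("data", DATA_CHUNK)   # growing VM images claim the rest
for chunk in vol.chunks:
    if chunk.kind == "data":
        chunk.used = 100            # most image data deleted; chunks stay allocated

hit_enospc = False
try:
    vol.write("metadata", 16)
except OSError as e:
    hit_enospc = e.errno == errno.ENOSPC
print("ENOSPC before balance:", hit_enospc)  # True

vol.balance_data(50)                # three ~10%-full data chunks become one
vol.write("metadata", 16)           # now a new metadata chunk fits
print(vol.unallocated)              # 1792
```

The usage threshold is the design knob here: a low value only touches nearly empty chunks, so the balance moves little data while still returning whole chunks to unallocated space.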