Comment # 1 on bug 1191954 from
Zygo provided some background on Btrfs' FIEMAP behaviour in a recent upstream
mailing list thread
(https://lore.kernel.org/all/YhMzKeX3FvJPvtmR@hungrycats.org/):

FIEMAP's output cannot correctly represent btrfs compressed data.
In some cases you may be able to identify logical blocks as belonging
to the same underlying compressed extent, but not with enough precision
to infer data content of the blocks.

The physical location of a compressed byte is a two-dimensional
quantity--one to identify the physical compressed extent, one to identify
the byte's offset within the decompressed data.  The length is similarly
two-dimensional, one for the physical size and one for the logical size.
Since compressed bytes are a different size unit than uncompressed bytes,
adding a compressed offset or length to a physical position produces
garbage, so there are no distinct values we can fill in for the physical
location of compressed data blocks that make numerical sense.
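
As a small numeric illustration of the unit mismatch, here is a sketch in
Python.  All sizes and addresses are made-up values chosen for the example,
not numbers from the bug report.

```python
# Hypothetical numbers: 128 KiB of file data that compressed down to 4 KiB.
KIB = 1024
extent_bytenr = 1_000_000 * KIB   # first physical byte of the extent (made up)
physical_len = 4 * KIB            # compressed size in physical storage
logical_len = 128 * KIB           # size after decompression

# A byte 64 KiB into the decompressed data.
offset_in_decompressed = 64 * KIB

# Naively adding the decompressed offset to the physical address...
naive_physical = extent_bytenr + offset_in_decompressed

# ...lands beyond the end of the extent's physical storage.
extent_physical_end = extent_bytenr + physical_len
print(naive_physical >= extent_physical_end)  # True

# A meaningful location needs both dimensions kept apart.
location = (extent_bytenr, offset_in_decompressed)
```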

Try 'btrfs-search-metadata file' (from the python-btrfs package) for
an accurate description of what's going on with the extent references.
It uses TREE_SEARCH_V2 and the underlying btrfs file extent reference
structure, which has the fields that FIEMAP is missing.

Underneath, the compressed extent is an immutable contiguous region of
storage, identified by the bytenr (virtual address) of the first byte
of the storage.  Each reference to the extent in the file refers to a
contiguous range of the extent's logical blocks (after decompression).
The fields are, in no particular order:

    1. the logical offset within the file (seek offset) where
    the referenced data appears in the file

    2. the extent bytenr (extent identifier for reference counting
    and backref search, first physical byte of the extent)

    3. the logical length of the referenced data (the portion
    of the compressed data referenced at this offset in the file)

    4. the logical offset within the extent where the referenced
    data begins (after decompressing the extent, where to start
    reading the data in memory)

    5. the physical (compressed) length of the complete extent data
    (how many bytes are used in physical storage)

    6. the logical (decompressed) length of the complete extent data
    (how much RAM is required to decompress the extent)
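
The six fields above can be modelled roughly as follows.  This is only a
sketch: the class and attribute names are invented for illustration and are
not kernel or python-btrfs names, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtentRef:
    """One reference from a file to an extent (illustrative names only)."""
    file_offset: int      # 1. seek offset where the data appears in the file
    extent_bytenr: int    # 2. first physical byte of the extent
    ref_len: int          # 3. logical length of the referenced data
    extent_offset: int    # 4. logical offset within the decompressed extent
    extent_disk_len: int  # 5. physical (compressed) length of the whole extent
    extent_ram_len: int   # 6. logical (decompressed) length of the whole extent

    def decompressed_range(self):
        """Byte range of the decompressed extent that this reference uses."""
        return (self.extent_offset, self.extent_offset + self.ref_len)

    def physical_range(self):
        """Physical byte range occupied by the whole compressed extent."""
        return (self.extent_bytenr, self.extent_bytenr + self.extent_disk_len)


# Made-up example: a 128 KiB write that compressed to 4 KiB.
ref = ExtentRef(file_offset=0, extent_bytenr=13631488, ref_len=131072,
                extent_offset=0, extent_disk_len=4096, extent_ram_len=131072)
```

Note that the two ranges are in different units, which is why they are
returned by separate methods rather than combined.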

Only the first three of these fields are available via FIEMAP.  FIEMAP
provides only one length field, so it can't handle compressed extents,
which have two distinct lengths.  FIEMAP provides only one integer for
physical position, so it can't handle references to blocks that are
not the first block in a compressed extent.
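
For reference, each record FIEMAP returns is a struct fiemap_extent from
<linux/fiemap.h>.  The sketch below decodes one such record from raw bytes
to show that it carries a single physical position and a single length.
The helper name and the sample record are invented for the example; the
record is synthetic, not captured from a real filesystem.

```python
import struct

# struct fiemap_extent, as declared in <linux/fiemap.h>:
#   __u64 fe_logical;        logical offset in the file
#   __u64 fe_physical;       physical offset on the device
#   __u64 fe_length;         length of the extent
#   __u64 fe_reserved64[2];
#   __u32 fe_flags;
#   __u32 fe_reserved[3];
FIEMAP_EXTENT = struct.Struct("=QQQ2QI3I")  # 56 bytes, native byte order
FIEMAP_EXTENT_ENCODED = 0x00000008          # data is encoded (e.g. compressed)


def decode_fiemap_extent(raw):
    """Decode one raw fiemap_extent record into its meaningful fields."""
    logical, physical, length, _r0, _r1, flags, *_rest = FIEMAP_EXTENT.unpack(raw)
    return {
        "logical": logical,
        "physical": physical,
        "length": length,
        "encoded": bool(flags & FIEMAP_EXTENT_ENCODED),
    }


# Synthetic record for illustration only.
sample = FIEMAP_EXTENT.pack(0, 13631488, 131072, 0, 0,
                            FIEMAP_EXTENT_ENCODED, 0, 0, 0)
decoded = decode_fiemap_extent(sample)
```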

TREE_SEARCH_V2 provides all six fields, so you can get accurate logical or
physical extent boundary information as needed.
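
To make the contrast concrete, here is a sketch that unpacks a regular
(non-inline) btrfs_file_extent_item, the item TREE_SEARCH_V2 returns for
file extents, into the six numbered fields.  The mapping of numbers to
struct members is my reading of the on-disk format, the function name is
hypothetical, and the sample item is synthetic.  Actually fetching items
with the ioctl is left out.

```python
import struct

# On-disk btrfs_file_extent_item (little-endian, packed):
#   generation u64, ram_bytes u64, compression u8, encryption u8,
#   other_encoding u16, type u8,
#   disk_bytenr u64, disk_num_bytes u64, offset u64, num_bytes u64
FILE_EXTENT_ITEM = struct.Struct("<QQBBHBQQQQ")  # 53 bytes


def six_fields(key_offset, item):
    """Map an item (plus the offset from its key) to fields #1..#6."""
    (_generation, ram_bytes, _compression, _encryption, _other_encoding,
     _type, disk_bytenr, disk_num_bytes, offset, num_bytes) = \
        FILE_EXTENT_ITEM.unpack(item)
    return {
        1: key_offset,      # logical offset within the file (from the item key)
        2: disk_bytenr,     # extent bytenr
        3: num_bytes,       # logical length of the referenced data
        4: offset,          # logical offset within the extent
        5: disk_num_bytes,  # physical (compressed) length of the extent
        6: ram_bytes,       # logical (decompressed) length of the extent
    }


# Synthetic item: zlib-compressed (1), regular extent (type 1), referencing
# 8 KiB starting 4 KiB into a 128 KiB extent stored in 4 KiB.
item = FILE_EXTENT_ITEM.pack(7, 131072, 1, 0, 0, 1, 13631488, 4096, 4096, 8192)
fields = six_fields(key_offset=4096, item=item)
```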

In simple write() cases, the offset fields are zero, so FIEMAP appears to
work at first:

    1. seek offset is some number, FIEMAP returns that number

    2. extent bytenr is the FIEMAP physical start of extent

    3. logical length of the referenced data (#3) is the same as
    the logical decompressed length (#6).  FIEMAP gives #3.
    This value will change if the extent is partially overwritten
    in the file.

    4. logical offset within the extent is 0, since the extent
    was created for exactly this file data reference

    5. physical length of the compressed extent isn't reported in
    FIEMAP.  Tools like 'filefrag -v' which try to compute extent
    boundary adjacency won't work--they will use the length in #3
    when they should use field #2 + #5 to compute physical extent
    end boundaries.

    6. logical length of the compressed extent is the same as #3.
    This value never changes until the extent is destroyed.
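
The adjacency problem described in item 5 can be sketched like this.  The
two extents, their sizes, and the function names are all hypothetical; the
point is only the difference between the two end-boundary computations.

```python
KIB = 1024

# Two hypothetical compressed extents stored back to back on disk.
# Each holds 128 KiB of data in 4 KiB of physical storage.
a = {"bytenr": 1024 * KIB, "ref_len": 128 * KIB, "disk_len": 4 * KIB}
b = {"bytenr": 1028 * KIB, "ref_len": 128 * KIB, "disk_len": 4 * KIB}


def adjacent_fiemap_style(prev, nxt):
    """End boundary computed from #2 + #3, all that FIEMAP output allows."""
    return prev["bytenr"] + prev["ref_len"] == nxt["bytenr"]


def adjacent_with_disk_len(prev, nxt):
    """End boundary computed from #2 + #5, the physical length."""
    return prev["bytenr"] + prev["disk_len"] == nxt["bytenr"]


print(adjacent_fiemap_style(a, b))   # False: looks fragmented
print(adjacent_with_disk_len(a, b))  # True: physically contiguous
```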

In the test case, FIEMAP reports the same number at #2 for all extents
since the same physical extent is referenced, but the referenced data
location is actually a function of fields #2 and #4.  The second and
third extents have non-zero offsets for #4, and the length at #3 becomes
different from the length at #6, making any computed values based on
these fields nonsense.
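
A sketch of that situation, using invented numbers rather than the actual
values from the test case: three references into one compressed extent,
viewed first as FIEMAP-style triples and then as (#2, #4) pairs.

```python
from collections import namedtuple

# Field order follows the numbered list: #1 .. #6 (names are illustrative).
Ref = namedtuple("Ref", "file_offset bytenr ref_len extent_offset disk_len ram_len")

BYTENR, DISK_LEN, RAM_LEN = 13631488, 4096, 131072  # made-up extent
refs = [
    Ref(0,    BYTENR, 4096, 0,    DISK_LEN, RAM_LEN),
    Ref(4096, BYTENR, 4096, 8192, DISK_LEN, RAM_LEN),  # non-zero #4
    Ref(8192, BYTENR, 4096, 4096, DISK_LEN, RAM_LEN),  # non-zero #4
]


def fiemap_view(r):
    """What FIEMAP can express: (logical, physical, length) = (#1, #2, #3)."""
    return (r.file_offset, r.bytenr, r.ref_len)


def true_location(r):
    """Where the data really starts: a function of #2 and #4."""
    return (r.bytenr, r.extent_offset)


physicals = {fiemap_view(r)[1] for r in refs}
locations = {true_location(r) for r in refs}
print(len(physicals), len(locations))  # 1 3
```

All three references share one physical start in the FIEMAP view, while the
(#2, #4) pairs still distinguish them.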

