Stock openSUSE 13.2 machine running kernel 3.16.7-21-default. The machine
has four SATA drives of 500 GB each. All drives are partitioned identically,
with 3 partitions each.
First partition is 1 G of type BIOS Boot.
Second partition is 2 G of type Linux RAID.
Third partition is 463 G of type Linux RAID.
The second partitions of all drives are set up as a RAID5 md device, md0,
which is used as swap.
The third partitions of all drives are set up as a RAID5 md device, md1,
which is mounted as / (the root filesystem).
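
For reference, a rough sketch of the capacity arithmetic for the two arrays. It assumes the standard RAID5 rule that one member's worth of space goes to parity, so usable size is (members - 1) x member size. The function name raid5_usable is made up for illustration, and the result is in whatever unit the partition size was given in.

```shell
# Sketch: usable size of a RAID5 array is (members - 1) * member size.
# raid5_usable is a hypothetical helper; the unit of the result matches the input.
raid5_usable() {
    members="$1"
    size="$2"
    echo $(( (members - 1) * size ))
}

raid5_usable 4 463   # md1 (root): prints 1389
raid5_usable 4 2     # md0 (swap): prints 6
```

So md1 offers roughly 1389 G before any filesystem overhead, which puts the amounts copied later in this post into perspective.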
Machine has been running fine for some time, doing much of nothing other
than daily zypper updates etc.
A few days ago, I copied massive amounts of data to /data/save1 (unsure
of the total amount: copied close to 1.1 TB, then deleted about 500 MB,
then copied about 600 or so MB again). The copies were done with rsync and
completed with no errors whatsoever.
Once the copies completed, I decided to run a btrfs balance (btrfs balance
start /), and that is where the fun began.
btrfs kept failing with messages like those below in the log (this is a grep
'btrfs' from journalctl -b, so the messages are not necessarily contiguous):
[44950.935025] BTRFS info (device md1): relocating block group 3706217037824 flags 1
[44954.665729] BTRFS info (device md1): relocating block group 3705143296000 flags 1
[44966.689827] BTRFS info (device md1): found 156 extents
[44985.223754] BTRFS: bdev /dev/md1 errs: wr 9911, rd 0, flush 0, corrupt 0, gen 0
[44985.224033] BTRFS: bdev /dev/md1 errs: wr 9912, rd 0, flush 0, corrupt 0, gen 0
[44985.224285] BTRFS: bdev /dev/md1 errs: wr 9913, rd 0, flush 0, corrupt 0, gen 0
Running a btrfs scrub / showed no errors or issues at all.
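
In case it helps anyone reproduce this, here is a minimal sketch for pulling the device name and the "wr" (write error) counter out of the "bdev ... errs:" lines quoted above, so the growth of the counter can be followed over time. The function name bdev_write_errors is made up for illustration; it only parses log text on stdin and does not query the filesystem. (As far as I know, btrfs device stats / reports the same counters directly.)

```shell
# Sketch: print "<device> <write-error-count>" for every BTRFS
# "bdev ... errs:" log line read from stdin.
# bdev_write_errors is a hypothetical helper name.
bdev_write_errors() {
    sed -n 's/.*BTRFS: bdev \([^ ]*\) errs: wr \([0-9][0-9]*\),.*/\1 \2/p'
}

# Possible usage against the current boot's journal (not run here):
#   journalctl -b | bdev_write_errors | tail -n 1
```

Lines that do not contain a "bdev ... errs:" record are silently skipped, so the whole journal can be piped through it.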
Then I started doing btrfs balance start -dusage=X
and noticed that it would succeed up to X = 73 and fail after that.
Then I started deleting subsets of data. I did btrfs balance start
-dusage=X after each delete and noticed that as more and more data was
deleted, X kept increasing before the balance would error out.
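
The procedure above was done by hand, but it can be sketched as a small loop that raises X step by step and stops at the first failing balance. This is only a sketch under a few assumptions: that btrfs balance start exits non-zero when the balance fails, and that it is run as root. The names balance_sweep and BTRFS_CMD are made up for illustration; BTRFS_CMD exists only so the loop can be dry-run against a stand-in command.

```shell
# Sketch: run "btrfs balance start -dusage=X <mountpoint>" for increasing X
# and report the first X at which the balance fails.
# BTRFS_CMD and balance_sweep are hypothetical names; override BTRFS_CMD
# to test the loop without touching a real filesystem.
BTRFS_CMD="${BTRFS_CMD:-btrfs}"

# balance_sweep MOUNTPOINT START STOP STEP
balance_sweep() {
    mnt="$1"
    x="$2"
    stop="$3"
    step="$4"
    last_ok=""
    while [ "$x" -le "$stop" ]; do
        # Send the tool's own output to stderr so stdout carries only the summary.
        if "$BTRFS_CMD" balance start "-dusage=$x" "$mnt" >&2; then
            last_ok="$x"
        else
            echo "balance failed at -dusage=$x (last success: ${last_ok:-none})"
            return 1
        fi
        x=$((x + step))
    done
    echo "balance succeeded up to -dusage=$last_ok"
}

# Possible usage (not run here): balance_sweep / 5 100 5
```

Starting low and stepping up keeps each run short, since -dusage=X only touches data block groups whose usage is at or below X percent.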
Now, on the 4th day of the saga (slow system!), I have btrfs balance running
successfully with -dusage=95, but it still fails with an error similar to
the above with -dusage=100. Do I need to keep on deleting data and running
the balance until it succeeds with -dusage=100?
The above, while more or less a test (the data still exists at the source, so
I am not too worried about losing it from this btrfs file system), has
scared me a bit about using btrfs for truly production data.
Does anyone know what the btrfs errors in the log mean? I am assuming
it has something to do with the requirement of doing a btrfs balance
every so often. I know that btrfs has issues when at high usage
capacity, and one has to balance etc. However, this is the first time I
am seeing it, and knowing it happened without the data copy failing with
a disk full error or anything of that nature has left me with the
impression that btrfs is not suitable for production use. I would much
rather have had my data copy fail with some disk full type error or what
have you than run into this apparent time bomb.
Other than the balance errors, everything else is fine on the
filesystem: I can create files and delete files, and the system reboots
fine (even though this is /).
Thanks and I eagerly await comments from others who have seen this.
--
--Moby
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -- Benjamin Franklin