Bug ID 1189791
Summary btrfs filesystem corruptions with HyperPAV
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware S/390-64
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter azouhr@opensuse.org
QA Contact qa-bugs@suse.de
CC ada.lovelace@gmx.de
Found By ---
Blocker ---

As a proof of concept, I created a machine with 22 3390-54 disks attached to a
single btrfs. The disks are defined as minidisks with cylinder 0 excluded. I
added them in chunks (at least 4 at a time), and after adding devices I always
rebalanced the btrfs.

When I added disks 15-18, I also enabled HyperPAV (without cylinder 0 on the
base devices), which is available as a feature for z/VM 7.2 (the feature has
also been added to the z/VM installation). The definition looks like this in
the directory:

COMMAND DEFINE HYPERPAVALIAS 01C0 FOR BASE 0100

I also added alias devices at addresses 01C1-01C7.

However, later on, when copying data from that disk, I found quite a number of
corruptions within btrfs. After a while the server even crashed; it is now
running without HyperPAV. Corruptions that already happened are obviously still
there, but apart from that the system works without the hangs I experienced
before.

Perhaps by coincidence (I don't think this is the issue), the first disk with
corruptions is the 16th disk in the system (including the system disk). While
thinking about this, I also noticed that the disk with number 100 has not been
enabled on the system, although all disks are defined to the same control unit.
HyperPAV was in use during the rebalance of the last 8 disks: dasdstat
displayed workload on the PAV device.

# btrfs device stats /srv | grep corruption_errs
[/dev/dasdb1].corruption_errs  0
[/dev/dasdc1].corruption_errs  0
[/dev/dasdd1].corruption_errs  0
[/dev/dasde1].corruption_errs  0
[/dev/dasdk1].corruption_errs  0
[/dev/dasdj1].corruption_errs  0
[/dev/dasdh1].corruption_errs  0
[/dev/dasdm1].corruption_errs  0
[/dev/dasdf1].corruption_errs  0
[/dev/dasdg1].corruption_errs  0
[/dev/dasdl1].corruption_errs  0
[/dev/dasdi1].corruption_errs  0
[/dev/dasdn1].corruption_errs  0
[/dev/dasdo1].corruption_errs  0
[/dev/dasdp1].corruption_errs  331
[/dev/dasdq1].corruption_errs  303
[/dev/dasds1].corruption_errs  302
[/dev/dasdv1].corruption_errs  399
[/dev/dasdr1].corruption_errs  356
[/dev/dasdt1].corruption_errs  206
[/dev/dasdw1].corruption_errs  279
[/dev/dasdu1].corruption_errs  218
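
To narrow the listing above to the affected devices only, a small filter along
these lines may help. This is a sketch, not part of the original report:
`corrupt_devs` is a hypothetical helper name, and it assumes the two-column
`[device].counter value` layout shown above.

```shell
# corrupt_devs: hypothetical helper, not part of btrfs-progs.
# Reads `btrfs device stats` output on stdin and prints "<device> <count>"
# for every device whose corruption_errs counter is above zero.
corrupt_devs() {
    awk '$1 ~ /\.corruption_errs$/ && $2 + 0 > 0 {
        dev = $1
        sub(/^\[/, "", dev)                    # drop the leading "["
        sub(/\]\.corruption_errs$/, "", dev)   # drop the "].corruption_errs" suffix
        print dev, $2
    }'
}

# Intended use on the affected system (assumes /srv is the btrfs mount point):
#   btrfs device stats /srv | corrupt_devs
```

Reading from stdin keeps the helper usable on saved output as well as on a live
system.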

# dmesg | tail
[ 6616.004750] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 1818624 csum 0x8941f998 expected csum 0x99cae683 mirror 1
[ 6616.004755] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 401, gen 0
[ 6616.005008] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 3637248 csum 0x8941f998 expected csum 0xe55a18c7 mirror 1
[ 6616.005013] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 402, gen 0
[ 6616.005238] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 1818624 csum 0x8941f998 expected csum 0x99cae683 mirror 1
[ 6616.005244] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 403, gen 0
[ 6616.005513] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 1818624 csum 0x8941f998 expected csum 0x99cae683 mirror 1
[ 6616.005565] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 404, gen 0
[ 6616.882699] BTRFS warning (device dasdb1): csum failed root 5 ino 40224 off 10055680 csum 0x8941f998 expected csum 0x26222530 mirror 1
[ 6616.882715] BTRFS error (device dasdb1): bdev /dev/dasdu1 errs: wr 0, rd 0, flush 0, corrupt 249, gen 0
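
To see which files are hit, the checksum failures in the kernel log can be
grouped by inode number. The following is a minimal sketch under the assumption
that the messages keep the `csum failed root R ino N` wording shown above;
`csum_failures_by_inode` is a hypothetical helper name.

```shell
# csum_failures_by_inode: hypothetical helper for triaging the log above.
# Reads kernel log text on stdin and prints "<inode> <failure count>" for
# every inode named in a "csum failed" message, sorted by inode number.
csum_failures_by_inode() {
    awk '/csum failed/ {
        # The inode number is the field that follows the literal word "ino".
        for (i = 1; i < NF; i++)
            if ($i == "ino") count[$(i + 1)]++
    }
    END { for (ino in count) print ino, count[ino] }' | sort -n
}

# Intended use on the affected system:
#   dmesg | csum_failures_by_inode
```

An inode number reported this way can then be mapped to a path with
`btrfs inspect-internal inode-resolve <inode> /srv`, assuming /srv is the mount
point and the inode belongs to the top-level subvolume (root 5 in the messages).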

# uname -a
Linux zlxusr1020 5.12.12-1-default #1 SMP Fri Jun 18 11:07:46 UTC 2021 (0e46a2c) s390x s390x s390x GNU/Linux

