Bug ID | 1189791 |
---|---|
Summary | btrfs filesystem corruptions with HyperPAV |
Classification | openSUSE |
Product | openSUSE Tumbleweed |
Version | Current |
Hardware | S/390-64 |
OS | Other |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | Kernel |
Assignee | kernel-bugs@opensuse.org |
Reporter | azouhr@opensuse.org |
QA Contact | qa-bugs@suse.de |
CC | ada.lovelace@gmx.de |
Found By | --- |
Blocker | --- |
For a proof of concept, I created a machine with 22 3390-54 disks attached to a single btrfs. The disks are minidisks with cylinder 0 excluded. They were added in chunks (at least 4 at a time), and after adding devices I always rebalanced the btrfs.

When I added disks 15-18, I also enabled HyperPAV (without cylinder 0 on the base devices), which is available as a feature for z/VM 7.2 (the feature has also been added to z/VM). The definition looks like this in the directory:

    COMMAND DEFINE HYPERPAVALIAS 01C0 FOR BASE 0100

I also added alias devices at addresses 01C1-01C7.

However, later on, when copying data from that disk, I found quite a number of corruptions within btrfs. After a while the server even crashed, and it is now running without HyperPAV. Corruptions that already happened are obviously still there, but apart from that, the system works without the hangs I experienced before.

By chance (I don't think this is the issue), the first disk with corruptions is the 16th disk in the system (including the system disk). While thinking about this, I also noticed that the disk with number 100 has not been enabled on the system, although all disks are defined to the same control unit.

HyperPAV was in use during the rebalance of the last 8 disks: dasdstat displayed workload on the PAV device.
    # btrfs device stats /srv | grep corruption_errs
    [/dev/dasdb1].corruption_errs 0
    [/dev/dasdc1].corruption_errs 0
    [/dev/dasdd1].corruption_errs 0
    [/dev/dasde1].corruption_errs 0
    [/dev/dasdk1].corruption_errs 0
    [/dev/dasdj1].corruption_errs 0
    [/dev/dasdh1].corruption_errs 0
    [/dev/dasdm1].corruption_errs 0
    [/dev/dasdf1].corruption_errs 0
    [/dev/dasdg1].corruption_errs 0
    [/dev/dasdl1].corruption_errs 0
    [/dev/dasdi1].corruption_errs 0
    [/dev/dasdn1].corruption_errs 0
    [/dev/dasdo1].corruption_errs 0
    [/dev/dasdp1].corruption_errs 331
    [/dev/dasdq1].corruption_errs 303
    [/dev/dasds1].corruption_errs 302
    [/dev/dasdv1].corruption_errs 399
    [/dev/dasdr1].corruption_errs 356
    [/dev/dasdt1].corruption_errs 206
    [/dev/dasdw1].corruption_errs 279
    [/dev/dasdu1].corruption_errs 218

    # dmesg | tail
    [ 6616.004750] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 1818624 csum 0x8941f998 expected csum 0x99cae683 mirror 1
    [ 6616.004755] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 401, gen 0
    [ 6616.005008] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 3637248 csum 0x8941f998 expected csum 0xe55a18c7 mirror 1
    [ 6616.005013] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 402, gen 0
    [ 6616.005238] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 1818624 csum 0x8941f998 expected csum 0x99cae683 mirror 1
    [ 6616.005244] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 403, gen 0
    [ 6616.005513] BTRFS warning (device dasdb1): csum failed root 5 ino 40234 off 1818624 csum 0x8941f998 expected csum 0x99cae683 mirror 1
    [ 6616.005565] BTRFS error (device dasdb1): bdev /dev/dasdv1 errs: wr 0, rd 0, flush 0, corrupt 404, gen 0
    [ 6616.882699] BTRFS warning (device dasdb1): csum failed root 5 ino 40224 off 10055680 csum 0x8941f998 expected csum 0x26222530 mirror 1
    [ 6616.882715] BTRFS error (device dasdb1): bdev /dev/dasdu1 errs: wr 0, rd 0, flush 0, corrupt 249, gen 0

    # uname -a
    Linux zlxusr1020 5.12.12-1-default #1 SMP Fri Jun 18 11:07:46 UTC 2021 (0e46a2c) s390x s390x s390x GNU/Linux
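
To see at a glance which devices are affected, the `btrfs device stats` output can be filtered for nonzero counters. The following is only a minimal sketch in Python, assuming the line format shown in this report (`[device].corruption_errs N`). The names `STAT_LINE` and `corrupted_devices` are hypothetical helpers for illustration and are not part of btrfs-progs.

```python
import re

# Assumed line format, taken from the output in this report:
#   [/dev/dasdp1].corruption_errs 331
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.corruption_errs\s+(?P<count>\d+)$")


def corrupted_devices(stats_text):
    """Return {device: count} for devices with a nonzero corruption_errs value."""
    result = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not corruption_errs counters
        count = int(match.group("count"))
        if count > 0:
            result[match.group("dev")] = count
    return result


if __name__ == "__main__":
    # Three lines copied from the report as sample input.
    sample = (
        "[/dev/dasdo1].corruption_errs 0\n"
        "[/dev/dasdp1].corruption_errs 331\n"
        "[/dev/dasdq1].corruption_errs 303\n"
    )
    for dev, count in corrupted_devices(sample).items():
        print(dev, count)
```

Applied to the full output, this would list only the eight devices with nonzero counters (dasdp1 through dasdw1), which are the last 8 disks rebalanced while HyperPAV was in use.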