New subject: [Bug 1160019] "Metadata corruption detected at xfs_inode_buf_verify"

2 Jan 2020

      http://bugzilla.suse.com/show_bug.cgi?id=1160019

            Bug ID: 1160019
           Summary: "Metadata corruption detected at xfs_inode_buf_verify"
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.1
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: martin.wilck@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

This happened on a fresh Leap 15.1 installation on a laptop (SATA SSD), on the
/home file system. Kernel 4.12.14-lp151.28.36-default.
...
Dec 28 19:02:47 pallas kernel: XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x4b860
Dec 28 19:02:47 pallas kernel: XFS (dm-0): Unmount and run xfs_repair
Dec 28 19:02:47 pallas kernel: XFS (dm-0): First 64 bytes of corrupted metadata buffer:
Dec 28 19:02:47 pallas kernel: ffff8801f164f000: 49 4e 41 ed 03 01 00 00 00 00 04 58 00 00 00 64  INA........X...d
Dec 28 19:02:47 pallas kernel: ffff8801f164f010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
Dec 28 19:02:47 pallas kernel: ffff8801f164f020: 5e 01 34 bd 25 b4 38 63 5d ff e7 a6 01 cc 26 9e  ^.4.%.8c].....&.
Dec 28 19:02:47 pallas kernel: ffff8801f164f030: 5d ff e7 a6 01 cc 26 9e 00 00 00 00 00 00 00 34  ].....&........4
The same message is repeated a few times; eventually the FS goes r/o, and all
kinds of weird failures result.

AFAICS from the code in fs/xfs/libxfs/xfs_inode_buf.c, the 15.1 kernel tests
nothing but the magic number (bytes 0-1, 49 4e) and the version (byte 4, 03), so
the dumped header should pass. But this is only the first inode in the buffer,
so one of the subsequent inodes might be bad. However, I dumped a full 4k block
at offset 0x4b860, and all inode headers in that range seem to be OK.
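
To make the check described above concrete, here is a minimal Python sketch that decodes the big-endian fields at the start of an on-disk inode core and applies only the magic ("IN") and version tests to every inode in a buffer. Assumptions: a v5 (CRC-enabled) filesystem, so only inode version 3 passes; 512-byte inodes (the mkfs.xfs default for v5); and the "xfs_inode block" number in the message being a disk address in 512-byte units. The scan length is a parameter because the buffer the verifier sees is an inode cluster whose size depends on the filesystem geometry. `decode_core`, `scan_inodes` and `read_cluster` are hypothetical helpers, not part of xfsprogs.

```python
import stat
import struct

XFS_DINODE_MAGIC = 0x494E  # b"IN", stored big-endian at offset 0
BASIC_BLOCK = 512          # assumed unit of the block number in the message

def daddr_to_offset(daddr: int) -> int:
    """Convert a 512-byte disk address to a byte offset on the device."""
    return daddr * BASIC_BLOCK

def decode_core(raw: bytes) -> dict:
    """Decode the leading big-endian fields of an on-disk inode core."""
    magic, mode, version, fmt = struct.unpack_from(">HHBB", raw, 0)
    uid, gid, nlink = struct.unpack_from(">III", raw, 8)
    return {"magic": magic, "mode": mode, "version": version,
            "format": fmt, "uid": uid, "gid": gid, "nlink": nlink}

def scan_inodes(buf: bytes, inode_size: int = 512, good_versions=(3,)) -> list:
    """Return indexes of inodes whose magic or version would fail the check."""
    bad = []
    for i in range(len(buf) // inode_size):
        core = decode_core(buf[i * inode_size:(i + 1) * inode_size])
        if core["magic"] != XFS_DINODE_MAGIC or core["version"] not in good_versions:
            bad.append(i)
    return bad

def read_cluster(device: str, daddr: int, length: int) -> bytes:
    """Read `length` bytes starting at a disk address (needs read access)."""
    with open(device, "rb") as f:
        f.seek(daddr_to_offset(daddr))
        return f.read(length)

# First 32 bytes of the inode dumped on Dec 28.
dump = bytes.fromhex(
    "49 4e 41 ed 03 01 00 00 00 00 04 58 00 00 00 64 "
    "00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00"
)

if __name__ == "__main__":
    core = decode_core(dump)
    # Directory 0755, inode version 3, uid 1112, gid 100, 2 links.
    print(hex(core["magic"]), oct(core["mode"]), stat.S_ISDIR(core["mode"]),
          core["version"], core["uid"], core["gid"], core["nlink"])
    # → 0x494e 0o40755 True 3 1112 100 2
```

Under these assumptions, something like `scan_inodes(read_cluster("/dev/dm-0", 0x4b860, 4096))` repeats the 4k check by hand; daddr 0x4b860 corresponds to byte offset 158384128.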

The FS couldn't be unmounted, so I rebooted into single-user mode, unmounted
/home, and ran xfs_repair, which showed nothing of interest:
...
Phase 1 - find and verify superblock...
        - block cache size set to 374008 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 55736 tail block 55736
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
XFS_REPAIR Summary    Sat Dec 28 21:02:45 2019
Phase           Start           End             Duration
Phase 1:        12/28 21:02:44  12/28 21:02:45  1 second
Phase 2:        12/28 21:02:45  12/28 21:02:45  
Phase 3:        12/28 21:02:45  12/28 21:02:45  
Phase 4:        12/28 21:02:45  12/28 21:02:45  
Phase 5:        12/28 21:02:45  12/28 21:02:45  
Phase 6:        12/28 21:02:45  12/28 21:02:45  
Phase 7:        12/28 21:02:45  12/28 21:02:45
Total run time: 1 second
done
A very similar error was observed some days later, after only a couple of
minutes of uptime:
...
Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x746b180
Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): Unmount and run xfs_repair
Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): First 64 bytes of corrupted metadata buffer:
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d000: 49 4e 81 a4 03 02 00 00 00 00 04 58 00 00 00 64  IN.........X...d
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d020: 5e 07 ba 06 1c 4b f6 68 5e 07 ba 04 18 7a e7 4b  ^....K.h^....z.K
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d030: 5e 07 ba 04 18 7a e7 4b 00 00 00 00 00 00 09 e8  ^....z.K........
I wonder if this is an indication of faulty hardware, and if so, which
component (SSD? CPU? Memory? Other?).

The SSD in the laptop is brand new; the vendor's statement to that effect is
corroborated by SMART data (power-on hours). I have run a simple "surface test"
(write pattern & verify) on the affected logical volume with fio and found no
errors (only a single test pass so far).
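
The write-pattern-and-verify idea can be sketched in a few lines of Python. This is an illustration of the technique, not a replacement for the fio run: it assumes a 4 KiB block size and a regular scratch file rather than the raw logical volume, since writing to a block device overwrites its contents. Each block encodes its own index plus a per-run seed, so misdirected or stale blocks are caught as well as flipped bits. `pattern` and `write_then_verify` are hypothetical names.

```python
import os
import struct
import tempfile

BLOCK = 4096  # illustrative block size (assumption)

def pattern(block_no: int, seed: int) -> bytes:
    """Build one block that encodes its own index and a per-run seed."""
    word = struct.pack("<QQ", block_no, seed)
    return word * (BLOCK // len(word))

def write_then_verify(path: str, nblocks: int, seed: int = 0x5EED) -> list:
    """Write nblocks patterned blocks, read them back, return mismatching indexes."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for i in range(nblocks):
            os.pwrite(fd, pattern(i, seed), i * BLOCK)
        os.fsync(fd)
        # Best effort: ask the kernel to drop cached pages so the read-back
        # has a chance to come from the device rather than the page cache.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return [i for i in range(nblocks)
                if os.pread(fd, BLOCK, i * BLOCK) != pattern(i, seed)]
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bad = write_then_verify(os.path.join(d, "surface.bin"), nblocks=256)
        print("mismatching blocks:", bad)
```

`POSIX_FADV_DONTNEED` is only a hint, so a clean result from a sketch like this says less about the device than a run that bypasses the page cache entirely.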

Perhaps noteworthy: I also saw some BTRFS checksum errors in the logs (from the
root file system):
...
Dec 28 19:04:17 pallas kernel: BTRFS warning (device sda2): sda2 checksum verify failed on 158384128 wanted D1502CDD found 7F3B5CF3 level 0
Jan 02 10:18:00 pallas.mittagstun.de kernel: BTRFS warning (device sda2): csum failed root 267 ino 232862 off 16384 csum 0xed735e63 expected csum 0x7e1dece1 mirror 1
Jan 02 10:28:27 pallas.mittagstun.de kernel: BTRFS warning (device sda2): csum failed root 267 ino 248644 off 0 csum 0x347357c0 expected csum 0x2e8516b0 mirror 1
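
The BTRFS messages come from checksum verification: a checksum stored when the data was written is compared with one recomputed from what is read back, and a mismatch means the bytes changed somewhere in between. As an illustration, assuming this root file system uses the default btrfs checksum (CRC-32C), here is a slow bitwise reference CRC-32C in Python; it is not the kernel's implementation, and `verify_block` is a hypothetical helper.

```python
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def verify_block(block: bytes, stored: int) -> bool:
    """True if the recomputed checksum matches the stored one."""
    return crc32c(block) == stored

if __name__ == "__main__":
    block = bytes(range(256)) * 16        # a 4 KiB stand-in for a data block
    stored = crc32c(block)                # checksum recorded at write time
    damaged = bytearray(block)
    damaged[100] ^= 0x01                  # a single bit flipped after the write
    print(verify_block(block, stored), verify_block(bytes(damaged), stored))
    # → True False
```

A CRC catches every single-bit error, so even one flipped bit in RAM, on the bus, or on the medium is enough to produce a warning like the ones above.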
