http://bugzilla.suse.com/show_bug.cgi?id=1160019

Bug ID: 1160019
Summary: "Metadata corruption detected at xfs_inode_buf_verify"
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.1
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: martin.wilck@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

This happened on a fresh Leap15.1 installation on a laptop, SATA SSD, /home file system. Kernel 4.12.14-lp151.28.36-default.

Dec 28 19:02:47 pallas kernel: XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x4b860
Dec 28 19:02:47 pallas kernel: XFS (dm-0): Unmount and run xfs_repair
Dec 28 19:02:47 pallas kernel: XFS (dm-0): First 64 bytes of corrupted metadata buffer:
Dec 28 19:02:47 pallas kernel: ffff8801f164f000: 49 4e 41 ed 03 01 00 00 00 00 04 58 00 00 00 64  INA........X...d
Dec 28 19:02:47 pallas kernel: ffff8801f164f010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
Dec 28 19:02:47 pallas kernel: ffff8801f164f020: 5e 01 34 bd 25 b4 38 63 5d ff e7 a6 01 cc 26 9e  ^.4.%.8c].....&.
Dec 28 19:02:47 pallas kernel: ffff8801f164f030: 5d ff e7 a6 01 cc 26 9e 00 00 00 00 00 00 00 34  ].....&........4

The same message is repeated a few times; eventually the FS is mounted r/o, and all kinds of weird failures result.

AFAICS from the code in fs/xfs/libxfs/xfs_inode_buf.c, the 15.1 kernel tests nothing but the magic number (bytes 0-1, 49 4e) and the version (byte 4, 03), so the dumped pattern should pass the check. But this is only the first inode in the buffer, so one of the subsequent inodes might be bad. However, I dumped the full 4k block at offset 0x4b860, and all inode headers in it seem to be OK as well.

The FS couldn't be unmounted, so I rebooted into single-user mode, unmounted /home, and ran xfs_repair, which showed nothing of interest:

Phase 1 - find and verify superblock...
        - block cache size set to 374008 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 55736 tail block 55736
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

XFS_REPAIR Summary Sat Dec 28 21:02:45 2019

Phase           Start           End             Duration
Phase 1:        12/28 21:02:44  12/28 21:02:45  1 second
Phase 2:        12/28 21:02:45  12/28 21:02:45
Phase 3:        12/28 21:02:45  12/28 21:02:45
Phase 4:        12/28 21:02:45  12/28 21:02:45
Phase 5:        12/28 21:02:45  12/28 21:02:45
Phase 6:        12/28 21:02:45  12/28 21:02:45
Phase 7:        12/28 21:02:45  12/28 21:02:45

Total run time: 1 second
done

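The header check described earlier (magic number in bytes 0-1, version in byte 4, applied to each inode in the buffer) can be sketched in Python as follows. This is only an illustration, not the kernel code: the 512-byte inode size, the accepted version range 1..3, and the helper names inode_header_ok and bad_inodes are assumptions that do not come from this report.

```python
import struct

XFS_DINODE_MAGIC = 0x494E   # ASCII "IN", stored big-endian on disk
INODE_SIZE = 512            # assumption: the report does not state the inode size


def inode_header_ok(buf, offset=0):
    """Apply the magic/version test to the inode starting at offset."""
    (magic,) = struct.unpack_from(">H", buf, offset)
    version = buf[offset + 4]
    # Assumed valid version range: 1..3
    return magic == XFS_DINODE_MAGIC and 1 <= version <= 3


def bad_inodes(buf, inode_size=INODE_SIZE):
    """Return the indexes of all inodes in buf whose header fails the test."""
    count = len(buf) // inode_size
    return [i for i in range(count)
            if not inode_header_ok(buf, i * inode_size)]


# First 8 bytes of the first dump, zero-padded to one (assumed) inode size.
first = bytes.fromhex("49 4e 41 ed 03 01 00 00").ljust(INODE_SIZE, b"\0")
block = first * 8                 # synthetic 4k block with 8 identical inodes
print(bad_inodes(block))          # []

damaged = bytearray(block)
damaged[3 * INODE_SIZE] = 0       # clobber the first magic byte of inode 3
print(bad_inodes(damaged))        # [3]
```

A single bad inode anywhere in the buffer is enough for such a check to fail, which is why looking only at the first 64 bytes of the dump is not conclusive.
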
A very similar error was observed a few days later, after only a couple of minutes of uptime:

Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x746b180
Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): Unmount and run xfs_repair
Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): First 64 bytes of corrupted metadata buffer:
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d000: 49 4e 81 a4 03 02 00 00 00 00 04 58 00 00 00 64  IN.........X...d
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d020: 5e 07 ba 06 1c 4b f6 68 5e 07 ba 04 18 7a e7 4b  ^....K.h^....z.K
Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d030: 5e 07 ba 04 18 7a e7 4b 00 00 00 00 00 00 09 e8  ^....z.K........

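To judge whether the dumped headers look sane beyond magic and version, the 64 bytes from each of the two log excerpts can be decoded field by field. The sketch below is hedged: the field layout is an assumption based on the first 64 bytes of struct xfs_dinode (big-endian) and is not stated in this report, and the names Header, LAYOUT and decode are made up for the example.

```python
import stat
import struct
from collections import namedtuple

# Assumed layout of the first 64 bytes of an on-disk XFS inode (big-endian).
FIELDS = ("magic mode version format onlink uid gid nlink projid_lo projid_hi "
          "flushiter atime_s atime_ns mtime_s mtime_ns ctime_s ctime_ns size")
Header = namedtuple("Header", FIELDS)
LAYOUT = ">HHBBHIIIHH6xHiiiiiiQ"   # 6x skips the padding bytes; 64 bytes total

DUMP_DEC28 = ("49 4e 41 ed 03 01 00 00 00 00 04 58 00 00 00 64 "
              "00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 "
              "5e 01 34 bd 25 b4 38 63 5d ff e7 a6 01 cc 26 9e "
              "5d ff e7 a6 01 cc 26 9e 00 00 00 00 00 00 00 34")
DUMP_JAN02 = ("49 4e 81 a4 03 02 00 00 00 00 04 58 00 00 00 64 "
              "00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 "
              "5e 07 ba 06 1c 4b f6 68 5e 07 ba 04 18 7a e7 4b "
              "5e 07 ba 04 18 7a e7 4b 00 00 00 00 00 00 09 e8")


def decode(hexdump):
    """Decode a 64-byte hex dump into a Header tuple."""
    return Header._make(struct.unpack(LAYOUT, bytes.fromhex(hexdump)))


for dump in (DUMP_DEC28, DUMP_JAN02):
    h = decode(dump)
    print(stat.filemode(h.mode), h.uid, h.gid, h.nlink, h.size)
# drwxr-xr-x 1112 100 2 52
# -rw-r--r-- 1112 100 1 2536
```

Under this assumed layout, both headers decode to plausible values (a directory and a regular file with the same owner), which fits the observation that the first inode in each buffer looks intact.
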
I wonder if this is an indication of faulty hardware, and if so, which component (SSD? CPU? Memory? Other?). The SSD in the laptop is brand new; the vendor's statement to that effect is corroborated by the SMART data (power-on hours). I have run a simple "surface test" (write pattern & verify) on the affected logical volume with fio and found no errors (only a single test pass thus far). Perhaps noteworthy: I also saw some BTRFS checksum errors in the logs (from the root file system):

Dec 28 19:04:17 pallas kernel: BTRFS warning (device sda2): sda2 checksum verify failed on 158384128 wanted D1502CDD found 7F3B5CF3 level 0
Jan 02 10:18:00 pallas.mittagstun.de kernel: BTRFS warning (device sda2): csum failed root 267 ino 232862 off 16384 csum 0xed735e63 expected csum 0x7e1dece1 mirror 1
Jan 02 10:28:27 pallas.mittagstun.de kernel: BTRFS warning (device sda2): csum failed root 267 ino 248644 off 0 csum 0x347357c0 expected csum 0x2e8516b0 mirror 1