Bug ID | 1160019 |
---|---|
Summary | "Metadata corruption detected at xfs_inode_buf_verify" |
Classification | openSUSE |
Product | openSUSE Distribution |
Version | Leap 15.1 |
Hardware | Other |
OS | Other |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | Kernel |
Assignee | kernel-maintainers@forge.provo.novell.com |
Reporter | martin.wilck@suse.com |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |
This happened on a fresh Leap 15.1 installation on a laptop (SATA SSD, /home file system). Kernel 4.12.14-lp151.28.36-default.

> Dec 28 19:02:47 pallas kernel: XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x4b860
> Dec 28 19:02:47 pallas kernel: XFS (dm-0): Unmount and run xfs_repair
> Dec 28 19:02:47 pallas kernel: XFS (dm-0): First 64 bytes of corrupted metadata buffer:
> Dec 28 19:02:47 pallas kernel: ffff8801f164f000: 49 4e 41 ed 03 01 00 00 00 00 04 58 00 00 00 64 INA........X...d
> Dec 28 19:02:47 pallas kernel: ffff8801f164f010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 ................
> Dec 28 19:02:47 pallas kernel: ffff8801f164f020: 5e 01 34 bd 25 b4 38 63 5d ff e7 a6 01 cc 26 9e ^.4.%.8c].....&.
> Dec 28 19:02:47 pallas kernel: ffff8801f164f030: 5d ff e7 a6 01 cc 26 9e 00 00 00 00 00 00 00 34 ].....&........4

The same message is repeated a few times; eventually the FS is mounted r/o, and all kinds of weird failures result.

AFAICS in the code in fs/xfs/libxfs/xfs_inode_buf.c, the 15.1 kernel tests nothing but the magic number (bytes 0-1, 49 4e) and the version (byte 4, 03), so the pattern dumped should be correct. But this is only the first inode in the buffer, so subsequent inodes might be bad. However, I dumped a full 4k block at offset 0x4b860, and all inode headers in it seem to be OK as well.

The FS couldn't be unmounted, so I rebooted into single-user mode, unmounted /home, and ran xfs_repair, which showed nothing of interest:

> Phase 1 - find and verify superblock...
>         - block cache size set to 374008 entries
> Phase 2 - using internal log
>         - zero log...u
> zero_log: head block 55736 tail block 55736
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 3
>         - agno = 2
>         - agno = 1
> Phase 5 - rebuild AG headers and trees...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
>
> XFS_REPAIR Summary    Sat Dec 28 21:02:45 2019
>
> Phase           Start           End             Duration
> Phase 1:        12/28 21:02:44  12/28 21:02:45  1 second
> Phase 2:        12/28 21:02:45  12/28 21:02:45
> Phase 3:        12/28 21:02:45  12/28 21:02:45
> Phase 4:        12/28 21:02:45  12/28 21:02:45
> Phase 5:        12/28 21:02:45  12/28 21:02:45
> Phase 6:        12/28 21:02:45  12/28 21:02:45
> Phase 7:        12/28 21:02:45  12/28 21:02:45
>
> Total run time: 1 second
> done

A very similar error was observed some days later, after only a couple of minutes of uptime.

> Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x746b180
> Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): Unmount and run xfs_repair
> Jan 02 10:35:10 pallas.mittagstun.de kernel: XFS (dm-0): First 64 bytes of corrupted metadata buffer:
> Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d000: 49 4e 81 a4 03 02 00 00 00 00 04 58 00 00 00 64 IN.........X...d
> Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
> Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d020: 5e 07 ba 06 1c 4b f6 68 5e 07 ba 04 18 7a e7 4b ^....K.h^....z.K
> Jan 02 10:35:11 pallas.mittagstun.de kernel: ffff8801dd24d030: 5e 07 ba 04 18 7a e7 4b 00 00 00 00 00 00 09 e8 ^....z.K........
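The claim that the dumped headers look valid can be checked by decoding the first bytes of each dump. The following is a minimal sketch, not the kernel's verifier: it assumes the usual big-endian on-disk inode core layout (magic, mode, version, format, onlink, uid, gid, nlink in the first 20 bytes, size at byte 56) and assumes that versions 1 to 3 are accepted. The helper names `decode_inode_header` and `passes_basic_check` are hypothetical.

```python
import struct

XFS_DINODE_MAGIC = 0x494E  # ASCII "IN"

# First 64 bytes of each buffer, copied from the two kernel log dumps.
DUMP_1 = bytes.fromhex(
    "49 4e 41 ed 03 01 00 00 00 00 04 58 00 00 00 64"
    "00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00"
    "5e 01 34 bd 25 b4 38 63 5d ff e7 a6 01 cc 26 9e"
    "5d ff e7 a6 01 cc 26 9e 00 00 00 00 00 00 00 34"
)
DUMP_2 = bytes.fromhex(
    "49 4e 81 a4 03 02 00 00 00 00 04 58 00 00 00 64"
    "00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00"
    "5e 07 ba 06 1c 4b f6 68 5e 07 ba 04 18 7a e7 4b"
    "5e 07 ba 04 18 7a e7 4b 00 00 00 00 00 00 09 e8"
)


def decode_inode_header(raw):
    """Decode selected fields of an on-disk inode core (assumed layout)."""
    magic, mode, version, fmt, onlink, uid, gid, nlink = struct.unpack_from(
        ">HHBBHIII", raw, 0
    )
    (size,) = struct.unpack_from(">Q", raw, 56)
    return {
        "magic": magic, "mode": mode, "version": version, "format": fmt,
        "uid": uid, "gid": gid, "nlink": nlink, "size": size,
    }


def passes_basic_check(hdr):
    """Magic and version test only; the 1..3 version range is an assumption."""
    return hdr["magic"] == XFS_DINODE_MAGIC and 1 <= hdr["version"] <= 3


for name, raw in (("Dec 28", DUMP_1), ("Jan 02", DUMP_2)):
    hdr = decode_inode_header(raw)
    print(name, "mode", oct(hdr["mode"]), "version", hdr["version"],
          "uid", hdr["uid"], "size", hdr["size"],
          "ok" if passes_basic_check(hdr) else "BAD")
```

Under these assumptions the first dump decodes as a directory (mode 040755, size 52) and the second as a regular file (mode 0100644, size 2536), both version 3 and owned by uid 1112, so the first inode in each buffer would pass a magic-and-version test. This supports the suspicion that the failing inode, if any, sits further into the buffer.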
I wonder if this is an indication of faulty hardware, and if yes, which part (SSD? CPU? Memory? Other?). The SSD in the laptop is brand new; the respective statement by the vendor is corroborated by SMART data (power-on hours). I have run a simple "surface test" (write pattern & verify) on the affected logical volume with fio, and found no errors (only a single test pass thus far).

Perhaps noteworthy: I also saw some BTRFS checksum errors in the logs (from the root file system):

> Dec 28 19:04:17 pallas kernel: BTRFS warning (device sda2): sda2 checksum verify failed on 158384128 wanted D1502CDD found 7F3B5CF3 level 0
>
> Jan 02 10:18:00 pallas.mittagstun.de kernel: BTRFS warning (device sda2): csum failed root 267 ino 232862 off 16384 csum 0xed735e63 expected csum 0x7e1dece1 mirror 1
>
> Jan 02 10:28:27 pallas.mittagstun.de kernel: BTRFS warning (device sda2): csum failed root 267 ino 248644 off 0 csum 0x347357c0 expected csum 0x2e8516b0 mirror 1
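To compare the XFS and BTRFS messages, it helps to express the XFS block numbers as byte offsets. The sketch below assumes that the "xfs_inode block" value in the log is a disk address in 512-byte units relative to the start of dm-0; that unit is an assumption, and `daddr_to_bytes` is a hypothetical helper name.

```python
SECTOR_SIZE = 512  # assumption: XFS logs the block as a 512-byte disk address


def daddr_to_bytes(daddr, sector_size=SECTOR_SIZE):
    """Convert a disk address in sectors to a byte offset on the device."""
    return daddr * sector_size


# Block numbers taken from the two XFS corruption messages.
for daddr in (0x4B860, 0x746B180):
    offset = daddr_to_bytes(daddr)
    print(f"block {daddr:#x} -> byte offset {offset} "
          f"(4 KiB block index {offset // 4096})")
```

Under that assumption, block 0x4b860 corresponds to byte offset 158384128 on dm-0, which is the same number that appears in the first BTRFS warning ("checksum verify failed on 158384128"). The two file systems live on different devices (dm-0 and sda2) and BTRFS reports a logical address, so this may be a coincidence; it is noted here only as something that might be worth checking.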