[Bug 1218864] New: grub2: error messages: 'not a correct XFS inode' shown many many times
https://bugzilla.suse.com/show_bug.cgi?id=1218864

Bug ID: 1218864
Summary: grub2: error messages: 'not a correct XFS inode' shown many many times
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
Assignee: screening-team-bugs@suse.de
Reporter: comes@naic.edu
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Do a fresh Tumbleweed installation and select XFS as the root filesystem. The installation completes successfully and the system reboots.

After login, run:
update-bootloader --reinit
and reboot. The system still reboots fine.

Log in and run again:
update-bootloader --reinit
Now, after the reboot and before the grub menu appears, you can see the following line printed many times:
error: ../../grub-core/fs/xfs.c:541:not a correct XFS inode.
This goes on for about a minute and then the boot process continues normally.

Log in and run again:
update-bootloader --reinit
After the reboot, before the grub menu appears, you will see the previous error line printed many times, but this time for about a couple of minutes, and then the boot process continues normally.

Running again:
update-bootloader --reinit
does not seem to extend the delay during which the error message is printed.

I found this Fedora bug report that describes the same issue:
https://bugzilla.redhat.com/show_bug.cgi?id=2254370
and in comment 48 it is said that the following commit:
https://github.com/rhboot/grub2/commit/1955d781ba20c1952adc820f3e587878b4f55...
of grub2 2.12 is causing the issue.

To test, I have rebuilt grub2 with that commit removed from the source, installed the modified version of grub2, and run again:
update-bootloader --reinit
Now the system boots again without showing the error.

-- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c2
--- Comment #2 from Giacomo Comes <comes@naic.edu> ---
Just to be more clear, the troubling commit is not one backported in openSUSE; it is one present upstream in grub 2.12:
commit: 07318ee7e11a00b9c1dea4c6b4edf62af35a511a
fs/xfs: Fix XFS directory extent parsing
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c4
--- Comment #4 from Michael Chang <mchang@suse.com> ---
Hi Anthony and Giacomo,

Thanks for the heads up. I have created an SR to Factory to temporarily revert commit 07318ee7e ("fs/xfs: Fix XFS directory extent parsing"):
https://build.opensuse.org/request/show/1139339

It is a bit strange that we didn't run into the issue earlier, given that we had it backported to 2.12~rc1 a few months ago. I'll monitor the mailing list for any relevant discussions in the coming days. If there are none, I'll initiate one.

Thanks.
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c6
--- Comment #6 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871939
--> https://bugzilla.suse.com/attachment.cgi?id=871939&action=edit
metadump1
metadump of the XFS partition after running update-bootloader --reinit one time; no error when booting

https://bugzilla.suse.com/show_bug.cgi?id=1218864#c7
--- Comment #7 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871940
--> https://bugzilla.suse.com/attachment.cgi?id=871940&action=edit
metadump2
metadump of the XFS partition after running update-bootloader --reinit a second time; one-minute error delay when booting

https://bugzilla.suse.com/show_bug.cgi?id=1218864#c8
--- Comment #8 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871941
--> https://bugzilla.suse.com/attachment.cgi?id=871941&action=edit
metadump3
metadump of the XFS partition after running update-bootloader --reinit a third time; two-minute error delay when booting
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c10
--- Comment #10 from Giacomo Comes <comes@naic.edu> ---
I did the metadump after booting into Leap 15.5 and accessing the unmounted Tumbleweed disk image. While using xfs_metadump there was a warning message on stdout about some unsupported features of the disk image that were not compatible with the Leap 15.5 kernel. But I did not mount the disk image.

Let me regenerate the metadumps using only Tumbleweed, not Leap 15.5, to access the disk image. This time I'll also add a metadump of the freshly installed Tumbleweed disk image before running any update-bootloader --reinit.
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c12
--- Comment #12 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871943
--> https://bugzilla.suse.com/attachment.cgi?id=871943&action=edit
metadump0-try2
metadump of the XFS partition after fresh installation, without running update-bootloader --reinit

https://bugzilla.suse.com/show_bug.cgi?id=1218864#c13
--- Comment #13 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871944
--> https://bugzilla.suse.com/attachment.cgi?id=871944&action=edit
metadump1-try2
metadump of the XFS partition after running update-bootloader --reinit one time; no error when booting

https://bugzilla.suse.com/show_bug.cgi?id=1218864#c14
--- Comment #14 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871945
--> https://bugzilla.suse.com/attachment.cgi?id=871945&action=edit
metadump2-try2
metadump of the XFS partition after running update-bootloader --reinit a second time; one-minute error delay when booting

https://bugzilla.suse.com/show_bug.cgi?id=1218864#c15
--- Comment #15 from Giacomo Comes <comes@naic.edu> ---
Created attachment 871946
--> https://bugzilla.suse.com/attachment.cgi?id=871946&action=edit
metadump3-try2
metadump of the XFS partition after running update-bootloader --reinit a third time; two-minute error delay when booting
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c19
--- Comment #19 from Giacomo Comes <comes@naic.edu> ---
I did a test using UEFI and I didn't see the grub error message. The issue appears to be related to legacy BIOS.
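A minimal sketch for checking whether a machine matches the configuration described so far in this thread (XFS root, legacy BIOS boot). The function name classify_boot and its output strings are hypothetical, made up for illustration. It relies on two common Linux conventions: the directory /sys/firmware/efi exists only when the system was booted through UEFI, and findmnt can print the filesystem type of the root mount. It reflects only the observation in comment 19 and does not prove that a UEFI system is unaffected.

```shell
#!/bin/sh
# classify_boot FSTYPE EFI_DIR   (hypothetical helper, not part of any package)
# FSTYPE  : filesystem type of the root mount, e.g. "xfs"
# EFI_DIR : directory whose presence indicates a UEFI boot
classify_boot() {
    fstype=$1
    efi_dir=$2

    if [ -d "$efi_dir" ]; then
        firmware=uefi
    else
        firmware=bios
    fi

    if [ "$fstype" = xfs ] && [ "$firmware" = bios ]; then
        echo "matches: xfs root, legacy BIOS boot"
    else
        echo "does not match: $fstype root, $firmware boot"
    fi
}

# Typical call on a running system:
#   classify_boot "$(findmnt -n -o FSTYPE /)" /sys/firmware/efi
```

Passing the two inputs as arguments keeps the check independent of the machine it runs on, so it can also be applied to values collected from another system.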
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c20
--- Comment #20 from Giacomo Comes <comes@naic.edu> ---
More info. The "corrupted partition", that is, the one that causes the message "not a correct XFS inode" to be generated when it is booted from, can be mounted with grub2-mount without issue.

I managed to enable debug with the command:
grub2-editenv - set debug=xfs
then booted and recorded the boot process. I have a 2:15, 22 MB video of the boot process. If interested, please let me know how to make it available to you.

In the video it is possible to see:
fs/xfs.c:1010:xfs: Reading sb
fs/xfs.c:288:xfs: Validating superblock
fs/xfs.c:300:xfs: XFS v5 superblock detected
fs/xfs.c:1042:xfs: Reading root ino 128
fs/xfs.c:533:xfs: reading inode (128) - 128, 0

and then more inodes are read until you see the following:
fs/xfs.c:533:xfs: Reading inode (0) - 0, 0
error: ../../grub-core/fs/xfs.c:541:not a correct XFS inode.

Later in the boot process you can also see:
fs/xfs.c:533:xfs: Reading inode (25942926) - 16491888, 3072
fs/xfs.c:533:xfs: Reading inode (25942927) - 16491888, 3584
fs/xfs.c:533:xfs: Reading inode (197568495624) - 123371593736, 0
error: ../../grub-core/kern/disk_common.c:26:attempt to read or write outside of partition.

So it looks like the code that decides which inode to read next has some glitches.
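The numbers in the "Reading inode (N) - A, B" lines can be decoded with a little arithmetic. The sketch below assumes that A is a 512-byte sector address, that B is a byte offset within the filesystem block, and that the inode number uses the usual XFS layout (allocation group number, block within the group, slot within the block). The geometry constants are not stated anywhere in the report; they are assumptions chosen because they reproduce the logged values, and the real superblock values would have to be read from the metadumps. The function name decode_inode is hypothetical.

```shell
#!/bin/sh
# ASSUMED geometry (not from the report; picked to match the logged numbers):
INOPBLOG=3       # log2(inodes per block): 4096-byte blocks, 512-byte inodes
INODELOG=9       # log2(inode size in bytes)
BLOCKLOG=12      # log2(filesystem block size in bytes)
AGBLKLOG=20      # bits used for "block within allocation group"
AGBLOCKS=654783  # blocks per allocation group

# decode_inode INO  ->  prints "SECTOR OFFSET" as in the debug log
decode_inode() (
    ino=$1
    # lowest bits: slot of the inode inside its block
    slot=$(( ino & ((1 << INOPBLOG) - 1) ))
    # middle bits: block number inside the allocation group
    agbno=$(( (ino >> INOPBLOG) & ((1 << AGBLKLOG) - 1) ))
    # highest bits: allocation group number
    agno=$(( ino >> (INOPBLOG + AGBLKLOG) ))
    # absolute filesystem block, then converted to 512-byte sectors
    fsblock=$(( agno * AGBLOCKS + agbno ))
    echo "$(( fsblock << (BLOCKLOG - 9) )) $(( slot << INODELOG ))"
)

for ino in 128 25942926 25942927 197568495624; do
    echo "inode $ino -> $(decode_inode "$ino")"
done
```

Under these assumptions the two inodes around 25942926 fall in allocation group 3, while 197568495624 decodes to allocation group 23552, which is consistent with the subsequent "attempt to read or write outside of partition" error: the inode number itself looks bogus rather than the address calculation.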
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c23
--- Comment #23 from Giacomo Comes <comes@naic.edu> ---
(In reply to Anthony Iliopoulos from comment #21)
> Is this with 07318ee7e reverted? Does grub actually manage to boot the kernel after those errors?

No, this is with 07318ee7e. If it is reverted, there is no error message. For the time being I'm keeping the faulty grub package around for testing purposes. And yes, in the end grub boots the kernel.

I only saw the message "attempt to read or write outside of partition" in the recorded video with debug enabled, but it may well be present without debug enabled too, and I just missed it because of the fast scrolling of the error messages.
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c28
--- Comment #28 from Giacomo Comes <comes@naic.edu> ---
The original author of the patch causing the current problem has made some comments on the Fedora bugzilla entry. See comments 57, 59, and 61. I think what he says is consistent with what Michael found.
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c29

Lorenz Hüdepohl <dev@stellardeath.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dev@stellardeath.org

--- Comment #29 from Lorenz Hüdepohl <dev@stellardeath.org> ---
To chime in as a third party: On my tumbleweed installation with the latest change, I can no longer boot kernels from my root XFS.

I had problems before that were fixed in https://build.opensuse.org/request/show/1138021 already before I could report them :)

But with the newest https://build.opensuse.org/request/show/1138021 it no longer works, again.

(A compounding factor seems to be the move of the kernels from /boot to /usr/lib/modules/$kernelversion, as the rather small /boot directory can always be parsed by grub in my tests so far, whereas the rather large /usr metadata is probably stored somehow differently by XFS, triggering this issue. This could blur the timeline of events, somewhat, not sure what came when)

Here are details about the currently installed package (which is not able to boot):

root@~> rpm -q grub2 --changelog | head
* Mi Jan 17 2024 Michael Chang <mchang@suse.com>
- Resolved XFS regression leading to the "not a correct XFS inode" error by temporarily reverting the problematic commit (bsc#1218864)
  * 0001-Revert-fs-xfs-Fix-XFS-directory-extent-parsing.patch
[...]

root@~> rpm -qi grub2
Name        : grub2
Version     : 2.12
Release     : 2.1
Architecture: x86_64
Install Date: Sa 20 Jan 2024 09:23:11 CET
Group       : System/Boot
Size        : 27941722
License     : GPL-3.0-or-later
Signature   : RSA/SHA512, Mi 17 Jan 2024 23:54:16 CET, Key ID 35a2f86e29b700a4
Source RPM  : grub2-2.12-2.1.src.rpm
Build Date  : Mi 17 Jan 2024 23:53:48 CET
Build Host  : i04-ch1d
Packager    : https://bugs.opensuse.org
Vendor      : openSUSE
URL         : http://www.gnu.org/software/grub/
Summary     : Bootloader with support for Linux, Multiboot and more
Description :
This is the second version of the GRUB (Grand Unified Bootloader), a highly configurable and customizable bootloader with modular architecture. It support rich scale of kernel formats, file systems, computer architectures and hardware devices. This package includes user space utlities to manage GRUB on your system.
Distribution: openSUSE Tumbleweed
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c33
--- Comment #33 from Lorenz Hüdepohl <dev@stellardeath.org> ---
(In reply to Michael Chang from comment #32)
> The obs repository containing the test package:
>
> https://download.opensuse.org/repositories/home:/michael-chang:/bsc:/1218864... standard/
>
> For openSUSE Tumbleweed, you may follow this procedure to install the test package:
>
> zypper ar --repo https://download.opensuse.org/repositories/home:/michael-chang:/bsc:/1218864...
> zypper ref
> zypper dup --from home_michael-chang_bsc_1218864 --allow-vendor-change
>
> And this procedure to remove the test package:
>
> zypper rr home_michael-chang_bsc_1218864
> zypper dup --allow-vendor-change
>
> For uefi, you may have to disable secure boot because the test grub.efi is not signed by SUSE. Optionally it is possible to keep secure boot on, but you have to entrust the key of my home project by enrolling it into the "Machine Owner Key" (MOK) keyring.
>
> mokutil --import /usr/share/efi/x86_64/grub.der
>
> Thanks.

Thank you for the detailed and easy-to-follow instructions: With the package in your branch, booting from XFS works again!

Kind regards,
Lorenz
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c34
--- Comment #34 from Giacomo Comes <comes@naic.edu> ---
For me too, the patch fixes the problem. I no longer see the error message at boot.
https://bugzilla.suse.com/show_bug.cgi?id=1218864#c38
--- Comment #38 from Giacomo Comes <comes@naic.edu> ---
I tested the patch. It works for me.