[Bug 729667] New: grub error after each kernel-update: "Error 16: Inconsistent filesystem structure"
https://bugzilla.novell.com/show_bug.cgi?id=729667
https://bugzilla.novell.com/show_bug.cgi?id=729667#c0

           Summary: grub error after each kernel-update: "Error 16: Inconsistent filesystem structure"
    Classification: openSUSE
           Product: openSUSE 11.4
           Version: Final
          Platform: i686
        OS/Version: openSUSE 11.4
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Bootloader
        AssignedTo: jsrain@suse.com
        ReportedBy: Yarny@public-files.de
         QAContact: jsrain@suse.com
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1

Short summary:
I'm running 4 32bit machines with openSUSE 11.4, and each time there is a kernel update, at least one or two of them (different ones each time) refuse to (re)boot afterwards. Grub shows its menu for a few seconds and then immediately says "Error 16: Inconsistent filesystem structure". On each machine, the boot partition is ~80MiB ext2. fsck.ext2 (from LiveCD) says everything's OK. Recreating the boot partition with mkfs.ext2 fixes the problem until the next update arrives.

A few more details. (This is mostly a copy from <URL:http://forums.opensuse.org/forums/english/get-technical-help-here/install-boot-login/467492-opensuse-11-4-grub-error-16-after-every-kernel-update.html>)

Here's what I did/observed this Tuesday:
* "zypper update", which (besides other stuff) listed kernel-default. I confirmed.
* After this was done, I did "shutdown -r now" to reboot.
* Grub shows its menu (text mode only, I don't have the graphical bootsplash installed). After a timeout of 3 sec or so, it tries to boot as always, but immediately says "Error 16: Inconsistent filesystem structure".

Unfortunately, this problem occurs only randomly. I'm running 4 64bit machines and 4 32bit machines. Up to now it has happened only on 32bit machines, and at every kernel update it occurs on 1 or 2 of my machines.

To fix this, I have to boot from external media and recreate the broken boot partition (i.e. copy the files to some temporary place, do a "mkfs.ext2 /dev/sda1", then copy them back onto the new filesystem). Then I mount everything, chroot into my to-be-fixed system and run "grub-install". After that I can boot from the hard disk again.

The filesystem which is inconsistent according to grub is OK according to fsck.ext2. I checked this before rewriting the filesystem with mkfs.ext2.

The system where it happened yesterday contains only one hard disk with two partitions: sda1 is /boot, ext2, 80MiB; sda2 is a LUKS-encrypted LVM container. The partition table is GPT, grub is installed in the MBR (I think).

The way I installed these systems is a bit unorthodox: I basically set up the partitioning scheme by hand and mounted everything, then did a "zypper -R /mnt/root install ..". While this might contribute to triggering this behavior, I still believe there is an underlying bug that is causing this mess. I have no other explanation for why this happens only randomly and not always.

In the last months I always just "fixed" this as fast as possible to move on. On Tuesday I made a copy of /dev/sda1 ("cat /dev/sda1 > file") right after booting my rescue system. I also updated two other 32bit machines without problems. Another 32bit machine is still awaiting the kernel update (I did not "zypper update" yet).

Sadly, I don't see an obvious way to reliably reproduce the problem (without waiting for the next kernel update). I can share the boot partition image which I created before fixing it, as well as log files and other data. But since all 4 32bit machines are production systems, my ability to experiment with them is limited. If it would be helpful, I can try to collect more data when updating my remaining not-yet-updated machine.

Reproducible: Sometimes

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
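The workaround described in the report (back up /boot, recreate the filesystem, copy the files back, reinstall grub from a chroot) can be sketched as a dry run. This is only an illustration under assumptions: the function name `print_boot_rebuild_plan` and the paths `/mnt/boot`, `/tmp/boot-backup` and `/mnt/root` are hypothetical, and the function only prints commands, it does not run them. How the LUKS/LVM root is unlocked and mounted is not covered by the report and is left as a comment.

```shell
# Dry-run sketch: prints the recovery steps from the report, executes nothing.
# print_boot_rebuild_plan and all default paths are hypothetical names.
print_boot_rebuild_plan() {
    dev=$1                          # boot partition, /dev/sda1 in the report
    backup=${2:-/tmp/boot-backup}   # temporary place for the /boot files
    root=${3:-/mnt/root}            # where the installed system gets mounted
    cat <<EOF
mkdir -p /mnt/boot $backup
mount $dev /mnt/boot
cp -a /mnt/boot/. $backup/
umount /mnt/boot
mkfs.ext2 $dev
mount $dev /mnt/boot
cp -a $backup/. /mnt/boot/
umount /mnt/boot
# mount the root filesystem on $root and $dev on $root/boot, then:
chroot $root grub-install
EOF
}

# Example: print_boot_rebuild_plan /dev/sda1
```

The `grub-install` call is written without arguments because that is how the report quotes it; whether arguments or bind mounts of /dev are needed depends on the setup.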
https://bugzilla.novell.com/show_bug.cgi?id=729667#c2

Torsten Duwe <duwe@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |Yarny@public-files.de

--- Comment #2 from Torsten Duwe <duwe@suse.com> 2011-11-11 10:20:43 UTC ---
Yes, please. The next time this happens, please attach a compressed image of those 80MB. If it was empty before and you use a good compressor, it should only be a dozen megabytes or so.
https://bugzilla.novell.com/show_bug.cgi?id=729667#c3
--- Comment #3 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:23:19 UTC ---
Created an attachment (id=461764)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461764)
Image of sda1 (=/boot) after kernel update and grub failure (Part 1/7)

I created this image this Tuesday with "cat /dev/sda1", just after booting into my rescue system (as described in my initial post). It's compressed with 'xz -9e', but my hard disk probably never got zeroed, so it's still about 64MiB.

The sha1 of the uncompressed image is c2d7019c8bde07674400972743b0b3cce5239a3a.

As said, one of my 32bit machines is still awaiting the update. Maybe it will go without any problems, but I can postpone the update a few more days and then try to collect data while updating...

P.S. After a 10 min upload, bugzilla complained about the file size (no more than 10MiB). So I split the xz file into 7 pieces. Sorry if this isn't a good idea.
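Splitting a compressed image into pieces below an upload limit, and checking that the pieces reassemble to the same bytes, might look like the sketch below. The helper names `split_for_upload` and `check_roundtrip` are hypothetical, and the report does not say which tool or piece size was actually used; the sketch assumes GNU `split` and `sha1sum`.

```shell
# Sketch only: split_for_upload and check_roundtrip are hypothetical helpers.
split_for_upload() {
    file=$1     # file to split, e.g. the xz-compressed image
    limit=$2    # maximum piece size in bytes (suffixes like 10M also work)
    prefix=$3   # pieces are written as ${prefix}00, ${prefix}01, ...
    # -b sets the piece size; -d (GNU split) selects numeric suffixes
    split -b "$limit" -d "$file" "$prefix"
}

check_roundtrip() {
    file=$1
    prefix=$2
    want=$(sha1sum < "$file" | cut -d' ' -f1)
    # The glob expands in sorted order, so the pieces are concatenated in sequence
    got=$(cat "$prefix"* | sha1sum | cut -d' ' -f1)
    [ "$want" = "$got" ]
}
```

Note that the sha1 quoted in the comment above refers to the uncompressed image, so a receiver would compare it after decompressing the reassembled file.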
https://bugzilla.novell.com/show_bug.cgi?id=729667#c4
--- Comment #4 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:26:56 UTC ---
Created an attachment (id=461765)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461765)
Image of sda1 (=/boot) after kernel update and grub failure (Part 2/7)

Part 2/7

https://bugzilla.novell.com/show_bug.cgi?id=729667#c5
--- Comment #5 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:29:53 UTC ---
Created an attachment (id=461766)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461766)
Image of sda1 (=/boot) after kernel update and grub failure (Part 3/7)

Part 3/7

https://bugzilla.novell.com/show_bug.cgi?id=729667#c6
--- Comment #6 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:33:38 UTC ---
Created an attachment (id=461767)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461767)
Image of sda1 (=/boot) after kernel update and grub failure (Part 4/7)

Part 4/7

https://bugzilla.novell.com/show_bug.cgi?id=729667#c7
--- Comment #7 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:36:29 UTC ---
Created an attachment (id=461768)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461768)
Image of sda1 (=/boot) after kernel update and grub failure (Part 5/7)

Part 5/7

https://bugzilla.novell.com/show_bug.cgi?id=729667#c8
--- Comment #8 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:39:06 UTC ---
Created an attachment (id=461769)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461769)
Image of sda1 (=/boot) after kernel update and grub failure (Part 6/7)

Part 6/7
https://bugzilla.novell.com/show_bug.cgi?id=729667#c9

Yarny Yarny <Yarny@public-files.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
       InfoProvider|Yarny@public-files.de       |

--- Comment #9 from Yarny Yarny <Yarny@public-files.de> 2011-11-11 17:42:46 UTC ---
Created an attachment (id=461771)
 --> (http://bugzilla.novell.com/attachment.cgi?id=461771)
Image of sda1 (=/boot) after kernel update and grub failure (Part 7/7)

Part 7/7 (see Comment 3 for details)
https://bugzilla.novell.com/show_bug.cgi?id=729667#c10
--- Comment #10 from Yarny Yarny <Yarny@public-files.de> 2012-02-04 18:12:40 UTC ---
Created an attachment (id=474450)
 --> (http://bugzilla.novell.com/attachment.cgi?id=474450)
Script that installs openSUSE and triggers bug

I think I can now reproduce the bug at will, with these steps:
* Create a new VirtualBox with two CD/DVD drives and a 48GB hard disk. I never tried this on a real machine since it deletes everything on the hard disk.
* Boot the 11.4 LiveCD, insert the 11.4 setup DVD in the second drive, and mount it on /mnt/susedvd.
* Run the attached script.
* Shut down. Tell VirtualBox to boot from HD. Start the VBox again.
* Wait.

The script I attached here creates two partitions on the hard disk: one for /boot (~30MB) and one which contains an LVM container with one logical volume that will be root. This is mounted somewhere and "zypper -R" installs a minimal SUSE installation. The script writes /etc/fstab, installs grub into (hd0) and (hd0,0), and installs a script into /etc/init.d which will constantly 1) rewrite the /boot/initrd-* files and 2) reboot the machine.

After about 4 to 40 such reboots, grub says either
* Error 16: Inconsistent filesystem structure
or
* Error 24: Attempt to access block outside partition

During my experiments, I made these observations:
* The procedure is not 100% reliable. Sometimes I do exactly as stated above, but after 60 or 70 reboots, I give up.
* This bug is independent of architecture; I'm currently using 64 bit SUSE for this, but I also observed it on 32bit machines (see my original posting).
* Up to now I have failed to reproduce this on openSUSE 12.1 (64 bit).

Yarny
https://bugzilla.novell.com/show_bug.cgi?id=729667#c11

Yarny Yarny <Yarny@public-files.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Platform|i686                        |All

--- Comment #11 from Yarny Yarny <Yarny@public-files.de> 2012-02-04 18:14:24 UTC ---
I'm setting Platform to "All"
https://bugzilla.novell.com/show_bug.cgi?id=729667#c12

Torsten Duwe <duwe@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #12 from Torsten Duwe <duwe@suse.com> 2012-09-05 16:06:41 UTC ---
I found the time now to reproduce this (in qemu-kvm). Strange.
https://bugzilla.novell.com/show_bug.cgi?id=729667#c13
--- Comment #13 from Yarny Yarny <Yarny@public-files.de> 2012-09-05 18:23:38 UTC ---
> I found the time now to reproduce this

Did you use oS 11.4 or 12.1 for this? I couldn't reproduce it with oS 12.1. Maybe it got fixed somehow in 12.1?
https://bugzilla.novell.com/show_bug.cgi?id=729667#c14
--- Comment #14 from Torsten Duwe <duwe@suse.com> 2012-09-06 10:01:24 UTC ---
The bug is reproducible entirely within the partition image you provided.

For the record:

    cat bug-729667_bacoc.sda1.xz.part0* | xz -d > bacoc.sda1
    ( xz -dc MBR+part_off.xz ; cat bacoc.sda1 ) > HDimg
    qemu-kvm -m 512 -snapshot -hda HDimg
https://bugzilla.novell.com/show_bug.cgi?id=729667#c15
--- Comment #15 from Torsten Duwe <duwe@suse.com> 2012-09-06 10:04:43 UTC ---
Created an attachment (id=504672)
 --> (http://bugzilla.novell.com/attachment.cgi?id=504672)
prefix that provides MBR and shifts partition to proper offset

For completeness, the preamble I created. MBR/msdos or GPT does not seem to matter in this case.
https://bugzilla.novell.com/show_bug.cgi?id=729667#c16
--- Comment #16 from Torsten Duwe <duwe@suse.com> 2012-09-06 10:06:10 UTC ---
I cannot think of any code change that might have fixed this issue for 12.2. This is very likely nondeterministic.
https://bugzilla.novell.com/show_bug.cgi?id=729667#c17
--- Comment #17 from Yarny Yarny <Yarny@public-files.de> 2012-09-08 10:57:09 UTC ---
> The bug is reproducible entirely within the partition image [..]

Ah, I thought you used the script of Comment #10.

I ran fsck.ext2 from openSUSE 12.1 and 12.2 (last RC) on the disk image and neither found any problems. So it seems to be a grub problem.

I attached the disk image (plus MBR) to an openSUSE 12.1 machine as /dev/sdb and modified the menu.lst on /dev/sda to boot from the partition on /dev/sdb1=(hd1,0). The grub of openSUSE 12.1 had no problems loading the kernel and initrd of the bacoc partition.

My humble interpretation is that there was a bug in grub's ext2 filesystem parsing module in openSUSE 11.4, but it got fixed in openSUSE 12.1's grub.
https://bugzilla.novell.com/show_bug.cgi?id=729667#c18
--- Comment #18 from Yarny Yarny <Yarny@public-files.de> 2013-09-22 17:58:38 UTC ---
This bug report has now been inactive for more than a year. I haven't seen the bug since I moved to openSUSE 12.1. openSUSE 11.4 is no longer maintained.

Should we close this report (as FIXED/WONTFIX/NORESPONSE)?
https://bugzilla.novell.com/show_bug.cgi?id=729667#c19

Olaf Hering <ohering@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ohering@suse.com

--- Comment #19 from Olaf Hering <ohering@suse.com> 2013-11-13 10:11:20 CET ---
(In reply to comment #18)
> This bug report has now been inactive for more than a year. I haven't seen the bug since I moved to openSUSE 12.1. openSUSE 11.4 is no longer maintained.
>
> Should we close this report (as FIXED/WONTFIX/NORESPONSE)?

I still see this with Factory grub.
https://bugzilla.novell.com/show_bug.cgi?id=729667#c20
--- Comment #20 from Olaf Hering <ohering@suse.com> 2013-11-13 14:51:33 CET ---
stage2/fsys_ext2fs.c:
 652   if ((ex->ee_block + ex->ee_len) < logical_block)

This fails for me: ee_block is 0, ee_len is 0x11 and logical_block is 0x12.

In theory it should be possible to debug this in the running system with grub (I ran into the above as well there), but for me it segfaults when processing cmdline, and later memcheck fails because mbi.mem_upper/lower are not handled properly.
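Plugging the reported values into the quoted comparison gives 0 + 0x11 = 0x11, which is less than 0x12, so the condition is true. Assuming the usual ext4 extent semantics (ee_block is the first logical block, ee_len the number of blocks), that extent covers logical blocks 0 through 0x10 and block 0x12 lies past its end. The small sketch below only re-evaluates that arithmetic; `extent_check_trips` is a hypothetical name, not a function from grub.

```shell
# Re-evaluates the comparison quoted from stage2/fsys_ext2fs.c line 652.
# extent_check_trips is a hypothetical helper, not part of grub.
extent_check_trips() {
    ee_block=$1        # first logical block covered by the extent
    ee_len=$2          # number of blocks in the extent
    logical_block=$3   # logical block being looked up
    # Succeeds (exit status 0) when the quoted condition is true
    [ $(( ($ee_block + $ee_len) < $logical_block )) -eq 1 ]
}

# Values from the comment above: the condition is true.
if extent_check_trips 0 0x11 0x12; then
    echo "condition true: block 0x12 is beyond the extent"
fi
```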
https://bugzilla.novell.com/show_bug.cgi?id=729667#c21
--- Comment #21 from Olaf Hering <ohering@suse.com> 2013-11-13 15:11:10 CET ---
It turned out that once more I ran into the "grub does not handle sparse files" issue. I already knew it does not handle sparse /boot/grub/* files, and unsparsified them. But it also does not handle sparse /boot/* files. Now that I have fixed this as well, booting works again.
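One way to look for sparse files under /boot and rewrite them without holes is sketched below. It is a heuristic under assumptions, not the method used in the comment above: the helper names `is_sparse` and `unsparsify` are hypothetical, GNU `stat` and GNU `cp` are assumed, and filesystems that store small files inline may report zero allocated blocks for files that have no holes.

```shell
# Sketch only: is_sparse and unsparsify are hypothetical helper names.
is_sparse() {
    size=$(stat -c %s -- "$1")     # apparent size in bytes
    blocks=$(stat -c %b -- "$1")   # number of allocated blocks
    bsize=$(stat -c %B -- "$1")    # size in bytes of each block counted by %b
    # Fewer bytes allocated than the file claims to hold suggests holes
    [ $(( blocks * bsize )) -lt "$size" ]
}

unsparsify() {
    # --sparse=never makes cp write every byte, including runs of zeros;
    # -p keeps mode, ownership and timestamps
    cp -p --sparse=never -- "$1" "$1.dense" && mv -- "$1.dense" "$1"
}

# Possible use (as root), listing candidates before changing anything:
#   for f in /boot/* /boot/grub/*; do
#       [ -f "$f" ] && is_sparse "$f" && echo "$f"
#   done
```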