[Bug 803078] New: grub2-install on USB thumbdrive never returns, uses 100% CPU
https://bugzilla.novell.com/show_bug.cgi?id=803078
https://bugzilla.novell.com/show_bug.cgi?id=803078#c0

Summary: grub2-install on USB thumbdrive never returns, uses 100% CPU
Classification: openSUSE
Product: openSUSE 12.2
Version: Final
Platform: i686
OS/Version: openSUSE 12.2
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Bootloader
AssignedTo: jsrain@suse.com
ReportedBy: jdelvare@suse.com
QAContact: jsrain@suse.com
Found By: Community User
Blocker: ---

I have installed openSUSE 12.2 on a USB thumbdrive. It uses grub2 as the bootloader, which by default was installed on the only partition. As a maintenance update once prevented the drive from booting until I manually ran grub2-install again, I was advised to install grub2 on the MBR instead. So I tried:

# grub2-install --root-directory=/mnt/usb /dev/sdb

However this command never returns, and grub2-bios-setup uses 100% CPU. I'll attach the debug log I gathered.

I installed debuginfo packages and got the following backtrace when attaching gdb to the stuck grub2-bios-setup process:

(gdb) bt
#0 0x080a77cd in grub_strncmp ()
#1 0x08068c4d in grub_iso9660_susp_iterate ()
#2 0x080690ce in set_rockridge ()
#3 0x08069288 in grub_iso9660_mount ()
#4 0x0806a0d7 in grub_iso9660_dir ()
#5 0x080a6cac in grub_fs_probe ()
#6 0x0804a9ae in setup ()
#7 0x0804c730 in main ()
(gdb)

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c1

--- Comment #1 from Jean Delvare <jdelvare@suse.com> 2013-02-11 13:49:21 UTC ---

Created an attachment (id=524117) --> (http://bugzilla.novell.com/attachment.cgi?id=524117)
Debug log for grub2-install

Gathered with:

# grub2-install --debug --root-directory=/mnt/usb /dev/sdb > install.log 2>&1
https://bugzilla.novell.com/show_bug.cgi?id=803078#c

Jiri Srain <jsrain@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|jsrain@suse.com |mchang@suse.com
https://bugzilla.novell.com/show_bug.cgi?id=803078#c2

--- Comment #2 from Michael Chang <mchang@suse.com> 2013-02-18 04:13:39 UTC ---

Jean,

Sorry for the late response, I am just back from Chinese New Year vacation.

Thanks for posting the detailed analysis and log. Are you using a hybrid image? A hint from the log is that grub2 may have mistaken /dev/sdb for an ISO image (and tried an unusual setup on it).
https://bugzilla.novell.com/show_bug.cgi?id=803078#c3

--- Comment #3 from Michael Chang <mchang@suse.com> 2013-02-18 07:21:42 UTC ---

Jean,

Sorry, my comment above was irrelevant.

Just to be sure that grub_fs_probe is the problematic part, could you please check whether grub2-probe shows the same symptom?

$ grub2-probe --device-map=/mnt/usb/boot/grub2/device.map --verbose -t fs -d /dev/sdb

There's a for() loop in grub_iso9660_susp_iterate, and it could be spinning there, but I don't understand why.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c4

--- Comment #4 from Michael Chang <mchang@suse.com> 2013-02-18 07:34:08 UTC ---

An interesting finding in grub_iso9660_mount: it shouldn't reach set_rockridge() at all, as the superblock check should end the probing ..

    if (grub_disk_read (disk, block << GRUB_ISO9660_LOG2_BLKSZ, 0,
                        sizeof (struct grub_iso9660_primary_voldesc),
                        (char *) &voldesc))
      {
        grub_error (GRUB_ERR_BAD_FS, "not a ISO9660 filesystem");
        goto fail;
      }

    if (grub_strncmp ((char *) voldesc.voldesc.magic, "CD001", 5) != 0)
      {
        grub_error (GRUB_ERR_BAD_FS, "not a ISO9660 filesystem");
        goto fail;
      }

Do you have any CD image mounted on the host that the grub device (hd0) might point to? (I'm not sure whether it's caused by mixing up the device.map on the USB stick and on the host ..) :(
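
For readers following along, the superblock check quoted above can be sketched outside of grub2. This is only an illustration, not grub2 code: the function name has_iso9660_magic and the in-memory buffer are assumptions made for this example. It relies on the ISO9660 layout, where the first volume descriptor sits at logical sector 16 (2048-byte sectors, so byte offset 32768) and carries the "CD001" identifier right after its one-byte type field.

```c
#include <stddef.h>
#include <string.h>

/* ISO9660 volume descriptors start at logical sector 16 of 2048 bytes. */
#define ISO9660_VOLDESC_OFFSET (16u * 2048u) /* = 32768 */

/*
 * Hypothetical helper (not part of grub2): return 1 if buf, holding the
 * first len bytes of a device, carries the ISO9660 "CD001" identifier,
 * and 0 otherwise.
 */
int has_iso9660_magic(const unsigned char *buf, size_t len)
{
    /* Byte 0 of the descriptor is its type, bytes 1..5 are the identifier. */
    if (len < ISO9660_VOLDESC_OFFSET + 6)
        return 0;
    return memcmp(buf + ISO9660_VOLDESC_OFFSET + 1, "CD001", 5) == 0;
}
```

A caller would read at least the first 32774 bytes of the whole device (not of a partition) into a buffer and pass it in. A check like this passing on /dev/sdb would mean the probing legitimately continues past the superblock test.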
https://bugzilla.novell.com/show_bug.cgi?id=803078#c5

--- Comment #5 from Jean Delvare <jdelvare@suse.com> 2013-02-18 08:26:31 UTC ---

No problem. I'm happy that you're working on this bug but I am in no hurry, so take your time!

The system is a regular installation of an openSUSE 12.2 system, so I don't think it is a hybrid image. I can provide a binary dump of particular areas of the drive if it helps, just tell me what you need.

There is nothing loop-mounted on the machine, ISO image or other.

device.map on the host looks sane:

(hd0) /dev/disk/by-id/ata-ST3300622A_5NF2HGZ4

And on the stick as well:

(hd0) /dev/disk/by-id/usb-Kingston_DataTraveler_G3_001478544887FB91C7BA23D8-0:0
https://bugzilla.novell.com/show_bug.cgi?id=803078#c6

--- Comment #6 from Jean Delvare <jdelvare@suse.com> 2013-02-18 08:55:44 UTC ---

Doh. The drive _is_ a hybrid. As I was mounting it to perform the test you asked for, I mistyped the command and mounted /dev/sdb instead of /dev/sdb1. It should have failed, but worked:

/dev/sdb on /mnt/usb type iso9660 (ro,relatime)

This must be related to the history of this particular USB thumbdrive. Originally I had dumped the openSUSE 12.2 DVD on it. Then I found that changes done to live instances were no longer persistent, and went for a regular installation instead. Apparently some data from the previous installation is still present on the media. I can read some of the text files from the iso9660 mount, while others are corrupted (as expected, given that /dev/sdb1 occupies the whole space and holds the new installation.)

This definitely explains why grub2-install goes into iso9660-related code at all.

Partition Table for /dev/sdb

               First       Last
 # Type       Sector      Sector   Offset    Length   Filesystem Type (ID) Flag
-- ------- ----------- ----------- ------ ----------- -------------------- ----
   Pri/Log          0        2047*     0#       2048* Free Space           None
 1 Primary       2048*   15644671*     0    15642624* Linux (83)           Boot
   Pri/Log   15644672*   15644911*     0         240* Free Space           None

The drive was partitioned that way by the openSUSE 12.2 installation system.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c7

Jean Delvare <jdelvare@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED

--- Comment #7 from Jean Delvare <jdelvare@suse.com> 2013-02-18 09:01:22 UTC ---

I performed the test you asked for in comment #3:

# grub2-probe --device-map=/mnt/usb/boot/grub2/device.map --verbose -t fs -d /dev/sdb
grub2-probe : information : Looking for /dev/sdb.
grub2-probe : information : /dev/sdb is a parent of /dev/sdb.
grub2-probe : information : Looking for /dev/sdb.
grub2-probe : information : /dev/sdb is a parent of /dev/sdb.
grub2-probe : information : Looking for /dev/sdb.
grub2-probe : information : /dev/sdb is a parent of /dev/sdb.
grub2-probe : information : opening hd0.
grub2-probe : information : the size of hd0 is 15644912.

And it stays there forever. The stack trace is the same as before:

(gdb) bt
#0 0x080685f7 in susp_iterate.2371 ()
#1 0x08068483 in grub_iso9660_susp_iterate ()
#2 0x0806889a in set_rockridge ()
#3 0x08068a54 in grub_iso9660_mount ()
#4 0x080698a3 in grub_iso9660_dir ()
#5 0x080e2b34 in grub_fs_probe ()
#6 0x0804af42 in probe ()
#7 0x0804bf75 in main ()
https://bugzilla.novell.com/show_bug.cgi?id=803078#c8

--- Comment #8 from Jean Delvare <jdelvare@suse.com> 2013-02-18 09:09:09 UTC ---

At this point I suppose it can be argued that the problem is the original format of my USB thumbdrive and that the software isn't to blame. However, I still believe the situation is undesirable and could happen to others, as I did not do anything out of the ordinary to get there.

I am fine with grub-install failing in this case, but then with a proper error message, not an endless loop. It might also make sense for the openSUSE installation system to fix the drive if it doesn't look good?
https://bugzilla.novell.com/show_bug.cgi?id=803078#c9

--- Comment #9 from Jean Delvare <jdelvare@suse.com> 2013-02-18 09:11:02 UTC ---

Michael, I suppose the fix for me now would be to run fixmbr on the USB thumbdrive. However I can wait until we are done with this bug, as this will break the only test case we have.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c10

Michael Chang <mchang@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |snwint@suse.com

--- Comment #10 from Michael Chang <mchang@suse.com> 2013-02-25 04:30:31 UTC ---

Jean,

Sorry, I was quite busy last week and somehow did not see the latest (two) comments in my mailbox. :(

I agree, grub2's iso9660 code can possibly be improved with a sanity check to keep such a problem from happening, but how? Even the kernel is deceived and counts it as a valid file system. I'll try anyway; I'm not good at those formats, but they are interesting to look at.

I also agree with you that the openSUSE installation should fix the drive. CCing Steffen here to see if he has any other good ideas.

Btw, is fixmbr work for you ? (I suppose not ..).
https://bugzilla.novell.com/show_bug.cgi?id=803078#c11

--- Comment #11 from Michael Chang <mchang@suse.com> 2013-02-25 07:26:17 UTC ---

I cannot reproduce the 100% CPU hang .. my steps are:

$ dd if=/dev/zero of=./test.raw bs=1M count=8192
$ losetup -f ./test.raw (assume /dev/loop0)
$ cat openSUSE-12.3-GNOME-LiveCD-Build0024-x86_64.iso > /dev/loop0
$ losetup -d /dev/loop0
$ qemu-kvm -m 1024 -boot once=d -cdrom openSUSE-12.3-GNOME-LiveCD-Build0024-x86_64.iso -hda test.raw -net bridge,br=br1 -net nic,model=virtio,macaddr=00:1A:4B:91:79:85

Finish the LiveCD install of 12.3 to test.raw (fully following the default suggestions). Boot into the LiveCD again and run:

$ grub2-probe -t fs -d /dev/sda
iso9660

I expected it to hang, but that did not happen .. :(
https://bugzilla.novell.com/show_bug.cgi?id=803078#c13

--- Comment #13 from Steffen Winterfeldt <snwint@suse.com> 2013-02-25 10:10:27 CET ---

This bug actually shows a valid problem. :-/

The difference between 12.3 and 12.2 (at least with the latest builds) is that we have a sane partition table in 12.3. In 12.2 the partition points to nowhere (and doesn't have a recognizable fs in it).

So, if the strategy is to probe the entire block dev only if there is no valid data in any partition, we are fine in 12.3. If we probe the entire device anyway, it's pure luck.

The new fashion with partitioning tools is to leave the 1st MB empty. So any metadata from a hybrid ISO is not (entirely) destroyed.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c14

--- Comment #14 from Jean Delvare <jdelvare@suse.com> 2013-02-25 09:25:44 UTC ---

12.2 vs 12.3 is probably relevant here. My real USB thumbdrive uses 12.2 and has the problem while Michael's attempt to reproduce was with 12.3 and he doesn't get the problem.
https://bugzilla.novell.com/show_bug.cgi?id=803078 https://bugzilla.novell.com/show_bug.cgi?id=803078#c16 --- Comment #16 from Jean Delvare <jdelvare@suse.com> 2013-02-25 13:22:11 UTC --- (In reply to comment #10)
Btw, is fixmbr work for you ? (I suppose not ..).
You are right, I just tried it and it did not help.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c18

--- Comment #18 from Michael Chang <mchang@suse.com> 2013-02-27 06:53:58 UTC ---

Some findings:

1. It has to do with where your first partition starts. I found that the partition on Jean's failing disk starts quite late (sector 2048) compared to mine (sector 64). iso9660 reserves the first 32k bytes (i.e. metadata starts at sector 64), so on Jean's disk there are a lot of leftover iso9660 artifacts (between sectors 64 and 2048).

2. When this happens, it's pure luck indeed. Grub2's iso9660 code hangs at 100% CPU while trying to parse the Rock Ridge extension, trapped in an infinite loop (somehow, maybe caused by overwritten metadata blocks and poor sanity checks). Ordinary mount and the kernel's iso9660 have no such issue, probably because they are more robust. [1]

3. It may have to do with parted, as it decides the start of the first partition; probably the "Use the entire disk" scenario, where parted repartitioned your first partition? I have no idea which factors parted uses to pick the value; probably cylinder boundaries, performance or common practice?

[1] Compare 'mount' and 'grub2-mount': grub2-mount hangs at 100% CPU but mount is fine.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c19

--- Comment #19 from Michael Chang <mchang@suse.com> 2013-02-27 09:34:49 UTC ---

Created an attachment (id=527155) --> (http://bugzilla.novell.com/attachment.cgi?id=527155)
stop iterate when bogus zero length System Usage entry met

The 100% CPU problem seems to be fixed by the above patch; it's caused by looping on an entry whose length is zero, so the iterator never advances ..

Testing packages are at:

http://download.opensuse.org/repositories/home:/michael-chang:/bnc803078/ope...
http://download.opensuse.org/repositories/home:/michael-chang:/bnc803078/ope...

With grub2-mount you can mount /dev/sda and see its contents.

However grub2-install still cannot work because iso9660 fs doesn't provide any embedded area for bootloader (which is expected) ..
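
To make the failure mode concrete, here is a minimal sketch of a System Use entry walker with the kind of length check the patch title describes. It is not the attached patch and not grub2's code: the function name susp_count_entries is made up for this example, and returning an error on a bogus length is an assumption (the real patch may simply stop iterating). It relies only on the SUSP entry layout: a two-byte signature, a one-byte length that covers the whole entry including its four-byte header, and a one-byte version.

```c
#include <stddef.h>

/*
 * Hypothetical walker over a System Use area (not grub2 code).
 * Each entry is: 2-byte signature, 1-byte length, 1-byte version, data.
 * The length covers the whole entry, so a valid entry is >= 4 bytes.
 * Returns the number of entries seen before the "ST" terminator or the
 * end of the area, or -1 if an entry has an impossible length.
 */
int susp_count_entries(const unsigned char *area, size_t size)
{
    size_t off = 0;
    int count = 0;

    while (off + 4 <= size) {
        unsigned char len = area[off + 2];

        /* "ST" marks the end of the System Use entries. */
        if (area[off] == 'S' && area[off + 1] == 'T')
            break;

        /* Without this check a zero length would leave off unchanged
         * and the loop would spin forever on the same entry. */
        if (len < 4 || len > size - off)
            return -1;

        count++;
        off += len;
    }
    return count;
}
```

On stale metadata such as the leftovers described in comment #18, an overwritten block can easily present a zero in the length byte, which is exactly the case the check rejects.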
https://bugzilla.novell.com/show_bug.cgi?id=803078 https://bugzilla.novell.com/show_bug.cgi?id=803078#c20 --- Comment #20 from Steffen Winterfeldt <snwint@suse.com> 2013-02-27 11:25:31 CET ---
However grub2-install still cannot work because iso9660 fs doesn't provide any embedded area for bootloader (which is expected) ..
Actually iso9660 has 32k empty space at the fs start.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c21

--- Comment #21 from Jean Delvare <jdelvare@suse.com> 2013-02-27 13:29:49 UTC ---

I did indeed repartition the drive manually as I didn't want swap nor /home on a separate partition. But the start at 2048 has nothing to do with this, it is simply caused by partitions being aligned on 1 MB boundaries. Microsoft has done that since Windows 7 (or maybe even Vista) and we decided to do the same for compatibility.

After installing your updated grub2 package, I can confirm that grub2-mount /dev/sdb no longer hangs, it now works and the mount point contains the same ghost tree as when mounting /dev/sdb the regular way. This is the ghost tree from the hybrid image that was once written to the USB thumbdrive, but no longer reflects reality.

The new grub2-install fails with the following strange error message:

rm: cannot remove '/mnt/usb/boot/grub2/i386-pc/acpi.mod': Function not implemented

At this point my USB thumbdrive no longer boots. It displays "GRUB" with a blinking cursor and nothing happens. Don't worry too much about this, I can restore it from the snapshot I sent to you if needed.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c22

--- Comment #22 from Jean Delvare <jdelvare@suse.com> 2013-02-27 13:32:12 UTC ---

If I want to clean my drive to prevent grub2 confusion, I suppose I should simply zero sectors 1 to 2047? Or will I destroy something useful by doing that?
https://bugzilla.novell.com/show_bug.cgi?id=803078#c23

--- Comment #23 from Steffen Winterfeldt <snwint@suse.com> 2013-02-27 16:00:37 CET ---

Yes, if there's a gpt. I would just clear 1 block at 32k to remove the iso9660 magic.
https://bugzilla.novell.com/show_bug.cgi?id=803078#c24

--- Comment #24 from Jean Delvare <jdelvare@suse.com> 2013-02-27 17:24:21 UTC ---

I don't think there's a GPT on that drive: a dump of the MBR shows a single partition with type 0x83, while, as I understand it, the type would be 0xEE if there was a GPT. But I'm happy killing only the first block of the iso9660 data if that's enough, thanks.
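
The MBR check described in comment #24 can be sketched as follows. This is an illustration only; the function name mbr_has_gpt_protective_entry is made up for this example. It uses the classic MBR layout: four 16-byte partition entries starting at byte 446, the partition type in byte 4 of each entry, and the 0x55 0xAA signature in the last two bytes of the sector. Type 0xEE is the protective entry that a GPT-partitioned disk carries.

```c
#include <stddef.h>

#define MBR_SIZE                512
#define MBR_TABLE_OFFSET        446  /* first partition entry */
#define MBR_ENTRY_SIZE          16
#define MBR_ENTRY_TYPE_OFFSET   4    /* type byte within an entry */
#define MBR_TYPE_GPT_PROTECTIVE 0xEE

/*
 * Hypothetical helper: return 1 if the given sector 0 looks like an MBR
 * with a GPT protective entry (type 0xEE), 0 otherwise.
 */
int mbr_has_gpt_protective_entry(const unsigned char mbr[MBR_SIZE])
{
    int i;

    /* A valid MBR ends with the boot signature 0x55 0xAA. */
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return 0;

    for (i = 0; i < 4; i++) {
        size_t entry = MBR_TABLE_OFFSET + (size_t)i * MBR_ENTRY_SIZE;

        if (mbr[entry + MBR_ENTRY_TYPE_OFFSET] == MBR_TYPE_GPT_PROTECTIVE)
            return 1;
    }
    return 0;
}
```

For the drive in this bug, with a single entry of type 0x83, such a check would return 0, which matches the conclusion that no GPT is present and that the sectors after the MBR hold nothing the partitioning needs.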
https://bugzilla.novell.com/show_bug.cgi?id=803078#c25

--- Comment #25 from Jean Delvare <jdelvare@suse.com> 2013-03-01 13:19:43 UTC ---

I cleared 3 sectors remaining from the ISO9660 image, as advised by Steffen:

# dd if=/dev/zero seek=64 count=3 of=/dev/sdb

After this, grub2-install worked just fine, both Michael's version and the original version from openSUSE 12.2.

I believe the SUSE partitioning tool should do exactly that, to prevent the trouble I went through: if installing to a USB device and the beginning of the first partition is beyond the 64th sector, clear the 64th sector (and possibly the 65th and 66th as I did.) Or something like that - I'll let the experts decide on the details.

Note that this not only fixed grub2-install in my case, it also altered the label displayed when the disk is being mounted. Beforehand, the old DVD label was being displayed, which was quite confusing.

My problem is solved, so feel free to close this bug when you're finished working on it. Thanks a lot Michael and Steffen for explaining all the details and driving me to a solution.
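
The rule proposed in comment #25 could look roughly like the sketch below. It is only an illustration of the idea, not code from any SUSE tool: the function name clear_stale_iso9660 and its return convention are assumptions, as is the slightly stricter condition that the first partition must start after the three sectors being cleared, so that no partition data is touched. With 512-byte sectors, sector 64 is byte offset 32768, where the ISO9660 volume descriptor begins.

```c
#include <stdio.h>

#define SECTOR_SIZE      512
#define ISO_MAGIC_SECTOR 64  /* 32768 / 512: first ISO9660 descriptor */
#define SECTORS_TO_CLEAR 3   /* same count as the dd command above */

/*
 * Hypothetical helper: zero sectors 64..66 of dev, but only if the first
 * partition starts after them.
 * Returns 1 if the sectors were cleared, 0 if nothing was done because
 * the partition starts too early, and -1 on an I/O error.
 */
int clear_stale_iso9660(FILE *dev, unsigned long first_part_start)
{
    static const unsigned char zeros[SECTORS_TO_CLEAR * SECTOR_SIZE];

    /* Never write into the area owned by the first partition. */
    if (first_part_start < ISO_MAGIC_SECTOR + SECTORS_TO_CLEAR)
        return 0;

    if (fseek(dev, (long)ISO_MAGIC_SECTOR * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    if (fwrite(zeros, 1, sizeof zeros, dev) != sizeof zeros)
        return -1;
    return fflush(dev) == 0 ? 1 : -1;
}
```

For the drive in this report the first partition starts at sector 2048, so the helper would clear the stale descriptor; for a layout like the one in comment #18, where the partition starts at sector 64, it would leave the disk alone.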
https://bugzilla.novell.com/show_bug.cgi?id=803078#c26

Michael Chang <mchang@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED

--- Comment #26 from Michael Chang <mchang@suse.com> 2013-03-04 09:16:37 UTC ---

(In reply to comment #25)
My problem is solved, so feel free to close this bug when you're finished working on it. Thanks a lot Michael and Steffen for explaining all the details and driving me to a solution.
Hi Jean,

Thanks for your valuable time and feedback. I think it's fine to set the status to RESOLVED FIXED as well.
https://bugzilla.novell.com/show_bug.cgi?id=803078 https://bugzilla.novell.com/show_bug.cgi?id=803078#c27 --- Comment #27 from Michael Chang <mchang@suse.com> 2013-03-04 09:29:55 UTC --- (In reply to comment #20)
However grub2-install still cannot work because iso9660 fs doesn't provide any embedded area for bootloader (which is expected) ..
Actually iso9660 has 32k empty space at the fs start.
Yes, you're right. I simply missed it.

The question now is why grub2 doesn't offer any embed callback for the iso9660 file system, as it does for btrfs and zfs. To me it could be that iso9660 was designed for read-only optical media as its mainstream usage. (The hybrid ISO could be an exotic product of the isolinux project.)