[Bug 1011529] New: YaST2 hung ofpathname "too many arguments" for ppc64le multipath configuration
http://bugzilla.suse.com/show_bug.cgi?id=1011529

            Bug ID: 1011529
           Summary: YaST2 hung ofpathname "too many arguments" for ppc64le multipath configuration
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: PowerPC-64
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Installation
          Assignee: yast2-maintainers@suse.de
          Reporter: normand@linux.vnet.ibm.com
        QA Contact: jsrain@suse.com
          Found By: ---
           Blocker: ---

YaST2 hangs on "Saving bootloader configuration" for a multipath ppc64le guest.

* Create a ppc64le guest with the qemu parameters in (1), with one disk accessed by two paths (to mimic the configuration used for the multipath test in openQA (2)).
* YaST2 hangs on "Saving bootloader configuration".
* Through ssh access I was able to retrieve the list of hung processes (3). The strace of ofpathname shows an infinite loop (4), and the y2log is full of "too many arguments" errors (5).

(1)
===
$ qemu-img create raid/l1 -f raw 10G
$ qemu-system-ppc64 -vga std -m 4096 -machine usb=off -cpu host -nographic \
    -netdev user,id=qanet0,hostfwd=::10022-:22,hostname=qemu2 \
    -device virtio-net,netdev=qanet0 \
    -device virtio-scsi-pci,id=scsi0 \
    -device virtio-scsi-pci,id=scsi1 \
    -device scsi-hd,drive=hd1a,bus=scsi0.0 \
    -drive file=raid/l1,cache=none,if=none,id=hd1a,serial=mpath1,format=raw \
    -device scsi-hd,drive=hd1b,bus=scsi1.0 \
    -drive file=raid/l1,cache=none,if=none,id=hd1b,serial=mpath1,format=raw \
    -drive media=cdrom,if=none,id=cd0,format=raw,file=/home/michel/raid/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20161115-Media.iso \
    -device scsi-cd,drive=cd0,bus=scsi0.0 \
    -boot once=d,menu=on,splash-time=5000 -smp 8,threads=8 -enable-kvm \
    -no-shutdown -monitor stdio -serial pty -S \
    -append 'linemode=1 linuxrc.log=/var/log/YaST2/linuxrc.log linuxrc.debug=1 startshell=1 insecure=1 UseSSH=1 SSHPassword=root' \
    -kernel /home/michel/raid/linux -initrd /home/michel/raid/initrd
===

(2) https://openqa.opensuse.org/tests/305788#step/start_install/15

(3)
===
$ ps axf
...
15729 pts/1 S+ 0:00 \_ /usr/bin/perl /usr/lib/YaST2/servers_non_y2/ag_uid
16212 pts/1 S+ 0:00 \_ /usr/sbin/grub2-install --target=powerpc-ieee1275 --force --skip-fs-probe /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
16231 pts/1 S+ 0:00 \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
16256 pts/1 S+ 0:12 \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
===

(4)
===
# strace -p 16256
strace: Process 16256 attached
read(3, "sda\nsdb\n", 128) = 8
read(3, "", 128) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15267, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 15267
waitpid(-1, 0x3ffff795e9d4, WNOHANG) = -1 ECHILD (No child processes)
rt_sigreturn() = 0
close(3) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x10025e68, [], 0}, {SIG_IGN, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], 0}, {0x10025e68, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], 0}, {SIG_IGN, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
open("slaves/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(3, /* 4 entries */, 65536) = 96
getdents(3, /* 0 entries */, 65536) = 0
close(3) = 0
write(2, "/usr/sbin/ofpathname: line 412: "..., 55) = 55
pipe([3, 5]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x3fff8da64800) = 15270
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x1008b670, [], SA_RESTART}, {0x1008b670, [], SA_RESTART}, 8) = 0
close(5) = 0
read(3, "sda\nsdb\n", 128) = 8
read(3, "", 128) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15270, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
...
===

(5)
===
$ tail -n3 /var/log/YaST2/y2log
2016-11-21 19:02:03 <3> linux-9de6(4188) [Ruby] lib/cheetah.rb:206 Error output: /usr/sbin/ofpathname: line 412: cd: too many arguments
2016-11-21 19:02:03 <3> linux-9de6(4188) [Ruby] lib/cheetah.rb:206 Error output: /usr/sbin/ofpathname: line 412: cd: too many arguments
2016-11-21 19:02:03 <3> linux-9de6(4188) [Ruby] lib/cheetah.rb:206 Error output: /usr/sbin/ofpathname: line 412: cd: too many arguments
===

-- 
You are receiving this mail because:
You are on the CC list for the bug.
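
The "cd: too many arguments" message in (5) is what bash prints when an unquoted expansion hands cd more than one word. The following is a minimal sketch of that failure mode, not the actual code at line 412 of ofpathname (which is not quoted in this report). It assumes the script ends up running cd on the listing of slaves/, which holds two entries (sda and sdb) for a two-path multipath map. The temporary directory stands in for sysfs, and demo_cd_slaves is a hypothetical name.

```shell
# Build a throwaway stand-in for /sys/block/<dm device>/slaves holding the
# two paths seen in the strace output (sda and sdb), then try to cd into
# the unquoted listing the way a single-path script might.
demo_cd_slaves() {
    root=$(mktemp -d) || return 1
    mkdir -p "$root/slaves/sda" "$root/slaves/sdb"
    # Run under bash explicitly: $(ls) expands unquoted to two words, and
    # recent bash (4.4 or later, to my knowledge) rejects "cd sda sdb"
    # with "too many arguments" instead of ignoring the extra word.
    LC_ALL=C bash -c 'cd "$1/slaves" && cd $(ls)' demo "$root"
    rc=$?
    rm -rf "$root"
    return "$rc"
}
```

Calling demo_cd_slaves prints the error on stderr and returns non-zero. With a single entry in slaves/ the same cd succeeds, which would be consistent with the problem only showing up on the multipath guest.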
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c1
--- Comment #1 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c2
--- Comment #2 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529
Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c3
--- Comment #3 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c4
--- Comment #4 from Michel Normand
[CUT]
I need help to continue the investigation.

If I chroot to /mnt and kill the hung ofpathname process, the parent runs its cleanup, which includes an umount, so I am unable to call ofpathname manually to better understand the original hang.

===
2:linux-ac6f:~ # chroot /mnt
linux-ac6f:/ # ps axf
...
17473 pts/0 S+ 0:00 | \_ /usr/sbin/grub2-install --target=powerpc-ieee1275 --force --skip-fs-probe /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
17489 pts/0 S+ 0:00 | \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
17505 pts/0 S+ 1:38 | \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
...
linux-ac6f:/ # kill -9 17505
linux-ac6f:/ # ls /usr/sbin/ofpathname
/usr/sbin/ofpathname
linux-ac6f:/ # /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
sed: can't read /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
/usr/bin/find: '/sys/class/net': No such file or directory
ofpathname: Could not retrieve Open Firmware device path for logical device "/dev/mapper/0QEMU_QEMU_HARDDISK_mpath1".
===
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c5
--- Comment #5 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c6
--- Comment #6 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c7
--- Comment #7 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529
Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c8
--- Comment #8 from Michel Normand
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c9
Michel Normand
I have an RFC patch for ofpathname:
https://github.com/nfont/powerpc-utils/pull/14
However, I do not know whether this is the correct way to solve the problem.

I verified with a DUD file that the above patch is sufficient to avoid the YaST hang on "Saving bootloader configuration".
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c10
Michal Suchanek
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c11
--- Comment #11 from Michael Chang
> Created attachment 703548 [details]
> grub2-install debug trace
>
> I am wondering if the attached xx3.log debug trace would explain why
> grub2-install changed from input
> /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1

That is the udev device name (for a partition).

> to /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1

That is the (kernel's) "canonical" device name (for a disk).

The udev names are translated into "canonical" names under /dev/... and then processed internally, because those name patterns are "known" to grub2. Otherwise you would have to teach grub every different name pattern, and these change all the time as user-land tools apply different rules/policies. The device mapper is probably the only exception: names under /dev/mapper/... are used because they are more human-readable than /dev/dm-[0-9]+, and the pattern is also understood.

> when calling ofpathname script.

Why does the name matter here?

Thanks.
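
The first step of the translation described above, from a udev symlink under /dev/disk/by-id to the node it points at, can be sketched with readlink -f. This only illustrates symlink resolution; it is not how grub2 implements its lookup, and it does not cover the further mapping to a /dev/mapper/... name. The temporary directory, the dm-1 file and both function names are stand-ins so that the snippet runs without a multipath device.

```shell
# Resolve a udev-style symlink to the node it finally points at.
canonical_name() {
    readlink -f "$1"
}

# Build a fake /dev layout and resolve the by-id name from the report.
demo_canonical_name() {
    dev=$(mktemp -d) || return 1
    mkdir -p "$dev/disk/by-id"
    : > "$dev/dm-1"    # stand-in for the kernel device node
    ln -s ../../dm-1 "$dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1"
    canonical_name "$dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1"
    rm -rf "$dev"
}
```

demo_canonical_name prints the path of the stand-in dm-1 file inside the temporary directory.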
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c12
Michael Chang
> Maybe someone who is familiar with the grub scripts could tell if all the
> slaves are needed or picking one suffices?

Not really. That is a question for powerpc-utils, of which I am not the maintainer. Nevertheless, it looks to me that for multipath it probably suffices to use only one slave, since all slaves only provide different routes to the same "device", so the data should be identical.

But I couldn't tell whether multipath is the only case to consider: for example, what happens when a disk fails, or with other device mapper devices (e.g. dmraid or dmcrypt, though the firmware may not support booting them directly, so they may not be valid cases).
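
The idea of using a single slave can be sketched as follows. This is not the RFC patch from the pull request mentioned earlier in the thread, only an illustration under the assumption that any one path of a multipath map is representative. first_slave is a hypothetical helper, and its argument stands in for a /sys/block/<dm device> directory.

```shell
# Print the name of the first entry in <dir>/slaves, or fail if there is none.
first_slave() {
    for entry in "$1"/slaves/*; do
        # With no match the glob stays literal, so check the entry exists
        # (sysfs slaves are symlinks, hence the extra -L check).
        [ -e "$entry" ] || [ -L "$entry" ] || return 1
        basename "$entry"
        return 0
    done
    return 1
}
```

Because the glob expands in sorted order, a slaves directory holding sda and sdb yields sda. Whether the first entry is always a sensible choice (for example when that path has failed) is exactly the open question raised above.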
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c13
--- Comment #13 from Michal Suchanek
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c14
--- Comment #14 from Michael Chang
> 'firmware' here is Linux most likely

Sorry, but I am confused here; I don't really get your idea from such a short comment. (Please be more verbose to help me understand your thoughts :))

Anyway, in comment #12 the 'firmware' I was referring to is Open Firmware, which only understands the IEEE 1275 device tree path of the disk (containing the PReP partition) as translated by ofpathname. ofpathname also takes care of translating the Linux logical device to an OFW path, and that is where it has trouble (Michel Normand created an RFC patch for it). If the firmware really can't find a way to deal with the logical device, it has to report something like "LVM Disk /dev/system/root is not supported by "direct" firmware booting." or such.

> The problem is with ofpathname and turned up by a change to grub scripts that
> started calling it afaik.

Again I am confused by the 'grub script'. :( Here are my candidates:

1. grub2-install, but it is not a script
2. scripts under /etc/grub.d/, but they have nothing to do with ofpathname (i.e. they do not call it)
3. perl bootloader and/or YaST scripts, but they are not grub scripts
4. something else ..

I presume 'grub2-install' is what you were talking about, but it has been calling ofpathname since day one for booting powerpc-ieee1275. By the way, grub2 also has grub2_ofpathname, but it is not used here; is that the cause of the confusion?

> So what does the grub script expect to get if there are multiple ways to reach
> the disk (provided it's not a deficiency of ofpath and there are in fact
> multiple equally canonical of names of the disk)?

In this case, it boots from the disk from which the firmware loads it (aka the boot disk). That is done by setting $prefix to '(,msdos3)/boot/grub2'. The msdos3 is set during grub2-install. As cross-disk installation (i.e. the PReP and /boot partitions being on different disks) is not allowed in grub2, the result will be identical even if you swap the disk order.

You can see this line in attachment #3.

grub-mkimage --directory '/usr/lib/grub2/powerpc-ieee1275' --prefix '(,msdos3)/boot/grub2' --output '/boot/grub2/powerpc-ieee1275/core.elf' --format 'powerpc-ieee1275' --compression 'auto' --config '/boot/grub2/powerpc-ieee1275/load.cfg' 'btrfs' 'part_msdos'

Thanks.
http://bugzilla.suse.com/show_bug.cgi?id=1011529
Imobach Gonzalez Sosa
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c15
Michal Suchanek
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c16
Michal Suchanek
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c17
--- Comment #17 from Michel Normand
> Assigning maintainer.
> Please integrate patch from comment #8 or reassign to me.

It is already in OBS with these SRs:
https://build.opensuse.org/request/show/442438
https://build.opensuse.org/request/show/442808
http://bugzilla.suse.com/show_bug.cgi?id=1011529
Michael Chang
http://bugzilla.suse.com/show_bug.cgi?id=1011529#c18
Michel Normand