Hi,
Is your xen compute node running latest xen and libvirt, including the fix for bug#981094?
Yes it does:

---cut here---
compute1:~ # rpm -qi libvirt-1.3.4-573.1.x86_64
Name        : libvirt
Version     : 1.3.4
Release     : 573.1
Architecture: x86_64
Install Date: Do 26 Mai 2016 14:55:31 CEST
Group       : Development/Libraries/C and C++
Size        : 106
License     : LGPL-2.1+
Signature   : RSA/SHA256, Mi 25 Mai 2016 19:31:20 CEST, Key ID a193fbb572174fc2
Source RPM  : libvirt-1.3.4-573.1.src.rpm
Build Date  : Mi 25 Mai 2016 19:30:03 CEST
Build Host  : build78
Relocations : (not relocatable)
Vendor      : obs://build.opensuse.org/Virtualization
URL         : http://libvirt.org/
##############################################
compute1:~ # rpm -qi xen-libs
Name        : xen-libs
Version     : 4.7.0_03
Release     : 440.1
Architecture: x86_64
Install Date: Di 10 Mai 2016 13:59:52 CEST
Group       : System/Kernel
Size        : 1560640
License     : GPL-2.0
Signature   : RSA/SHA256, Fr 06 Mai 2016 16:33:12 CEST, Key ID a193fbb572174fc2
Source RPM  : xen-4.7.0_03-440.1.src.rpm
Build Date  : Fr 06 Mai 2016 16:31:47 CEST
Build Host  : build74
Relocations : (not relocatable)
Vendor      : obs://build.opensuse.org/Virtualization
---cut here---

Here are the logs from /var/log/libvirt/libxl/libxl-driver.log and /var/log/xen/bootloader.26.log:

---cut here---
compute1:~ # tail /var/log/libvirt/libxl/libxl-driver.log
2016-06-02 10:32:59 CEST libxl: error: libxl_bootloader.c:635:bootloader_finished: bootloader failed - consult logfile /var/log/xen/bootloader.26.log
2016-06-02 10:32:59 CEST libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: bootloader [28303] exited with error status 1
2016-06-02 10:32:59 CEST libxl: error: libxl_create.c:1222:domcreate_rebuild_done: cannot (re-)build domain: -3
##############################################
compute1:~ # tail /var/log/xen/bootloader.26.log
Traceback (most recent call last):
  File "/usr/lib/xen/bin/pygrub", line 923, in <module>
    part_offs = get_partition_offsets(file)
  File "/usr/lib/xen/bin/pygrub", line 114, in get_partition_offsets
    image_type = identify_disk_image(file)
  File "/usr/lib/xen/bin/pygrub", line 57, in identify_disk_image
    fd = os.open(file, os.O_RDONLY)
OSError: [Errno 2] No such file or directory: 'rbd:images/551a1dd6-ce9b-44c9-87f4-c2058efd94f6_disk:id=openstack:key=<KEY>:auth_supported=cephx\\;none:mon_host=<HOST1>\\:6789\\;<HOST2>\\:6789\\;<HOST3>\\:6789'
---cut here---

libxl-driver.log is always the first thing I check when an instance fails to boot, and if I see pygrub mentioned there, I apply the patch I described to driver.py so that grub.xen is used instead.

Regards,
Eugen

Quoting Jim Fehlig <jfehlig@suse.com>:
Eugen Block wrote:
Hi Thomas,
thanks for the quick response! I ran some tests to verify your suggestion. Please note that we are using Ceph as the storage backend, which is why I had to run the tests for SLE11 twice.
Here is a summary of my tests:
cirrOS:
  w/o changes in driver.py: works
  w/ changes in driver.py:  works
  w/ glance kernel-id:      fails

SLE12:
  w/o changes in driver.py: fails
  w/ changes in driver.py:  works
  w/ glance kernel-id:      works

SLE11 (Image in RBD pool):
  w/o changes in driver.py: fails
  w/ changes in driver.py:  works
  w/ glance kernel-id:      works

SLE11 (Image in local fs):
  w/o changes in driver.py: works
  w/ changes in driver.py:  works
  w/ glance kernel-id:      works
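To make the summary above easier to compare, here is a small sketch that encodes the same results as a Python dictionary and lists the approaches that worked for every image. The names RESULTS and works_everywhere are made up for this illustration; the True/False values are simply "works"/"fails" copied from the summary.

```python
# Test results copied from the summary above (True = works, False = fails).
# The dictionary layout and the function name are illustrative assumptions.
RESULTS = {
    "cirrOS": {
        "w/o changes in driver.py": True,
        "w/ changes in driver.py": True,
        "w/ glance kernel-id": False,
    },
    "SLE12": {
        "w/o changes in driver.py": False,
        "w/ changes in driver.py": True,
        "w/ glance kernel-id": True,
    },
    "SLE11 (Image in RBD pool)": {
        "w/o changes in driver.py": False,
        "w/ changes in driver.py": True,
        "w/ glance kernel-id": True,
    },
    "SLE11 (Image in local fs)": {
        "w/o changes in driver.py": True,
        "w/ changes in driver.py": True,
        "w/ glance kernel-id": True,
    },
}


def works_everywhere(results):
    """Return the approaches that worked for every tested image."""
    approaches = list(next(iter(results.values())))
    return [a for a in approaches if all(r[a] for r in results.values())]


if __name__ == "__main__":
    print(works_everywhere(RESULTS))
```

With these results, only the driver.py change booted all four images.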
The SLE12 image did not boot without my patch. I updated the image property with the grub.xen UUID as suggested, and that worked quite well. I tried the same with a cirros image, but it did not boot with the kernel-id property. It's just an image for testing purposes, but extremely helpful for quick tests, so I need it to work, too.
Then I tested a SLE11 image to see if older images are still working. Without modifications in driver.py or the glance image, the VM wouldn't boot. Is it possible that pygrub is not capable of working with network backed images?
pygrub should work with network-based block devices. Is your xen compute node running latest xen and libvirt, including the fix for bug#981094? Regardless, /var/log/libvirt/libxl/libxl-driver.log (and perhaps related logs in /var/log/xen/) should contain some hints as to what failed.
Because when I switched back to the local file system, the VM booted without any problems, even without any changes. For the RBD-backed image, it worked both with a kernel-id and with the code change.
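The difference between the local and the RBD case can be illustrated with a short sketch. The traceback in bootloader.26.log shows pygrub passing the disk string directly to os.open(), which only resolves local paths, so an "rbd:..." specification ends in "No such file or directory". The helper names and the prefix list below are assumptions for illustration and are not part of pygrub or libxl; the sketch assumes a Linux host.

```python
import errno
import os

# Hypothetical list of prefixes that mark a network-backed disk spec.
NETWORK_PREFIXES = ("rbd:", "nbd:", "iscsi:")


def looks_like_local_path(disk_spec):
    """Return True if disk_spec does not start with a known network prefix."""
    return not disk_spec.startswith(NETWORK_PREFIXES)


def try_open(disk_spec):
    """Repeat the os.open() call from the traceback.

    Returns None if the open succeeds, otherwise the errno of the failure.
    """
    try:
        fd = os.open(disk_spec, os.O_RDONLY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


if __name__ == "__main__":
    # Shortened, made-up spec in the same style as the one in the log.
    spec = "rbd:images/example_disk:id=openstack"
    print(looks_like_local_path(spec))
    print(try_open(spec) == errno.ENOENT)
```

A check like looks_like_local_path() would only explain the symptom; it does not make pygrub able to read the RBD image.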
So in general, your suggestion works. But regarding the maintenance of a cloud, it does not seem very handy. If I understand correctly, in case of a grub update I would have to upload a new grub image to glance, then replace the reference to the old kernel-id with the newly uploaded kernel-id. That doesn't sound very useful, to be honest.
Although the code change is very practical for me, I have to admit that it's not a general solution as it doesn't take into account any system relevant information besides the virt_type. But let's say you would build a real patch that contains some information on how to choose the correct bootloader, wouldn't that be more practical?
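As a rough sketch of what such a selection could look like, the function below considers more than the virt_type. It is a hypothetical policy and not Nova code: the function name, the "pv_bootloader" property and the returned labels are invented for this example; only the kernel-id image property and the names pygrub and grub.xen come from this thread.

```python
def choose_pv_bootloader(virt_type, image_properties):
    """Pick a boot method for a guest (hypothetical policy, not Nova code).

    Returns None when the decision should be left to the existing logic.
    """
    if virt_type != "xen":
        # Other hypervisors are not affected by this choice.
        return None
    if image_properties.get("kernel_id"):
        # The image already points to a kernel stored in glance.
        return "kernel_id"
    # "pv_bootloader" is an invented image property that would let an
    # operator request a specific bootloader for a single image.
    requested = image_properties.get("pv_bootloader")
    if requested in ("pygrub", "grub.xen"):
        return requested
    # Default assumed here: grub.xen, as used by the driver.py change.
    return "grub.xen"


if __name__ == "__main__":
    print(choose_pv_bootloader("xen", {}))
    print(choose_pv_bootloader("xen", {"pv_bootloader": "pygrub"}))
```

The order of the checks is a design choice: an explicit kernel-id on the image wins, then a per-image override, then the default.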
Another question comes to my mind: is there any reason to stick with pygrub instead of using grub.xen?
IMO, grub.xen should always be used for PV instances in a cloud environment. pygrub mounts the image in the compute node (dom0) to extract the kernel/initrd, which unnecessarily exposes dom0 to potential vulnerabilities from a rogue image.
Regards,
Jim
--
To unsubscribe, e-mail: opensuse-cloud+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-cloud+owner@opensuse.org
-- Eugen Block voice : +49-40-559 51 75 NDE Netzdesign und -entwicklung AG fax : +49-40-559 51 77 Postfach 61 03 15 D-22423 Hamburg e-mail : eblock@nde.ag Vorsitzende des Aufsichtsrates: Angelika Mozdzen Sitz und Registergericht: Hamburg, HRB 90934 Vorstand: Jens-U. Mozdzen USt-IdNr. DE 814 013 983