[opensuse-cloud] Nova imports flat images from base file despite ceph backend
Hi list,

this week I noticed something strange in our cloud (Ocata on Leap 42.3). We use Ceph as the backend for nova, glance and cinder, and everything really works like a charm. But from time to time we've noticed that some instances take much longer to launch than others. So I wanted to take a look at what's happening, turned on debug logs and found that in some cases (I have no idea how to reproduce this yet) an image is downloaded to /var/lib/nova/instances/_base and then imported back into Ceph, which is obviously the reason for the delay. The problem is that the resulting instance disk is not a CoW clone, it's a flat rbd image.

Here are some relevant logs (instance_id: 65567fc1-017f-45dc-b0ee-570c44146119, image_id: 0da1ba0f-c504-45ea-b138-16026aec022b):

---cut here---
[...]
2018-10-04 11:46:39.189 10293 DEBUG nova.compute.manager [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] [instance: 65567fc1-017f-45dc-b0ee-570c44146119] Start spawning the instance on the hypervisor. _build_and_run_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:1929
[...]
2018-10-04 11:46:39.192 10293 INFO nova.virt.libvirt.driver [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] [instance: 65567fc1-017f-45dc-b0ee-570c44146119] Creating image
2018-10-04 11:46:39.220 10293 DEBUG nova.virt.libvirt.storage.rbd_utils [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] rbd image 65567fc1-017f-45dc-b0ee-570c44146119_disk does not exist __init__ /usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py:77
2018-10-04 11:46:39.241 10293 DEBUG nova.virt.libvirt.storage.rbd_utils [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] rbd image 65567fc1-017f-45dc-b0ee-570c44146119_disk does not exist __init__ /usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py:77
2018-10-04 11:46:39.245 10293 DEBUG oslo_concurrency.lockutils [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] Lock "3c60237fbf59101d3411c4f795d0a72b82752e0b" acquired by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: waited 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2018-10-04 11:46:39.246 10293 DEBUG oslo_concurrency.lockutils [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] Lock "3c60237fbf59101d3411c4f795d0a72b82752e0b" released by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2018-10-04 11:46:39.266 10293 DEBUG nova.virt.libvirt.storage.rbd_utils [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] rbd image 65567fc1-017f-45dc-b0ee-570c44146119_disk does not exist __init__ /usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py:77
2018-10-04 11:46:39.269 10293 DEBUG oslo_concurrency.processutils [req-85d728c3-5da1-4b37-add7-b956b4b2bb3d df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] Running cmd (subprocess): rbd import --pool images /var/lib/nova/instances/_base/3c60237fbf59101d3411c4f795d0a72b82752e0b 65567fc1-017f-45dc-b0ee-570c44146119_disk --image-format=2 --id openstack --conf /etc/ceph/ceph.conf execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:355
[...]

# no parent data
control:~ # rbd info images/65567fc1-017f-45dc-b0ee-570c44146119_disk
rbd image '65567fc1-017f-45dc-b0ee-570c44146119_disk':
        size 6144 MB in 1536 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.c8f88a6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map
        flags:
        create_timestamp: Thu Oct 4 11:46:39 2018
---cut here---

###################################################################

For comparison I moved the respective base file and tried to spawn a new instance from the same glance image:

---cut here---
[...]
2018-10-04 10:30:29.412 2336 DEBUG nova.compute.manager [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] [instance: 91d0b930-97b0-4dd0-81b4-929599b7c997] Start spawning the instance on the hypervisor. _build_and_run_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:1929
[...]
2018-10-04 10:30:29.415 2336 INFO nova.virt.libvirt.driver [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] [instance: 91d0b930-97b0-4dd0-81b4-929599b7c997] Creating image
2018-10-04 10:30:29.435 2336 DEBUG nova.virt.libvirt.storage.rbd_utils [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] rbd image 91d0b930-97b0-4dd0-81b4-929599b7c997_disk does not exist __init__ /usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py:77
2018-10-04 10:30:29.455 2336 DEBUG nova.virt.libvirt.storage.rbd_utils [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] rbd image 91d0b930-97b0-4dd0-81b4-929599b7c997_disk does not exist __init__ /usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py:77
2018-10-04 10:30:29.492 2336 DEBUG oslo_concurrency.lockutils [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] Lock "3c60237fbf59101d3411c4f795d0a72b82752e0b" acquired by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: waited 0.035s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2018-10-04 10:30:29.527 2336 DEBUG nova.virt.libvirt.imagebackend [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] Image locations are: [{u'url': u'rbd://655cb05a-435a-41ba-83d9-8549f7c36167/images/0da1ba0f-c504-45ea-b138-16026aec022b/snap', u'metadata': {}}, {'url': u'rbd://655cb05a-435a-41ba-83d9-8549f7c36167/images/0da1ba0f-c504-45ea-b138-16026aec022b/snap', 'metadata': {}}] clone /usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py:903
2018-10-04 10:30:29.681 2336 DEBUG nova.virt.libvirt.imagebackend [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] Selected location: {u'url': u'rbd://655cb05a-435a-41ba-83d9-8549f7c36167/images/0da1ba0f-c504-45ea-b138-16026aec022b/snap', u'metadata': {}} clone /usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py:912
2018-10-04 10:30:29.682 2336 DEBUG nova.virt.libvirt.storage.rbd_utils [req-942ba103-1932-4adf-b9b2-670e1a2fc126 df7b63e69da3b1ee2be3d79342e7992f3620beddbdac7768dcb738105e74301e 2e3c3f3822124a3fa9fd905164f519ae - - -] cloning images/0da1ba0f-c504-45ea-b138-16026aec022b@snap to None/91d0b930-97b0-4dd0-81b4-929599b7c997_disk clone /usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py:236
[...]

# instance has parent data
control:~ # rbd info images/91d0b930-97b0-4dd0-81b4-929599b7c997_disk
rbd image '91d0b930-97b0-4dd0-81b4-929599b7c997_disk':
        size 8192 MB in 1024 objects
        order 23 (8192 kB objects)
        block_name_prefix: rbd_data.c8abcb6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map
        flags:
        create_timestamp: Thu Oct 4 10:30:29 2018
        parent: images/0da1ba0f-c504-45ea-b138-16026aec022b@snap
        overlap: 6144 MB
---cut here---

Why is there a local copy of the image in the first place? Obviously nova doesn't need it and creates clones successfully without one. The directory /var/lib/nova/instances/_base only contains 6 images.

I would be thankful for any explanation of this behaviour.

Regards,
Eugen
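PS: a quick sketch to spot flat (non-CoW) ephemeral disks in the pool; it assumes the pool and client names from the logs above ("images" and "openstack"), so adjust them for your setup:

---cut here---
# List nova ephemeral disks ("*_disk") and flag the ones without a parent,
# i.e. the flat imports; CoW clones carry a "parent" entry in "rbd info".
for img in $(rbd ls --pool images --id openstack | grep '_disk$'); do
  if rbd info --format json --pool images --id openstack "$img" | grep -q '"parent"'; then
    echo "CoW clone: $img"
  else
    echo "FLAT:      $img"
  fi
done
---cut here---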
On 04/10/2018 14.51, Eugen Block wrote:
So I wanted to take a look at what's happening, turned on debug logs and found that in some cases (I have no idea how to reproduce this yet) an image is downloaded to /var/lib/nova/instances/_base and then imported back into Ceph, which is obviously the reason for the delay.
You could try to import a new image into glance and see if its first boot is slow. This is what you get with local storage on compute nodes, too (once per node). In that case, it could be a problem with the integration between the glance, cinder(?) and nova ceph backends.
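Roughly something like this (the image file, flavor and network below are just placeholders, not from your setup):

---cut here---
# Create a throwaway raw image, upload it and time the first boot from it.
# "m1.small" and <NET_ID> stand in for a flavor/network in your cloud.
qemu-img create -f raw test.raw 1G
openstack image create --disk-format raw --container-format bare --file test.raw clone-test
time openstack server create --wait --image clone-test --flavor m1.small --nic net-id=<NET_ID> clone-test-vm
---cut here---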
Hi Bernhard,

thanks for your quick response. I did both: I uploaded the same image (its origin is a volume) to glance and booted another instance from it, and it got cloned as expected. Then I uploaded a different image (from a local file) to glance and launched an instance from that; once again everything worked as expected, and no new base image appeared in /var/lib/nova/instances/_base.

I have no idea how to reproduce this yet.

Regards,
Eugen
Hi,

I just wanted to follow up on this for documentation purposes. Although I still don't have all the answers, there is something I can explain. When I upload a new image, let's say a Leap-15.iso, to create a new base image in glance and use "--disk-format iso", this leads to the described behavior; nova will report something like:
rbd image clone requires image format to be 'raw' but image rbd://<Cluster_ID>/images/<IMAGE_ID>/snap is 'iso' is_cloneable
If I launch a new instance from that ISO, nova downloads it to the filesystem (/var/lib/nova/instances/_base), which takes some time. Then I attach an empty volume, finish the installation, destroy the instance and upload the volume to glance; that new glance image gets the default "disk-format = raw". Now when I launch an instance from this new image I usually get a CoW clone on the RBD layer. The _base file of the ISO will eventually be removed by nova once it is too old to keep.

This is how I have always created new instances, not knowing that the ISOs should have the "raw" disk-format. Despite the wrong format of the ISOs, my procedure usually leads to CoW clones anyway, since I upload volumes to glance. I tried to reproduce the issue today with the exact same workflow, but everything worked as expected (including the download of the ISO image to the local filesystem, I've learned that now). So it's still unclear why nova downloaded a raw glance image to the local filesystem during the previous attempt.

I always knew that with Ceph as the backend it's recommended to use raw images, but I assumed the "disk-format" was not more than a display option. Well, now I know, but this still doesn't explain the downloaded base image. If anyone has an idea how to reproduce such behavior or an explanation, I'd love to hear it.

Regards,
Eugen
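PS: a quick sketch to list which glance images would trigger this, i.e. everything whose disk_format is not "raw" and therefore ends up downloaded to _base and imported flat:

---cut here---
# List glance images whose disk_format is not "raw"
for id in $(openstack image list -f value -c ID); do
  fmt=$(openstack image show -f value -c disk_format "$id")
  [ "$fmt" != "raw" ] && echo "$id  $fmt"
done
---cut here---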
Hi Eugen,

On Mon, 8 Oct 2018 at 14:56, Eugen Block <eblock@nde.ag> wrote:
If anyone has an idea how to reproduce such behavior or an explanation, I'd love to hear it.
Can you share the relevant parts of your glance and nova configs? More explicitly: did you enable

show_image_direct_url = True

in glance, and did you set an rbd-based ephemeral backend in nova? See
http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configure-openstack-to-u...

Greetings,
Dirk
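PS: a quick way to check both from the command line (assuming the standard packaged config paths):

---cut here---
# glance side: direct image URLs must be exposed so nova can clone from RBD
grep -i '^show_image_direct_url' /etc/glance/glance-api.conf
# nova side: the libvirt image backend must be rbd
grep -E '^(images_type|images_rbd_pool|images_rbd_ceph_conf|rbd_user)' /etc/nova/nova.conf
---cut here---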
Hi Dirk, thanks for your reply.
Can you share the relevant parts of your glance and nova configs? More explicitly: did you enable
show_image_direct_url = True
control1:~ # grep -r show_image_direct_url /etc/
/etc/glance/glance-api.conf:show_image_direct_url = true

The configs seem fine, since the ephemeral disks usually are CoW clones in this environment and the rest works as expected (most of the time).

I sent an update on this topic two days ago; I learned that nova imports flat images if the image's disk-format is not "raw". I just couldn't explain why it did that in this specific case, as I was sure the image's format was raw (the image has been deleted in the meantime). But yesterday I encountered the same problem: I tried to upload an image from a volume via the Horizon dashboard (where a dropdown exists to choose the disk-format) and the preselected value was indeed "qcow2" instead of "raw". I have no idea *why* it was different from the default "raw", but this could have happened last week, too.

So I know the reason why some ephemeral disks are flat, I just can't tell how Horizon selects the format. Maybe I'll find another hint...

Thanks again, I think this thread can be considered "closed". :-)

Regards,
Eugen
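PS: to avoid depending on whatever Horizon preselects, the upload can also be done from the CLI with the format forced explicitly, something like this (the volume and image names are just examples):

---cut here---
# Upload a volume to glance and force raw disk-format
# ("myvol" / "my-base-image" are placeholder names)
cinder upload-to-image --disk-format raw --container-format bare myvol my-base-image
# double-check what glance ended up with
openstack image show -f value -c disk_format my-base-image
---cut here---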
participants (3)
- Bernhard M. Wiedemann
- Dirk Müller
- Eugen Block