https://bugzilla.novell.com/show_bug.cgi?id=642078
https://bugzilla.novell.com/show_bug.cgi?id=642078#c0

           Summary: xm snapshot-create causes qemu-dm to SEGV
    Classification: openSUSE
           Product: openSUSE 11.3
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.3
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Xen
        AssignedTo: jdouglas@novell.com
        ReportedBy: brianlgilbert@gmail.com
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.62 Safari/534.3

Using the xen-...-4.0.1_21326_01-1.x86_64 rpms, but I am fairly sure I had the
same problem with the xen-...-4.0 rpms.

Windows xpsp3 HVM guest with a single qcow2 disk image (containing no
snapshots), configured as follows:

disk = [ 'tap:qcow2:/some_dir/xpsp3.qcow2,hda,w' ]

Note that there are two messages in /var/log/xen/qemu-dm-xpsp3.log for each
xm snapshot-create:

Creating snapshot on 'hda' (qcow2)
Creating snapshot on 'blktap0' (qcow2)

Reproducible: Always

Steps to Reproduce:
1. xm start xpsp3
2. while true; do
       sleep 60
       xm snapshot-delete xpsp3 snap1
       xm snapshot-create xpsp3 snap1
   done
3. Usually fails on the first loop iteration; sometimes it takes a couple.

Actual Results:
qemu-dm crashes; migrating-<domainname> is stuck in the 's' state as shown by
xm list.

Expected Results:
qemu-dm does not crash, and xm list shows the domain still running.

Debugging info:

drives_table[] contains two entries on my system that both refer to the same
filename: one for hda and one for blktap0, hence the two messages in my
qemu-dm-xxx.log. The second entry (blktap0) is the result of a patch by
kwolf@suse.de from Oct 2008:

"If blktap drives are registered properly, qemu code is much less likely to
get confused by them. Use bdrv_new(), assign a device name and create an
entry in drives_table for them."
See http://lists.xensource.com/archives/html/xen-devel/2008-10/msg00091.html

I think the problem is that the caching and/or refcount code in block-qcow2.c
does not properly handle multiple active bdrvs referencing the same file. In
my specific case, the hda bdrv was used to update l1_table[6] on disk and the
cached value for l1_table[6] in the hda bdrv was updated, but the stale cached
value for l1_table[6] in the blktap0 bdrv was used later.

For the time being I have worked around this locally with the following patch
(to a SuSE patch) and I would like some feedback. I am not sure whether this
is the best approach, whether there are other places where similar logic must
be applied, or whether kwolf@suse.de's patch should simply be reverted. I
emailed him at kwolf@redhat.com last week but have had no answer yet.

diff -rupN 4.0.1-testing.orig//tools/ioemu-remote/savevm.c 4.0.1-testing/tools/ioemu-remote/savevm.c
--- 4.0.1-testing.orig//tools/ioemu-remote/savevm.c	2010-09-14 12:42:33.000000000 -0400
+++ 4.0.1-testing/tools/ioemu-remote/savevm.c	2010-09-24 10:47:01.911313614 -0400
@@ -1023,8 +1023,10 @@ int save_disk_snapshots(const char* name
     struct timeval tv;
     int saved_vm_running;
     int can_snapshot;
-    int i;
+    int i,j;
     int ret = 0;
+    int drive_snapshotted[MAX_DRIVES];
+    int match;
 
     /* Deal with all aio submit */
     qemu_aio_flush();
@@ -1038,8 +1040,8 @@ int save_disk_snapshots(const char* name
 
     /* Ensure that all images support snapshots or are read-only */
     for(i = 0; i < MAX_DRIVES; i++) {
-        bs = drives_table[i].bdrv;
+        bs = drives_table[i].bdrv;
         if (!bs || !bs->drv)
             continue;
@@ -1060,6 +1062,8 @@ int save_disk_snapshots(const char* name
     sn.vm_clock_nsec = qemu_get_clock(vm_clock);
     pstrcpy(sn.name, sizeof(sn.name), name);
 
+    memset(drive_snapshotted, 0, sizeof(drive_snapshotted));
+
     for(i = 0; i < MAX_DRIVES; i++) {
         bs = drives_table[i].bdrv;
@@ -1067,6 +1071,28 @@ int save_disk_snapshots(const char* name
         if (!bs || !bs->drv || bdrv_is_read_only(bs))
             continue;
 
+        /* drives_table can (and usually does) contain multiple devices
+         * that reference the same qcow2 file. That doesn't seem right,
+         * but until that is decided, this prevents us from snapshotting
+         * the file twice, which seems to cause cache/file consistency
+         * issues.
+         */
+        for (j=0, match=0; j < i; j++) {
+            if (drive_snapshotted[j]) {
+                if (0 == strcmp(drives_table[j].bdrv->filename, bs->filename)) {
+                    fprintf(stderr, "Skipping snapshot on '%s' (%s) (filename=%s), snapshot already taken!\n",
+                            bdrv_get_device_name(bs),
+                            bs->drv->format_name,
+                            bs->filename);
+                    match = 1;
+                    break;
+                }
+            }
+        }
+
+        if (match) {
+            continue;
+        }
+
         /* Delete old snapshot of the same name */
         if (bdrv_snapshot_find(bs, &old_sn, name) >= 0) {
             ret = bdrv_snapshot_delete(bs, old_sn.id_str);
@@ -1077,14 +1103,15 @@ int save_disk_snapshots(const char* name
         }
 
         /* Create the snapshot */
-        fprintf(stderr, "Creating snapshot on '%s' (%s)\n",
-                bdrv_get_device_name(bs), bs->drv->format_name);
+        fprintf(stderr, "Creating snapshot on '%s' (%s) (filename=%s)\n",
+                bdrv_get_device_name(bs), bs->drv->format_name, bs->filename);
         ret = bdrv_snapshot_create(bs, &sn);
         if (ret < 0) {
             fprintf(stderr, "Error while creating snapshot on '%s': %d (%s)\n",
                     bdrv_get_device_name(bs), ret, strerror(-ret));
             goto the_end;
         }
+        drive_snapshotted[i] = 1;
     }
 
     fprintf(stderr, "Created all snapshots\n");

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.