
[Bug 642078] New: xm snapshot-create causes qemu-dm to SEGV
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Mon, 27 Sep 2010 15:55:32 +0000
  • Message-id: <bug-642078-21960@xxxxxxxxxxxxxxxxxxxxxxxx/>

https://bugzilla.novell.com/show_bug.cgi?id=642078

https://bugzilla.novell.com/show_bug.cgi?id=642078#c0


Summary: xm snapshot-create causes qemu-dm to SEGV
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.3
Status: NEW
Severity: Major
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@xxxxxxxxxx
ReportedBy: brianlgilbert@xxxxxxxxx
QAContact: qa@xxxxxxx
Found By: ---
Blocker: ---


User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3
(KHTML, like Gecko) Chrome/6.0.472.62 Safari/534.3

Using the xen-...-4.0.1_21326_01-1.x86_64 rpms, but I am pretty sure I had the
same problem with the xen...-4.0 rpms.

Windows XP SP3 HVM guest with a single qcow2 disk image (containing no snapshots),
configured as follows: disk = [ 'tap:qcow2:/some_dir/xpsp3.qcow2,hda,w' ]

I noted that there are two messages in /var/log/xen/qemu-dm-xpsp3.log for each
xm snapshot-create:
Creating snapshot on 'hda' (qcow2)
Creating snapshot on 'blktap0' (qcow2)

Reproducible: Always

Steps to Reproduce:
1. xm start xpsp3
2. while true; do
       sleep 60
       xm snapshot-delete xpsp3 snap1
       xm snapshot-create xpsp3 snap1
   done
3. Usually fails on the first iteration; sometimes it takes a couple.

Actual Results:
qemu-dm crashes, and migrating-<domainname> is left stuck in the 's' state as shown by xm list

Expected Results:
qemu-dm does not crash
xm list shows the domain still running


Debugging info:
drives_table[] contains two entries on my system that both refer to the same
filename: one for hda and one for blktap0 (both qcow2), hence the two messages in
my qemu-dm-xxx.log. The second entry (blktap0) is the result of a patch by
kwolf@xxxxxxx in Oct 08: "If blktap drives are registered properly, qemu code
is much less likely to get confused by them. Use bdrv_new(), assign a
device name and create an entry in drives_table for them." See
http://lists.xensource.com/archives/html/xen-devel/2008-10/msg00091.html

I think the problem is that the caching and/or refcount code in block-qcow2.c
does not properly handle multiple active bdrvs referencing the same file. In
my specific case I saw the hda bdrv being used to update l1_table[6] on disk, and
the cached value of l1_table[6] in the hda bdrv was updated, but the stale cached
value of l1_table[6] in the blktap0 bdrv was used later.
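
To illustrate the kind of inconsistency I mean, here is a minimal, self-contained
sketch (plain C, not actual ioemu/qcow2 code; fake_bdrv, open_drive and update_l1
are made up for illustration): two handles each keep a private cached copy of the
same on-disk L1 table, a write through one handle updates the file and that
handle's cache, and the other handle is left with a stale entry.

/* Sketch only, NOT ioemu code: two independent "bdrv"-like handles each
 * cache the same on-disk L1 table. Writing through one handle leaves the
 * other handle's cached copy stale. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define L1_SIZE 8

struct fake_bdrv {
    const char *device_name;        /* e.g. "hda" or "blktap0" */
    uint64_t l1_cache[L1_SIZE];     /* private in-memory copy of the L1 table */
};

static uint64_t disk_l1[L1_SIZE];   /* stands in for the shared qcow2 file */

static void open_drive(struct fake_bdrv *bs, const char *name)
{
    bs->device_name = name;
    memcpy(bs->l1_cache, disk_l1, sizeof(disk_l1));   /* cache filled at open */
}

/* Update the "file" and the writer's cache only; nothing tells any other
 * handle on the same file that its cached entry is now out of date. */
static void update_l1(struct fake_bdrv *bs, int idx, uint64_t val)
{
    disk_l1[idx] = val;
    bs->l1_cache[idx] = val;
}

int main(void)
{
    struct fake_bdrv hda, blktap0;

    open_drive(&hda, "hda");
    open_drive(&blktap0, "blktap0");   /* second entry, same underlying file */

    update_l1(&hda, 6, 0x1234);        /* snapshot path updates l1_table[6] via hda */

    printf("on disk:       l1[6] = 0x%llx\n", (unsigned long long)disk_l1[6]);
    printf("hda cache:     l1[6] = 0x%llx\n", (unsigned long long)hda.l1_cache[6]);
    printf("blktap0 cache: l1[6] = 0x%llx  <-- stale\n",
           (unsigned long long)blktap0.l1_cache[6]);
    return 0;
}

If the blktap0 handle later writes its stale cached entry back to the file, the
update made through hda is lost, which would fit the corruption I am seeing.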

For the time being I have worked around this locally with the following patch
(on top of a SuSE patch), and I would like some feedback. I am not sure whether
this is the best approach, whether there are other places where similar logic must
be applied, or whether kwolf@xxxxxxx's patch should perhaps be reverted. I think I
emailed him at kwolf@xxxxxxxxxx last week but have not had an answer yet...

diff -rupN 4.0.1-testing.orig//tools/ioemu-remote/savevm.c 4.0.1-testing/tools/ioemu-remote/savevm.c
--- 4.0.1-testing.orig//tools/ioemu-remote/savevm.c 2010-09-14 12:42:33.000000000 -0400
+++ 4.0.1-testing/tools/ioemu-remote/savevm.c 2010-09-24 10:47:01.911313614 -0400
@@ -1023,8 +1023,10 @@ int save_disk_snapshots(const char* name
struct timeval tv;
int saved_vm_running;
int can_snapshot;
- int i;
+ int i,j;
int ret = 0;
+ int drive_snapshotted[MAX_DRIVES];
+ int match;

/* Deal with all aio submit */
qemu_aio_flush();
@@ -1038,8 +1040,8 @@ int save_disk_snapshots(const char* name

/* Ensure that all images support snapshots or are read-only */
for(i = 0; i < MAX_DRIVES; i++) {
- bs = drives_table[i].bdrv;

+ bs = drives_table[i].bdrv;
if (!bs || !bs->drv)
continue;

@@ -1060,6 +1062,8 @@ int save_disk_snapshots(const char* name
sn.vm_clock_nsec = qemu_get_clock(vm_clock);
pstrcpy(sn.name, sizeof(sn.name), name);

+ memset(drive_snapshotted, 0, sizeof(drive_snapshotted));
+
for(i = 0; i < MAX_DRIVES; i++) {
bs = drives_table[i].bdrv;

@@ -1067,6 +1071,28 @@ int save_disk_snapshots(const char* name
if (!bs || !bs->drv || bdrv_is_read_only(bs))
continue;

+ /* drives_table can (and usually does) contain multiple devices
+ * that reference the same qcow2 file; that doesn't seem right,
+ * but until that is decided this prevents us from snapshotting
+ * the file twice, which seems to cause cache/file consistency issues
+ */
+ for (j=0, match=0; j < i; j++) {
+ if (drive_snapshotted[j]) {
+ if (0 == strcmp(drives_table[j].bdrv->filename, bs->filename)) {
+ fprintf(stderr, "Skipping snapshot on '%s' (%s) (filename=%s), snapshot already taken!\n",
+ bdrv_get_device_name(bs),
+ bs->drv->format_name,
+ bs->filename);
+ match = 1;
+ break;
+ }
+ }
+ }
+
+ if (match) {
+ continue;
+ }
+
/* Delete old snapshot of the same name */
if (bdrv_snapshot_find(bs, &old_sn, name) >= 0) {
ret = bdrv_snapshot_delete(bs, old_sn.id_str);
@@ -1077,14 +1103,15 @@ int save_disk_snapshots(const char* name
}

/* Create the snapshot */
- fprintf(stderr, "Creating snapshot on '%s' (%s)\n",
- bdrv_get_device_name(bs), bs->drv->format_name);
+ fprintf(stderr, "Creating snapshot on '%s' (%s) (filename=%s)\n",
+ bdrv_get_device_name(bs), bs->drv->format_name, bs->filename);
ret = bdrv_snapshot_create(bs, &sn);
if (ret < 0) {
fprintf(stderr, "Error while creating snapshot on '%s': %d
(%s)\n",
bdrv_get_device_name(bs), ret, strerror(-ret));
goto the_end;
}
+ drive_snapshotted[i] = 1;
}

fprintf(stderr, "Created all snapshots\n");

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
