[Bug 642078] New: xm snapshot-create causes qemu-dm to SEGV
https://bugzilla.novell.com/show_bug.cgi?id=642078
https://bugzilla.novell.com/show_bug.cgi?id=642078#c0

Summary: xm snapshot-create causes qemu-dm to SEGV
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.3
Status: NEW
Severity: Major
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@novell.com
ReportedBy: brianlgilbert@gmail.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.62 Safari/534.3

Using xen-...-4.0.1_21326_01-1.x86_64 rpms, but I am fairly sure I had the same problem with the xen-...-4.0 rpms.

Windows xpsp3 HVM guest with a single qcow2 disk image (containing no snapshots), configured as follows:

    disk = [ 'tap:qcow2:/some_dir/xpsp3.qcow2,hda,w' ]

Note that there are two messages in /var/log/xen/qemu-dm-xpsp3.log for each xm snapshot-create:

    Creating snapshot on 'hda' (qcow2)
    Creating snapshot on 'blktap0' (qcow2)

Reproducible: Always

Steps to Reproduce:
1. xm start xpsp3
2. Repeatedly delete and recreate a snapshot, e.g.:

       while true; do
           sleep 60
           xm snapshot-delete xpsp3 snap1
           xm snapshot-create xpsp3 snap1
       done

3. Usually it fails on the first loop, sometimes it takes a couple.

Actual Results: qemu-dm crashes; migrating-<domainname> is stuck in the 's' state as shown by xm list.

Expected Results: qemu-dm does not crash and xm list shows the domain still running.

Debugging info: drives_table[] contains two entries on my system that both refer to the same filename, one for hda and one for blktap0, hence the two messages in my qemu-dm-xxx.log. The second entry (blktap0) is a result of a patch by kwolf@suse.de in Oct 08: "If blktap drives are registered properly, qemu code is much less likely to get confused by them. Use bdrv_new(), assign a device name and create an entry in drives_table for them." See http://lists.xensource.com/archives/html/xen-devel/2008-10/msg00091.html

I think the problem is that the caching and/or refcount code in block-qcow2.c does not properly handle multiple active bdrvs referencing the same file. In my specific case I saw the hda bdrv used to update l1_table[6] on disk, and the cached value for l1_table[6] in the hda bdrv was updated, but the old cached value for l1_table[6] in the blktap0 bdrv was used later.

For the time being I have worked around this locally with the following patch (to a SuSE patch) and I would like some feedback. I am not sure whether this is the best way, whether there are other places where similar logic must be applied, or whether kwolf@suse.de's patch should perhaps be reverted. I emailed him at kwolf@redhat.com last week with no answer yet.
diff -rupN 4.0.1-testing.orig//tools/ioemu-remote/savevm.c 4.0.1-testing/tools/ioemu-remote/savevm.c
--- 4.0.1-testing.orig//tools/ioemu-remote/savevm.c	2010-09-14 12:42:33.000000000 -0400
+++ 4.0.1-testing/tools/ioemu-remote/savevm.c	2010-09-24 10:47:01.911313614 -0400
@@ -1023,8 +1023,10 @@ int save_disk_snapshots(const char* name
     struct timeval tv;
     int saved_vm_running;
     int can_snapshot;
-    int i;
+    int i,j;
     int ret = 0;
+    int drive_snapshotted[MAX_DRIVES];
+    int match;
 
     /* Deal with all aio submit */
     qemu_aio_flush();
@@ -1038,8 +1040,8 @@ int save_disk_snapshots(const char* name
 
     /* Ensure that all images support snapshots or are read-only */
     for(i = 0; i < MAX_DRIVES; i++) {
-       bs = drives_table[i].bdrv;
+        bs = drives_table[i].bdrv;
 
         if (!bs || !bs->drv)
             continue;
@@ -1060,6 +1062,8 @@ int save_disk_snapshots(const char* name
     sn.vm_clock_nsec = qemu_get_clock(vm_clock);
     pstrcpy(sn.name, sizeof(sn.name), name);
 
+    memset(drive_snapshotted, 0, sizeof(drive_snapshotted));
+
     for(i = 0; i < MAX_DRIVES; i++) {
         bs = drives_table[i].bdrv;
@@ -1067,6 +1071,28 @@ int save_disk_snapshots(const char* name
         if (!bs || !bs->drv || bdrv_is_read_only(bs))
             continue;
 
+        /* drives_table can (and usually does) contain multiple devices
+         * that reference the same qcow2 file.  That doesn't seem right,
+         * but until that is decided this prevents us from snapshotting
+         * the file twice, which seems to cause cache/file consistency issues.
+         */
+        for (j = 0, match = 0; j < i; j++) {
+            if (drive_snapshotted[j]) {
+                if (0 == strcmp(drives_table[j].bdrv->filename, bs->filename)) {
+                    fprintf(stderr, "Skipping snapshot on '%s' (%s) (filename=%s), snapshot already taken!\n",
+                            bdrv_get_device_name(bs),
+                            bs->drv->format_name,
+                            bs->filename);
+                    match = 1;
+                    break;
+                }
+            }
+        }
+
+        if (match) {
+            continue;
+        }
+
         /* Delete old snapshot of the same name */
         if (bdrv_snapshot_find(bs, &old_sn, name) >= 0) {
             ret = bdrv_snapshot_delete(bs, old_sn.id_str);
@@ -1077,14 +1103,15 @@ int save_disk_snapshots(const char* name
         }
 
         /* Create the snapshot */
-        fprintf(stderr, "Creating snapshot on '%s' (%s)\n",
-                bdrv_get_device_name(bs), bs->drv->format_name);
+        fprintf(stderr, "Creating snapshot on '%s' (%s) (filename=%s)\n",
+                bdrv_get_device_name(bs), bs->drv->format_name, bs->filename);
         ret = bdrv_snapshot_create(bs, &sn);
         if (ret < 0) {
             fprintf(stderr, "Error while creating snapshot on '%s': %d (%s)\n",
                     bdrv_get_device_name(bs), ret, strerror(-ret));
             goto the_end;
         }
+        drive_snapshotted[i] = 1;
     }
     fprintf(stderr, "Created all snapshots\n");
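As an aside, here is a minimal stand-alone sketch of the failure mode suspected in the debugging info above: two driver states, each with its own cached copy of the L1 table for the same file, where a write through one handle leaves the other handle's cached l1_table[6] stale. The structures and names below are hypothetical illustrations only, not actual qemu or xen code.

/*
 * Hypothetical stand-alone illustration -- NOT qemu/xen code.  Two
 * independent "driver states" each cache an L1 table entry for the same
 * backing file; a write through one handle leaves the other handle's
 * cache stale, which is what appears to happen with the hda and blktap0
 * bdrvs.
 */
#include <stdint.h>
#include <stdio.h>

#define L1_SIZE 8

static uint64_t file_l1[L1_SIZE];       /* the single on-disk L1 table */

typedef struct {
    const char *name;                   /* e.g. "hda" or "blktap0" */
    uint64_t cached_l1[L1_SIZE];        /* per-handle cache of the L1 table */
} FakeBDRVState;

static void fake_open(FakeBDRVState *bs, const char *name)
{
    bs->name = name;
    for (int i = 0; i < L1_SIZE; i++)   /* fill the cache at open time */
        bs->cached_l1[i] = file_l1[i];
}

/* Write through one handle: updates the file and that handle's cache only. */
static void fake_update_l1(FakeBDRVState *bs, int idx, uint64_t val)
{
    file_l1[idx] = val;
    bs->cached_l1[idx] = val;
}

/* Read through a handle: served from that handle's cache. */
static uint64_t fake_read_l1(FakeBDRVState *bs, int idx)
{
    return bs->cached_l1[idx];
}

int main(void)
{
    FakeBDRVState hda, blktap0;

    fake_open(&hda, "hda");
    fake_open(&blktap0, "blktap0");      /* same file opened a second time */

    fake_update_l1(&hda, 6, 0xdeadbeef); /* snapshot path updates l1_table[6] via hda */

    printf("file:    l1[6] = %#lx\n", (unsigned long)file_l1[6]);
    printf("hda:     l1[6] = %#lx\n", (unsigned long)fake_read_l1(&hda, 6));
    printf("blktap0: l1[6] = %#lx  <-- stale\n",
           (unsigned long)fake_read_l1(&blktap0, 6));
    return 0;
}

The real block-qcow2.c similarly keeps its L1 table and L2 cache per open BlockDriverState, which appears to be why two bdrvs over one file can disagree.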
https://bugzilla.novell.com/show_bug.cgi?id=642078#c
Charles Arnold
https://bugzilla.novell.com/show_bug.cgi?id=642078#c1
--- Comment #1 from Brian Gilbert
It seems there are two bdrvs that reference and write to the same qcow2 file, but the code in block-qcow2.c doesn't properly account for that...
[KEVIN] Right, having the same image file opened twice is asking for trouble. The real problem is how Xen handles PV on HVM. You start the VM using the IDE emulation, and at some point during the boot process the OS starts to refer to the same disk using the PV drivers. So you basically have to have it opened twice unless you manage to have one BlockDriverState that is used for both virtual devices.

I'm not sure, though, why qemu opens the same file twice. I think some patches went in that made it use qemu instead of an external tapdisk backend to actually access the image. I don't have that code around (I'm working on KVM nowadays), but you could check whether the code that receives the file open request from blktapctrl requires this blktap0 drive. If so, it should probably be rewritten to associate hda with the blktap interface. That doesn't consider hot-unplug yet, though...

To sum it up, the idea of accessing the same image file through two different devices is just insane for anything but raw images. [BRIAN]
So do you have any thoughts on how to best handle this?
Option 1 - Remove your patch. I took out just the #ifndef'd block that adds the new bdrv to drives_table, and that seems to get past my problem, but I assume it will bring back the problem you were solving, i.e. make it more likely that qemu gets confused.
[KEVIN] Before the patch, the file was actually opened twice, but the blktap0 drive wasn't properly registered with qemu. The result of this is that the savevm handler didn't see the second instance of the drive when iterating over all drives to snapshot. Now I hate myself for the bad commit message, as I can't tell what was actually broken, but not registering properly is probably a bad idea in more than one place. And even without that patch, you still open the image file twice, which is not helpful in any case. [BRIAN]
2 - In savevm.c, save_disk_snapshots(), I added code that checks whether the current bs->filename was already saved in this call and, if so, skips it; that seems to work too. I worry that there are probably other places where similar logic would need to be applied.
3 - Start with a .qcow2 file that already has at least one snapshot; presumably that forces refcounts >= 2, which seems to prevent the caching that causes this problem. Requiring no code change is good, but snapshotting twice seems wasteful and still potentially dangerous.
4 - Either replace the hda drives_table entry with the blktap0 entry, or somehow explicitly associate/link the 2 entries and then update code that uses drives_table to recognize and deal with this special case. Not tested, just an idea, but seems overly complicated.
5 - Have the blktap0 bdrv use the same BlockDriverState as the hda bdrv? Not tested, just an idea. Seems easy enough but could have problems....
[KEVIN] Yes, that's the only one that I think really solves the problem. To clarify, what I meant here is not having two DriveInfos referring to the same BlockDriverState, but rather eliminating the blktap0 DriveInfo and accessing the hda one wherever the blktap0 one is currently used. [BRIAN]
6 - Update block-qcow2 to never cache, or to validate caches somehow before using them. Seems like a big task.
[KEVIN] And that is certainly going to hurt performance to the point where it's not usable any more (though I think Xen uses an outdated copy of the qcow2 code anyway, so it's slow already without doing such things). [BRIAN]
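Going back to option 5, the one Kevin prefers: here is a rough stand-alone sketch of the idea, using purely hypothetical types and function names rather than the real qemu-dm/blktapctrl interfaces. The blktap open request is answered with the BlockDriverState that hda already owns, so no second DriveInfo (blktap0) is registered and the image is never opened twice.

/*
 * Illustrative sketch of option 5 -- NOT the real qemu-dm / blktapctrl
 * code.  All structure and function names are hypothetical stand-ins.
 */
#include <stdio.h>
#include <string.h>

#define MAX_DRIVES 4

typedef struct {
    char filename[256];
} FakeBlockDriverState;

typedef struct {
    const char *device_name;              /* "hda", "hdb", ... */
    FakeBlockDriverState *bdrv;
} FakeDriveInfo;

static FakeDriveInfo drives_table[MAX_DRIVES];
static FakeBlockDriverState states[MAX_DRIVES];
static int nb_drives;

/* Register an emulated IDE drive the usual way (opens the image). */
static FakeBlockDriverState *drive_init(const char *device_name,
                                        const char *filename)
{
    FakeBlockDriverState *bs = &states[nb_drives];
    snprintf(bs->filename, sizeof(bs->filename), "%s", filename);
    drives_table[nb_drives].device_name = device_name;
    drives_table[nb_drives].bdrv = bs;
    nb_drives++;
    return bs;
}

/* Handle a blktap open request: reuse the already-open drive if possible. */
static FakeBlockDriverState *blktap_attach(const char *filename)
{
    for (int i = 0; i < nb_drives; i++) {
        if (strcmp(drives_table[i].bdrv->filename, filename) == 0)
            return drives_table[i].bdrv;  /* no second open, no blktap0 entry */
    }
    return NULL;                          /* not open yet; caller would open it */
}

int main(void)
{
    FakeBlockDriverState *hda = drive_init("hda", "/some_dir/xpsp3.qcow2");
    FakeBlockDriverState *tap = blktap_attach("/some_dir/xpsp3.qcow2");

    printf("blktap path reuses hda's BlockDriverState: %s\n",
           tap == hda ? "yes" : "no");
    return 0;
}

Hot-unplug, and the case where the requested image is not already open as an emulated drive, would still need real handling, as noted above.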
BTW - I am open to using the vhd disk format instead (or something else if it does snapshots), and have started some tests, but I haven't been able to complete an xpsp3 install into a new vhd on OpenSuSE 11.3 yet...
[KEVIN] Unless they have changed their model of doing PV on HVM, I think it's going to have the same problems.
https://bugzilla.novell.com/show_bug.cgi?id=642078#c2
--- Comment #2 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=642078#c3
--- Comment #3 from James Fehlig
> I think using the upstream xen 4.0.1 way could make the problem go away. Any reason why we are handling tapdisk with qemu-dm?
During SLES11 development, Kevin changed blktapctrl to use qemu's tap drivers instead of the duplicated code in tools/blktap/drivers (see blktapctrl-default-to-ioemu.patch). The intent was that development of new features and bug fixing would happen in one code base only - the qemu code. The code is more actively maintained in the qemu project anyhow. Not much has changed in tools/blktap/drivers. As an example, there has been a lot of work on qcow2 (much of it by Kevin) in upstream qemu, but tools/blktap/drivers/block-qcow2.c in the xen tree has not changed in 19 months, and that was to fix compile warnings! Is using the qemu drivers instead of the tapdisk ones causing this problem?
https://bugzilla.novell.com/show_bug.cgi?id=642078#c4
--- Comment #4 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=642078#c5
Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=642078#c6
--- Comment #6 from Brian Gilbert
https://bugzilla.novell.com/show_bug.cgi?id=642078#c7
Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=642078#c8
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=642078#c9
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=642078#c10
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=642078#c
Swamp Workflow Management