[Bug 651822] New: xm snapshot-xxx scripts lead to an XP SP3 HVM domU to chkdsk
https://bugzilla.novell.com/show_bug.cgi?id=651822
https://bugzilla.novell.com/show_bug.cgi?id=651822#c0

Summary: xm snapshot-xxx scripts lead to an XP SP3 HVM domU to chkdsk
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.3
Status: NEW
Severity: Minor
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@novell.com
ReportedBy: brianlgilbert@gmail.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7

Happens about 25-50% of the time for me. Seems to happen more often if the snapshot is taken while domU disk io is high.

Reproducible: Sometimes

Steps to Reproduce:
1. dom0: xm start xpsp3 --vncviewer
2. domU: boots, start regedit and search for some random string
3. dom0: xm snapshot-delete xpsp3 snap1 && xm snapshot-create xpsp3 snap1 (while search is still running in domU)
4. domU: Shutdown regedit and windows after search completes
5. dom0: xm snapshot-apply xpsp3 snap1 && xm vncviewer xpsp3
6. domU: Shutdown regedit and windows after search completes
7. dom0: xm start xpsp3 --vncviewer
8. domU: boots and reports that a chkdsk is needed

Actual Results: Windows domU reports that a chkdsk is needed

Expected Results: Windows domU should boot without issue

Same xpsp3 qcow2 disk image on bug 642078 (i.e. no snapshots to start with)

Built and installed xen-devel, xen-tools, xen-libs, and xen-tools using xen-4.0.1_01-84.1.src.rpm, with and without the patch for bug 64207, that doesn't seem to matter. Installed kernel-xen-2.6.34.7-31.1.x86_64.rpm

So far after the chkdsk completes the guest is usable, but there shouldn't be any corruption should there?

Observed that performing an extra iteration of step 3 between steps 5 and 6 i.e. after applying the snapshot, delete and recreate it before shutting windows down, seems to prevent subsequent chkdsk problems.
Thinking that there may be some weird refCount issues in the qcow2 code, I tried making sure the qcow2 file contains a second snapshot the whole time and never deleting that. It didn't seem to help. This could very well be a bug in QEMU's qcow2 code (or Xen's version), but since I am hitting it via the xm snapshot-create/apply/delete scripts I am reporting here first. Thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
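For anyone retrying this, the dom0 side of the reproduction steps in the report could be collected in a small helper like the one below. This is only a sketch: the function name `dom0_repro_commands` is made up for illustration, the command strings are copied from the steps in the report, and nothing is executed. The domU steps (running regedit, shutting Windows down) stay manual.

```python
# dom0 commands as given in "Steps to Reproduce"; the helper only builds
# the command strings, it does not run them.
def dom0_repro_commands(domain="xpsp3", snapshot="snap1"):
    """Return (step number, shell command) pairs for the dom0 side."""
    return [
        (1, f"xm start {domain} --vncviewer"),
        (3, f"xm snapshot-delete {domain} {snapshot} && "
            f"xm snapshot-create {domain} {snapshot}"),
        (5, f"xm snapshot-apply {domain} {snapshot} && xm vncviewer {domain}"),
        (7, f"xm start {domain} --vncviewer"),
    ]


if __name__ == "__main__":
    for step, command in dom0_repro_commands():
        print(f"step {step}: {command}")
```

Since the problem is reported as intermittent, printing the commands and pasting them between the manual domU steps may be enough to repeat the sequence several times.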
https://bugzilla.novell.com/show_bug.cgi?id=651822#c
Charles Arnold
https://bugzilla.novell.com/show_bug.cgi?id=651822#c1
--- Comment #1 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=651822#c2
--- Comment #2 from Brian Gilbert
https://bugzilla.novell.com/show_bug.cgi?id=651822#c3
--- Comment #3 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=651822#c7
--- Comment #7 from Brian Gilbert
https://bugzilla.novell.com/show_bug.cgi?id=651822#c8
--- Comment #8 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=651822#c9
--- Comment #9 from James Fehlig
I've tested it on a PV guest, an HVM guest, and an HVM guest with PV driver; it seems good. Any comments?
I didn't test the patch but it looks good. Thanks for the detailed comments. ACK.
https://bugzilla.novell.com/show_bug.cgi?id=651822#c13
--- Comment #13 from Brian Gilbert
The above fix introduces additional problems for me. The chkdsk problem goes away, but that seems to be the side effect of a larger bug. It seems that once snap1 has been applied to the vm, the xpsp3 xenstore is set so that "xm start" will also cause the snapshot to be applied. Perhaps the sxprep.append() needs to be undone somewhere?
As I said in comment #3, after applying the snapshot, the disk state is reverted to the state of the snapshot, so this is expected.
Steps:
1. dom0: xm start xpsp3 --vncviewer
2. domU: boots, create file a
3. dom0: xm snapshot-create xpsp3 snap1
4. domU: create file b, shutdown
5. dom0: xm start xpsp3 --vncviewer
6. domU: boots, files a & b exist, create file c, shutdown
7. dom0: xm snapshot-apply xpsp3 snap1 && xm vncviewer xpsp3
8. domU: starts (already booted), as expected file a exists but b & c do not, create file d, shutdown
9. dom0: xm start xpsp3 --vncviewer
10. domU: boots, as expected file a exists and b & c do not, unexpectedly file d does not exist, create file e, shutdown
11. dom0: xm start xpsp3 --vncviewer
12. domU: boots, as expected file a exists and b & c do not, unexpectedly file d & e do not exist, create file f, shutdown
13. dom0: xm snapshot-delete xpsp3 snap1
14. dom0: xm start xpsp3 --vncviewer
15. domU: "Boot from Hard Disk failed: could not read the boot disk"
16. dom0: xm delete xpsp3 && xm create xpsp3.hvm
17. dom0: xm start xpsp3 --vncviewer
18. domU: boots, as expected file a exists, no other files exists - not sure what to expect for them given the above errors
however files created after applying the snapshots are gone, this is another
https://bugzilla.novell.com/show_bug.cgi?id=651822#c14
--- Comment #14 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=651822#c15
--- Comment #15 from Brian Gilbert
however files created after applying the snapshots are gone, this is another problem, looking into it.
Thanks. Without the patch, files created after applying a snapshot aren't lost, so it doesn't seem like a separate problem. More info: I don't understand the sxprep.append (or xenstore in general). But it appears that with the patch, after the snapshot is created, the command "xenstore-ls" shows an extra value snapshot="snap1" associated with the disk. I assume this is read during start up. I noticed that this value is never deleted, that's why I asked about undoing sxprep.append(). It seems like for an "xm start" that value shouldn't be there, and its presence is causing the snapshot to be applied? I also note that the value is changed when a later snapshot is taken. That might mean we can only apply the most recent snapshot, I didn't test.
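To check whether the leftover value described above is still present, the output of "xenstore-ls" could be scanned with something like the following. This is a rough sketch, not part of the patch: the helper name `find_snapshot_values` and the sample listing are made up for illustration, and it assumes the usual `key = "value"` line layout with the node named `snapshot` as reported above.

```python
import re

# Matches a xenstore-ls style line such as:   snapshot = "snap1"
SNAPSHOT_LINE = re.compile(r'^\s*snapshot\s*=\s*"([^"]*)"\s*$')


def find_snapshot_values(xenstore_ls_output):
    """Return the value of every 'snapshot' node found in the listing."""
    values = []
    for line in xenstore_ls_output.splitlines():
        match = SNAPSHOT_LINE.match(line)
        if match:
            values.append(match.group(1))
    return values


# Hypothetical listing, invented for illustration; real output has many
# more nodes and different device numbers.
SAMPLE = '''\
vbd = ""
 768 = ""
  dev = "hda"
  snapshot = "snap1"
'''

if __name__ == "__main__":
    print(find_snapshot_values(SAMPLE))
```

An empty list after a plain "xm start" would suggest the value is no longer lingering; a non-empty list would match the behaviour described in this comment.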
(In reply to comment #14)
however files created after applying the snapshots are gone, this is another problem, looking into it.
Thanks. Without the patch, files created after applying a snapshot aren't lost, so it doesn't seem like a separate problem.
Without the patch the snapshot is never applied, so you are writing files to
https://bugzilla.novell.com/show_bug.cgi?id=651822#c16
--- Comment #16 from Dongyang Li
More info: I don't understand the sxprep.append (or xenstore in general). But it appears that with the patch, after the snapshot is created, the command "xenstore-ls" shows an extra value snapshot="snap1" associated with the disk. I assume this is read during start up. I noticed that this value is never deleted, that's why I asked about undoing sxprep.append(). It seems like for an "xm start" that value shouldn't be there, and its presence is causing the snapshot to be applied?
Yes, you are right, but that only gets used once, when we start up.
I also note that the value is changed when a later snapshot is taken. That might mean we can only apply the most recent snapshot, I didn't test.
Nope, the snap param is used to inform qemu-dm to create the disk snapshot, and the former param has already been used at startup.
(In reply to comment #14)
however files created after applying the snapshots are gone, this is another problem, looking into it.
Thanks. Without the patch, files created after applying a snapshot aren't lost, so it doesn't seem like a separate problem.
More info: I don't understand the sxprep.append (or xenstore in general). But it appears that with the patch, after the snapshot is created, the command "xenstore-ls" shows an extra value snapshot="snap1" associated with the disk. I assume this is read during start up. I noticed that this value is never deleted, that's why I asked about undoing sxprep.append(). It seems like for an "xm start" that value shouldn't be there, and its presence is causing the snapshot to be applied?
Yes, thanks for pointing this out, so we are always applying the snapshot with xm start after an xm snapshot-apply,
https://bugzilla.novell.com/show_bug.cgi?id=651822#c17
--- Comment #17 from Dongyang Li
I also note that the value is changed when a later snapshot is taken. That might mean we can only apply the most recent snapshot, I didn't test.
https://bugzilla.novell.com/show_bug.cgi?id=651822#c18
--- Comment #18 from Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=651822#c19
--- Comment #19 from Brian Gilbert
https://bugzilla.novell.com/show_bug.cgi?id=651822#c20
--- Comment #20 from James Fehlig
OK, we should not save the snapshot name in the config.sxp, so that the next time xm starts a domU we won't apply the former snapshot again and again.
Yeah, we definitely don't want to store the snapshot name in persistent domain config. How is it getting there? When doing a snapshot-apply (restore)? If so, is it possible to strip 'snapshotname' from the config read from save file before creating XendConfig object?
https://bugzilla.novell.com/show_bug.cgi?id=651822#c21
--- Comment #21 from Dongyang Li
(In reply to comment #18)
OK, we should not save the snapshot name in the config.sxp, so that the next time xm starts a domU we won't apply the former snapshot again and again.
Yeah, we definitely don't want to store the snapshot name in persistent domain config. How is it getting there? When doing a snapshot-apply (restore)?
Yes, when we apply the snapshot, we read the config from the save file and create a domain with that.
If so, is it possible to strip 'snapshotname' from the config read from save file before creating XendConfig object?
We still need to send the 'snapshotname' to xenstore to inform the device model to revert the disk image, so we can't create the XendConfig without it. What I am doing is stripping it before we save the domain config and then putting it back; it will eventually be popped when we give the snapshotname to xenstore. Thanks
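The strip-and-restore idea described in this comment might look roughly like the sketch below. It is only an illustration under stated assumptions: the config is modelled as a plain dict, the key name 'snapshotname' is taken from the discussion, and the helper names `save_without_snapshot` and `hand_to_xenstore` are hypothetical. It is not the actual xend patch.

```python
SNAPSHOT_KEY = "snapshotname"  # key name taken from the discussion above


def save_without_snapshot(config, save):
    """Call save(config) with the snapshot name temporarily removed.

    The name is put back afterwards so it can still be handed to
    xenstore later. 'save' is any callable that persists the config.
    """
    missing = object()
    name = config.pop(SNAPSHOT_KEY, missing)
    try:
        save(config)
    finally:
        if name is not missing:
            config[SNAPSHOT_KEY] = name


def hand_to_xenstore(config):
    """Pop the snapshot name so it is consumed exactly once (None if absent)."""
    return config.pop(SNAPSHOT_KEY, None)


if __name__ == "__main__":
    saved = []  # stands in for the persistent config.sxp
    config = {"name": "xpsp3", SNAPSHOT_KEY: "snap1"}
    save_without_snapshot(config, lambda c: saved.append(dict(c)))
    print(saved[0])                   # persisted copy has no snapshot name
    print(hand_to_xenstore(config))   # name is still available once
    print(config)                     # and is gone after being consumed
```

The point of the design is that the persisted copy never carries the snapshot name, so a later "xm start" has nothing to re-apply, while the in-memory config still delivers the name to the device model once.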
https://bugzilla.novell.com/show_bug.cgi?id=651822#c22
--- Comment #22 from James Fehlig
What I am doing is stripping it before we save the domain config and then putting it back; it will eventually be popped when we give the snapshotname to xenstore. Thanks
Heh, another hack to the hack that is snapshot support :-). It will be interesting to see how this functionality can be implemented in the new xen tools stack (xl/libxenlight). I haven't looked closely enough to see how extensions are handled. Regardless, we'll need to figure out what to do with the snapshot patches once xm/xend are deprecated/disabled.
https://bugzilla.novell.com/show_bug.cgi?id=651822#c25
Dongyang Li
https://bugzilla.novell.com/show_bug.cgi?id=651822#c27
James Fehlig
https://bugzilla.novell.com/show_bug.cgi?id=651822#c28
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=651822#c34
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=651822#c35
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=651822#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=651822#c36
--- Comment #36 from Brian Gilbert
https://bugzilla.novell.com/show_bug.cgi?id=651822#c37
James Fehlig
Why was this reopened?
It was reopened due to some confusion concerning the last maintenance update (released March 24). That confusion was cleared up and the bug should be closed again. I'll do so now.
I ask because I have started having chkdsk problems again with xen-4.1.0.
xen-4.1.0 from Factory? Be warned that it has not gotten much testing at all and could have lots of problems. I hope to do some real testing on it soon and fix up any issues that keep it from being at least minimally usable.