[Bug 391709] New: Error when attempting `xm save` of pv guest
https://bugzilla.novell.com/show_bug.cgi?id=391709

           Summary: Error when attempting `xm save` of pv guest
           Product: openSUSE 11.0
           Version: Beta 3
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Xen
        AssignedTo: jfehlig@novell.com
        ReportedBy: jdouglas@novell.com
         QAContact: jdouglas@novell.com
                CC: carnold@novell.com, lbendixs@novell.com
          Found By: ---

Created an attachment (id=216139)
 --> (https://bugzilla.novell.com/attachment.cgi?id=216139)
xend.log

I received the following error after attempting to do an xm save on a 32-bit pv sles10sp2 guest:

xen75:/ # xm save 2 /tmp/sles10sp2-32.sav
Error: /usr/lib64/xen/bin/xc_save 19 2 0 0 0 failed
Usage: xm save [-c] <Domain> <CheckpointFile>

Save a domain state to restore later.
  -c, --checkpoint    Leave domain running after creating snapshot

Running xm list revealed that the VM was in the following state:

Name            ID    Mem  VCPUs  State   Time(s)
Domain-0         0  14582      4  r-----    534.2
sles10sp2-32     2   1024      4  ---s--    106.0

I tried doing an xm shutdown and that failed. I then tried an xm restore and that seemed to succeed, because suddenly I had two sles10sp2-32 VMs listed in xm list:

xen75:/ # xm li
Name            ID    Mem  VCPUs  State   Time(s)
Domain-0         0  13559      4  r-----    552.1
sles10sp2-32     2   1024      4  ---s--    106.0
sles10sp2-32     5   1024      4  -b----      0.1

I was able to log in to the restored guest, and things appeared to be working as expected, so it appears as though the save and restore worked, with the exception of stopping the VM upon the successful save.

BTW, I am running the 64-bit hypervisor/dom0.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jdouglas@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c1

--- Comment #1 from Jason Douglas <jdouglas@novell.com> 2008-05-16 17:28:27 MST ---
Just as a point of reference ... I tried this same operation with a 32-bit WinXP guest, and the operation seemed to work perfectly.
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jfehlig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c2

--- Comment #2 from James Fehlig <jfehlig@novell.com> 2008-05-20 19:02:20 MST ---
Another observation: save/restore of a 32-bit pv guest on a 64-bit hypervisor/dom0 works in SLES10 SP2. The code is not much different :-/.
https://bugzilla.novell.com/show_bug.cgi?id=391709

User agresko@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c3

Aaron Gresko <agresko@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |agresko@novell.com

--- Comment #3 from Aaron Gresko <agresko@novell.com> 2008-05-21 13:28:33 MST ---
I'm seeing the same problem trying to migrate a W2k3 FV guest.

virthost1:~ # xm migrate saps02 192.168.1.52
Error: /usr/lib64/xen/bin/xc_save 22 1 0 0 4 failed
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help                   Print this help.
-l, --live                   Use live migration.
-p=portnum, --port=portnum   Use specified port for migration.
-r=MBIT, --resource=MBIT     Set level of resource usage for migration.

virthost1:~ # xm list
Name        ID    Mem  VCPUs  State   Time(s)
Domain-0     0   7009      4  r-----     56.9
saps01           1024      2           1712.8
saps02       1   1024      2  ---s--    158.4
https://bugzilla.novell.com/show_bug.cgi?id=391709

James Fehlig <jfehlig@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|brogers@novell.com          |
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jfehlig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c9

James Fehlig <jfehlig@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jbeulich@novell.com

--- Comment #9 from James Fehlig <jfehlig@novell.com> 2008-05-23 16:22:05 MDT ---
Jan, Bruce, do you have any suggestions? I'm stuck at this point.

I've done some more investigation on this issue and I found that xc_domain_save() in tools/libxc/xc_domain_save.c mysteriously exits the xc_save process when calling munmap() during cleanup at line 1610. The live_p2m variable contains a bogus address (see gdb trace below). That address was obtained by an mmap() call in tools/libxc/xc_linux.c, function xc_map_foreign_batch(). The address is bogus after the mmap call.

(gdb) until 705
map_and_save_p2m_table (xc_handle=3, io_fd=4, dom=2, p2m_size=133120, live_shinfo=0x7f074322a000) at xc_domain_save.c:705
705         p2m = xc_map_foreign_batch(xc_handle, dom, PROT_READ,
(gdb) s
xc_map_foreign_batch (xc_handle=0, dom=4198400, prot=32519, arr=0x410, num=1) at xc_linux.c:66
66      {
(gdb) n
69          addr = mmap(NULL, num*PAGE_SIZE, prot, MAP_SHARED, xc_handle, 0);
(gdb) p addr
$2 = (void *) 0x0
(gdb) n
70          if ( addr == MAP_FAILED ) {
(gdb) p addr
$3 = (void *) 0x7f0743191000
(gdb) x /10hb 0x7f0743191000
0x7f0743191000: Cannot access memory at address 0x7f0743191000
(gdb) c
Continuing.

Breakpoint 2, xc_domain_save (xc_handle=3, io_fd=4, dom=2, max_iters=29, max_factor=3, flags=0, suspend=0x4010ec <suspend>, hvm=0, init_qemu_maps=0x401358 <init_qemu_maps>, qemu_flip_buffer=0x40118b <qemu_flip_buffer>) at xc_domain_save.c:1609
1609        if ( live_p2m )
(gdb) n
1610            munmap(live_p2m, ROUNDUP(p2m_size * sizeof(xen_pfn_t), PAGE_SHIFT));
(gdb) p live_p2m
$4 = (xen_pfn_t *) 0x7f0743191000
(gdb) p *live_p2m
Cannot access memory at address 0x7f0743191000
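
For reference, the length handed to the munmap() call at line 1610 can be worked out from the values in the trace. The following is only a sketch under stated assumptions: ROUNDUP(x, w) is taken to round x up to a multiple of 2^w (its definition is not shown in this thread), pages are assumed to be 4 KiB, and xen_pfn_t is assumed to be 8 bytes in the 64-bit tools. All sketch_ names are hypothetical and not part of libxc.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical stand-in for libxc's ROUNDUP(): round x up to the next
 * multiple of 2^w. The real macro is not quoted in the thread.
 */
static unsigned long sketch_roundup(unsigned long x, unsigned int w)
{
    unsigned long mask = (1UL << w) - 1;
    return (x + mask) & ~mask;
}

/*
 * Number of bytes the cleanup munmap() would be asked to release for a
 * p2m table with p2m_size entries of pfn_bytes bytes each.
 */
static unsigned long sketch_p2m_map_bytes(unsigned long p2m_size, size_t pfn_bytes)
{
    return sketch_roundup(p2m_size * (unsigned long)pfn_bytes, SKETCH_PAGE_SHIFT);
}
```

With p2m_size=133120 from the trace and an 8-byte entry, sketch_p2m_map_bytes(133120, 8) gives 1064960 bytes, which is exactly 260 pages of 4 KiB.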
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c10

--- Comment #10 from Jan Beulich <jbeulich@novell.com> 2008-05-26 04:10:10 MDT ---
> I've done some more investigation on this issue and I found that xc_domain_save() in tools/libxc/xc_domain_save.c mysteriously exits the xc_save process when calling munmap() during cleanup at line 1610.

Are you saying that the munmap call doesn't return (but rather results in process exit)?

Not being able to read (from the debugger) the memory live_p2m points to doesn't really mean the address is bogus - as long as it's (as you verified) the same one you got from mmap(), I'd assume all is fine with it.

Did you try this with a debug build, so you'd get all the DPRINTF() output?
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jfehlig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c11

--- Comment #11 from James Fehlig <jfehlig@novell.com> 2008-05-28 13:38:50 MDT ---
(In reply to comment #10 from Jan Beulich)
> Are you saying that the munmap call doesn't return (but rather result in process exit)?

Yes.

> Did you try this with a debug build, so you'd get all the DPRINTF() output?

As it turns out, I didn't even need a debug build (due to a bug in tools/libxc/xc_private.h) to get the DPRINTF() output. Regardless, I did a debug build and added these DPRINTF()s:

--- xc_domain_save.c.orig       2008-05-28 13:25:24.000000000 -0600
+++ xc_domain_save.c    2008-05-28 13:30:01.000000000 -0600
@@ -1606,8 +1606,11 @@
     if ( live_shinfo )
         munmap(live_shinfo, PAGE_SIZE);
 
-    if ( live_p2m )
+    if ( live_p2m ) {
+        DPRINTF("#### Calling munmap: live_p2m = %p\n", live_p2m);
         munmap(live_p2m, ROUNDUP(p2m_size * sizeof(xen_pfn_t), PAGE_SHIFT));
+        DPRINTF("#### returned from munmap\n");
+    }
 
     if ( live_m2p )
         munmap(live_m2p, M2P_SIZE(max_mfn));

The results (in xend.log):

[2008-05-28 13:22:39 5686] INFO (XendCheckpoint:374) Had 0 unexplained entries in p2m table
[2008-05-28 13:22:45 5686] INFO (XendCheckpoint:374) Saving memory pages: iter 1 95%^M 1: sent 131072, skipped 0, delta 6185ms, dom0 0%, target 0%, sent 694Mb/s, dirtied 0Mb/s 0 pages
[2008-05-28 13:22:45 5686] INFO (XendCheckpoint:374) Total pages sent= 131072 (0.98x)
[2008-05-28 13:22:45 5686] INFO (XendCheckpoint:374) (of which 0 were fixups)
[2008-05-28 13:22:45 5686] INFO (XendCheckpoint:374) All memory is saved
[2008-05-28 13:22:50 5686] INFO (XendCheckpoint:374) #### Calling munmap: live_p2m = 0x7f5ff43ae000
[2008-05-28 13:23:00 5686] ERROR (XendCheckpoint:144) Save failed on domain sles-10-sp2-32-pv-def-net-7a0-1c3 (1).
Traceback (most recent call last):
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 112, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 362, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 4 1 0 0 0 failed

As you can see, I never get the "returned from munmap" message, nor the final debug message (already in the code) just before returning from this function.
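
The traceback above only reports that xc_save failed; it does not say whether the helper exited on its own or was killed by a signal. A parent process can tell the two apart from the wait status. The following is a minimal, generic POSIX sketch of that check and is not xend's actual forkHelper logic; all sketch_ names are hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child bodies used only to exercise the decoder. */
static void sketch_child_exit3(void)   { _exit(3); }
static void sketch_child_sigkill(void) { raise(SIGKILL); }

/*
 * Fork, run body() in the child, and return the raw wait status
 * collected by the parent, or -1 if fork()/waitpid() failed.
 */
static int sketch_run_child(void (*body)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        body();
        _exit(0);               /* reached only if body() returns */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Describe a wait status: normal exit code versus terminating signal. */
static void sketch_describe(int status, char *buf, size_t n)
{
    if (WIFEXITED(status))
        snprintf(buf, n, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, n, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(buf, n, "unrecognised wait status %d", status);
}
```

Calling sketch_describe() on the status returned by sketch_run_child() yields either an exit code or a signal number, which is the distinction the log output above cannot show.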
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jbeulich@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c12

--- Comment #12 from Jan Beulich <jbeulich@novell.com> 2008-05-29 02:24:11 MDT ---
And when you looked at this in the debugger, it also didn't get any kind of signal? I would expect the call to either return or raise a fault... I assume you checked that there's nothing in the syslog?

Stack (or more general memory) corruption could be causing this - could you step through the libc wrapping code of munmap() to see whether the underlying syscall returns?
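
One way to take the libc wrapper out of the picture, as an alternative to single-stepping through it, is to issue the system call directly and log on both sides of it. The following is a minimal sketch assuming Linux and glibc's syscall() interface; sketch_traced_munmap is a hypothetical helper, not something that exists in libxc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() and MAP_ANONYMOUS on glibc */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: unmap [addr, addr + len) via the raw munmap system
 * call, bypassing the libc wrapper. A message is flushed to stderr before
 * the call, so a call that never returns is still visible in the output.
 * Returns 0 on success, or -1 with errno set as reported by the kernel.
 */
static int sketch_traced_munmap(void *addr, size_t len)
{
    long rc;
    int saved_errno;

    fprintf(stderr, "calling raw munmap(%p, %zu)\n", addr, len);
    fflush(stderr);

    rc = syscall(SYS_munmap, addr, len);
    saved_errno = errno;

    if (rc == 0)
        fprintf(stderr, "raw munmap returned 0\n");
    else
        fprintf(stderr, "raw munmap failed: %s\n", strerror(saved_errno));
    fflush(stderr);

    errno = saved_errno;        /* fprintf() may have clobbered errno */
    return (int)rc;
}
```

If the "calling" message appears but neither result message follows, the process ended inside the system call itself rather than in the wrapper around it.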
https://bugzilla.novell.com/show_bug.cgi?id=391709

User syntron@web.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c13

Matthias Pfafferodt <syntron@web.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |syntron@web.de

--- Comment #13 from Matthias Pfafferodt <syntron@web.de> 2008-09-09 16:12:00 MDT ---
Is there any new information for this bug?

I use openSUSE 11.0 and I found a similar bug. I try to save a domain (x02-pluto). The save command exits with an error message, but restore is possible using the file. After that I have the domain in two states: 's' and 'b'.

host: openSUSE 11.0 64bit (2.6.25.11-0.1-xen)
guest: openSUSE 11.0 32bit

mattsys:/tmp # xm list
Name          ID   Mem  VCPUs  State   Time(s)
Domain-0       0  4474      2  r-----    322.9
x00-sol       14   512      1  -b----      9.9
x01-terra     13   512      1  -b----      1.2
x02-pluto     15   512      1  -b----      7.2
x03-merkur    12   512      1  -b----     29.6
x04-uranus     5   512      1  -b----     25.0
x05-neptun     6   512      1  -b----     24.0
mattsys:/tmp # xm save 15 x02-pluto.xm_save
Error: /usr/lib64/xen/bin/xc_save 25 15 0 0 0 failed
Usage: xm save [-c] <Domain> <CheckpointFile>

Save a domain state to restore later.
  -c, --checkpoint    Leave domain running after creating snapshot

mattsys:/tmp # xm restore x02-pluto.xm_save
mattsys:/tmp # xm list
Name          ID   Mem  VCPUs  State   Time(s)
Domain-0       0  4474      2  r-----    337.0
x00-sol       14   512      1  -b----     10.0
x01-terra     13   512      1  -b----      1.3
x02-pluto     15   512      1  ---s--      7.2
x02-pluto     16   512      1  -b----      0.0
x03-merkur    12   512      1  -b----     29.8
x04-uranus     5   512      1  -b----     25.0
x05-neptun     6   512      1  -b----     24.0
mattsys:/tmp #
https://bugzilla.novell.com/show_bug.cgi?id=391709

User jfehlig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=391709#c14

--- Comment #14 from James Fehlig <jfehlig@novell.com> 2008-09-09 16:33:01 MDT ---
(In reply to comment #13 from Matthias Pfafferodt)
> Are there new information for this bug?

No. I wasn't able to get to the bottom of this problem before 11.0 went GM. I'll need to see whether this behavior still exists in the 11.1 code base.

> I use opensuse 11.0 and I found a similar bug. I try to save a domain (x02-pluto). The save command exits with an error message. But restore is possible using the file. After that I have the domain in two states: 's' and 'b'.

It's not just similar; it is exactly the behavior I was seeing and debugging. JFYI, you can safely destroy the domain in the 's' state after doing the save.