[Bug 653788] New: Error restoring saved x86 domU images on x86-64 dom0
https://bugzilla.novell.com/show_bug.cgi?id=653788
https://bugzilla.novell.com/show_bug.cgi?id=653788#c0

Summary: Error restoring saved x86 domU images on x86-64 dom0
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.3
Status: NEW
Severity: Major
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@novell.com
ReportedBy: carlos@keysoft.pt
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Opera/9.80 (Windows NT 6.1; U; en-GB) Presto/2.6.37 Version/10.70

Dom0 has XENDOMAINS_RESTORE=true and on shutdown the three domUs (windows2008, suse11.3 x64 and suse-11.2 x86) are saved to /var/lib/xen/save.

For some time now (I think since the upgrade of dom0 from 11.2 to 11.3), the suse-11.2 x86 domU always fails to restore on reboot. Manual restore with "xm restore" also fails for this domU.

Last week I created a new suse-11.3 x86 domU and today rebooted the dom0. Result: both x86 domUs failed to restore on reboot.

xm start works ok for both x86 domUs that fail to restore.

uname -a:
dom0: Linux zeus 2.6.34.7-0.5-xen #1 SMP 2010-10-25 08:40:12 +0200 x86_64 x86_64 x86_64 GNU/Linux
suse-11.2 x86 domU: Linux mail 2.6.31.14-0.4-xen #1 SMP 2010-10-25 08:45:30 +0200 i686 i686 i386 GNU/Linux
suse-11.3 x86 domU: Linux ksdev 2.6.34.7-0.5-xen #1 SMP 2010-10-25 08:40:12 +0200 i686 i686 i386 GNU/Linux
suse-11.3 x64 domU: Linux suse112 2.6.34.7-0.5-xen #1 SMP 2010-10-25 08:40:12 +0200 x86_64 x86_64 x86_64 GNU/Linux

Short summary of the error in xend.log for an 'xm restore saved-image' of the suse-11.3 x86 domU:

[2010-11-15 22:07:18 4161] DEBUG (balloon:191) Balloon: tmem relinquished -1 KiB of 507328 KiB requested.
[2010-11-15 22:07:18 4161] DEBUG (balloon:245) Balloon: 16960 KiB free; 0 to scrub; need 524288; retries: 20.
[2010-11-15 22:07:18 4161] DEBUG (balloon:259) Balloon: setting dom0 target to 2709 MiB.
[2010-11-15 22:07:18 4161] DEBUG (XendDomainInfo:1512) Setting memory target of domain Domain-0 (0) to 2709 MiB.
[2010-11-15 22:07:18 4161] DEBUG (balloon:245) Balloon: 221760 KiB free; 0 to scrub; need 524288; retries: 20.
[2010-11-15 22:07:18 4161] DEBUG (balloon:259) Balloon: setting dom0 target to 2705 MiB.
[2010-11-15 22:07:18 4161] DEBUG (XendDomainInfo:1512) Setting memory target of domain Domain-0 (0) to 2705 MiB.
[2010-11-15 22:07:18 4161] DEBUG (XendCheckpoint:380) [xc_restore]: /usr/lib64/xen/bin/xc_restore 28 6 1 2 0 0 0 0
[2010-11-15 22:07:18 4161] INFO (XendCheckpoint:482) xc_domain_restore start: p2m_size = 20800
[2010-11-15 22:07:18 4161] INFO (XendCheckpoint:482) Reloading memory pages: 0%
[2010-11-15 22:07:40 4161] INFO (XendCheckpoint:482) ERROR Internal error: Error when reading batch size
[2010-11-15 22:07:40 4161] INFO (XendCheckpoint:482) ERROR Internal error: error when buffering batch, finishing
[2010-11-15 22:07:40 4161] INFO (XendCheckpoint:482)
[2010-11-15 22:07:40 4161] INFO (XendCheckpoint:482) 100%
[2010-11-15 22:07:40 4161] INFO (XendCheckpoint:482) Memory reloaded (0 pages)
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:3115) XendDomainInfo.destroy: domid=6
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:2455) No device model
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:2457) Releasing devices
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:2463) Removing vif/0
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:1293) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:2463) Removing vkbd/0
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:1293) XendDomainInfo.destroyDevice: deviceClass = vkbd, device = vkbd/0
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:2463) Removing tap/51712
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:1293) XendDomainInfo.destroyDevice: deviceClass = tap, device = tap/51712
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:2463) Removing tap/51728
[2010-11-15 22:07:40 4161] DEBUG (XendDomainInfo:1293) XendDomainInfo.destroyDevice: deviceClass = tap, device = tap/51728
[2010-11-15 22:07:41 4161] DEBUG (XendDomainInfo:2463) Removing console/0
[2010-11-15 22:07:41 4161] DEBUG (XendDomainInfo:1293) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2010-11-15 22:07:41 4161] DEBUG (XendDomainInfo:2463) Removing vfb/0
[2010-11-15 22:07:41 4161] DEBUG (XendDomainInfo:1293) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-11-15 22:07:41 4161] ERROR (XendCheckpoint:416) /usr/lib64/xen/bin/xc_restore 28 6 1 2 0 0 0 0 failed
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 384, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 470, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_restore 28 6 1 2 0 0 0 0 failed
[2010-11-15 22:07:41 4161] ERROR (XendDomain:1188) Restore failed
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomain.py", line 1172, in domain_restore_fd
    dominfo = XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 417, in restore
    raise exn
XendError: /usr/lib64/xen/bin/xc_restore 28 6 1 2 0 0 0 0 failed

No qemu-dm-xxx.log file is generated when the domU fails to restore.

Reproducible: Always

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c1

Charles Arnold <carnold@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |jfehlig@novell.com,
                   |                            |pmillett@novell.com
       InfoProvider|                            |carlos@keysoft.pt
          QAContact|qa@suse.de                  |jdouglas@novell.com

--- Comment #1 from Charles Arnold <carnold@novell.com> 2010-11-29 15:44:18 UTC ---
Please list the currently installed versions of the Xen RPMs in your dom0 (rpm -qa | grep xen).

https://bugzilla.novell.com/show_bug.cgi?id=653788#c2

Carlos Costa e Silva <carlos@keysoft.pt> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
       InfoProvider|carlos@keysoft.pt           |

--- Comment #2 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-11-29 17:20:53 WET ---
xen RPMs:

xen-doc-pdf-4.0.0_21091_06-0.1.1.x86_64
xen-4.0.0_21091_06-0.1.1.x86_64
xen-tools-4.0.0_21091_06-0.1.1.x86_64
kernel-xen-2.6.34.7-0.5.1.x86_64
xen-kmp-default-4.0.0_21091_06_k2.6.34.0_12-0.1.1.x86_64
xen-doc-html-4.0.0_21091_06-0.1.1.x86_64
xen-libs-4.0.0_21091_06-0.1.1.x86_64

https://bugzilla.novell.com/show_bug.cgi?id=653788#c3

James Fehlig <jfehlig@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |cyliu@novell.com
       InfoProvider|                            |carlos@keysoft.pt

--- Comment #3 from James Fehlig <jfehlig@novell.com> 2010-11-30 22:55:40 UTC ---
I see your dom0 and 11.3 x86 domU are updated. I tried a simple save followed by a restore of an 11.3 32-bit domU and did not see any issues. My test machine has the same xen packages and the same dom0/domU kernel versions.

Does explicitly doing a save followed by a restore work on either of the 32-bit domUs?

https://bugzilla.novell.com/show_bug.cgi?id=653788#c4

Carlos Costa e Silva <carlos@keysoft.pt> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
       InfoProvider|carlos@keysoft.pt           |

--- Comment #4 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-03 18:56:26 WET ---
I had to do another dom0 restart [security updates] and this time the 11.3 x86 domU restored OK.

I just did an xm save of this domU and the restore failed:

[2010-12-03 18:52:19 4230] DEBUG (XendCheckpoint:361) restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-12-03 18:52:19 4230] DEBUG (XendCheckpoint:380) [xc_restore]: /usr/lib64/xen/bin/xc_restore 28 7 1 2 0 0 0 0
[2010-12-03 18:52:19 4230] INFO (XendCheckpoint:482) xc_domain_restore start: p2m_size = 20800
[2010-12-03 18:52:19 4230] INFO (XendCheckpoint:482) Reloading memory pages: 0%
[2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482) ERROR Internal error: Error when reading batch size
[2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482) ERROR Internal error: error when buffering batch, finishing
[2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482)
[2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482) 100%
[2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482) Memory reloaded (0 pages)
[2010-12-03 18:52:39 4230] DEBUG (XendDomainInfo:3115) XendDomainInfo.destroy: domid=7
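
One thing that stands out in the excerpt above is the "Memory reloaded (0 pages)" line just before the domain is destroyed. The following is a minimal Python sketch for pulling that page count out of xend.log lines, assuming the line layout shown in the excerpt. The name reloaded_pages is made up for illustration and is not part of xend, and a zero count should be read as one possible symptom only, not as a definitive failure test.

```python
import re

# Matches lines such as:
# [2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482) Memory reloaded (0 pages)
RELOADED = re.compile(r"Memory reloaded \((\d+) pages\)")


def reloaded_pages(log_lines):
    """Return the page count of every 'Memory reloaded' line, in log order."""
    counts = []
    for line in log_lines:
        match = RELOADED.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# Two lines copied from the excerpt above.
sample = [
    "[2010-12-03 18:52:19 4230] INFO (XendCheckpoint:482) Reloading memory pages: 0%",
    "[2010-12-03 18:52:39 4230] INFO (XendCheckpoint:482) Memory reloaded (0 pages)",
]
print(reloaded_pages(sample))
```

In practice the function could be fed the lines of /var/log/xen/xend.log (the usual location, stated here as an assumption) to list the page count of each restore attempt.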

https://bugzilla.novell.com/show_bug.cgi?id=653788#c5

--- Comment #5 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-03 19:11:59 WET ---
Worse, on Tuesday I tried a save and restore of the 11.2 x86 domU:

xm restore: failed. xm list showed the domU as started with 0 time.
xm destroy: ok
xm start: said domU started.
xm list: hung.

A backup task [a shell script] that was running at the same time stalled with 100% cpu in chmod on the same disk where the vm disk is located.

I found a kernel BUG message in /var/log/messages and had to push the reset button on the dom0. As this is a production machine, I can't do too many tests that end with pushing the reset button :)

Dec 1 04:14:08 zeus kernel: [542982.653504] BUG: soft lockup - CPU#3 stuck for 61s! [chown:2557]
Dec 1 04:14:08 zeus kernel: [542982.653504] Modules linked in: nls_utf8 cifs ip6t_LOG xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit tun af_packet usbbk gntdev netbk blkbk blkback_pagemap blktap domctl xenbus_be evtchn raw coretemp bridge stp llc bonding ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables loop dm_mod i2c_i801 8250_pnp iTCO_wdt i3200_edac 8250 iTCO_vendor_support e1000 floppy pcspkr r8169 sr_mod sg i2c_core serial_core edac_core button ext4 jbd2 crc16 usbhid hid raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid0 uhci_hcd ehci_hcd sd_mod usbcore xenblk cdrom xennet edd raid1 fan ahci libata scsi_mod thermal processor thermal_sys hwmon
Dec 1 04:14:08 zeus kernel: [542982.653504] CPU 3
Dec 1 04:14:08 zeus kernel: [542982.653504] Modules linked in: nls_utf8 cifs ip6t_LOG xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit tun af_packet usbbk gntdev netbk blkbk blkback_pagemap blktap domctl xenbus_be evtchn raw coretemp bridge stp llc bonding ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables loop dm_mod i2c_i801 8250_pnp iTCO_wdt i3200_edac 8250 iTCO_vendor_support e1000 floppy pcspkr r8169 sr_mod sg i2c_core serial_core edac_core button ext4 jbd2 crc16 usbhid hid raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid0 uhci_hcd ehci_hcd sd_mod usbcore xenblk cdrom xennet edd raid1 fan ahci libata scsi_mod thermal processor thermal_sys hwmon
Dec 1 04:14:08 zeus kernel: [542982.653504]
Dec 1 04:14:08 zeus kernel: [542982.653504] Pid: 2557, comm: chown Tainted: G W 2.6.34.7-0.5-xen #1 S3210SH/S3210SH
Dec 1 04:14:08 zeus kernel: [542982.653504] RIP: e030:[<ffffffff800794d2>] [<ffffffff800794d2>] smp_call_function_many+0x1b2/0x230
Dec 1 04:14:08 zeus kernel: [542982.653504] RSP: e02b:ffff8800908dfde8 EFLAGS: 00000202
Dec 1 04:14:08 zeus kernel: [542982.653504] RAX: ffff88000204a7c0 RBX: ffff88000204f680 RCX: 0000000000000000
Dec 1 04:14:08 zeus kernel: [542982.653504] RDX: ffff880002044000 RSI: 0000000000000200 RDI: 0000000000000000
Dec 1 04:14:08 zeus kernel: [542982.653504] RBP: 0000000000000003 R08: ffff88000204f6f0 R09: 0000000000000200
Dec 1 04:14:08 zeus kernel: [542982.653504] R10: 0000000000007ff0 R11: 0000000000000246 R12: ffffffff80751100
Dec 1 04:14:08 zeus kernel: [542982.653504] R13: ffff880021f17e80 R14: ffff88000204f6b0 R15: ffff880021f17bc0
Dec 1 04:14:08 zeus kernel: [542982.653504] FS: 00007f3ebe3ff700(0000) GS:ffff880002044000(0000) knlGS:0000000000000000
Dec 1 04:14:08 zeus kernel: [542982.653504] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 1 04:14:08 zeus kernel: [542982.653504] CR2: 00007f3ebdf595b0 CR3: 00000000007b7000 CR4: 0000000000002660
Dec 1 04:14:08 zeus kernel: [542982.653504] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 1 04:14:08 zeus kernel: [542982.653504] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 1 04:14:08 zeus kernel: [542982.653504] Process chown (pid: 2557, threadinfo ffff8800908de000, task ffff88015e69e1c0)
Dec 1 04:14:08 zeus kernel: [542982.653504] Stack:
Dec 1 04:14:08 zeus kernel: [542982.653504] ffff8801df4ac138 01ffffff800ea0ca 0000000000000000 ffff880021f17bc0
Dec 1 04:14:08 zeus kernel: [542982.653504] <0> ffff88015e69e1c0 ffff88015e69e7ec ffff880021f17c20 000000000000002a
Dec 1 04:14:09 zeus kernel: [542982.653504] <0> ffff8801df430800 ffffffff800242f4 0000000000000000 ffff880021f17bc0
Dec 1 04:14:09 zeus kernel: [542982.653504] Call Trace:
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800242f4>] arch_exit_mmap+0x44/0xa0
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800f5e98>] exit_mmap+0x38/0x1c0
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800414c5>] mmput+0x25/0x100
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800481c9>] exit_mm+0x109/0x130
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80048325>] do_exit+0x135/0x3a0
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80048755>] do_group_exit+0x55/0x110
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80048822>] sys_exit_group+0x12/0x20
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80007438>] system_call_fastpath+0x16/0x1b
Dec 1 04:14:09 zeus kernel: [542982.653504] [<00007f3ebdf595e8>] 0x7f3ebdf595e8
Dec 1 04:14:09 zeus kernel: [542982.653504] Code: 00 e8 83 52 39 00 0f ae f0 4c 89 f7 e8 b8 f2 f9 ff 80 7c 24 0f 00 0f 84 d4 fe ff ff f6 43 20 01 0f 84 ca fe ff ff 0f 1f 00 f3 90 <f6> 43 20 01 75 f8 e9 ba fe ff ff 0f 1f 00 4c 89 e2 4c 89 ee 89
Dec 1 04:14:09 zeus kernel: [542982.653504] Call Trace:
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800242f4>] arch_exit_mmap+0x44/0xa0
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800f5e98>] exit_mmap+0x38/0x1c0
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800414c5>] mmput+0x25/0x100
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff800481c9>] exit_mm+0x109/0x130
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80048325>] do_exit+0x135/0x3a0
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80048755>] do_group_exit+0x55/0x110
Dec 1 04:14:09 zeus kernel: [542982.653504] [<ffffffff80048822>] sys_exit_group+0x12/0x20
Dec 1 04:14:10 zeus kernel: [542982.653504] [<ffffffff80007438>] system_call_fastpath+0x16/0x1b
Dec 1 04:14:10 zeus kernel: [542982.653504] [<00007f3ebdf595e8>] 0x7f3ebdf595e8

https://bugzilla.novell.com/show_bug.cgi?id=653788#c9

--- Comment #9 from Chunyan Liu <cyliu@novell.com> 2010-12-08 10:39:25 UTC ---
I synced my 11.3 host and 11.3 x86 domU to the same version and tested 'xm save', 'xm restore' and a dom0 reboot.

I did 'xm save' followed by 'xm restore' many times and didn't see any problem.

As for testing a dom0 reboot, I found there was no save process at all. Curious. I had already set XENDOMAINS_SAVE=/var/lib/xen/save and XENDOMAINS_RESTORE=true. According to the description, when dom0 shuts down it should do 'xm save' on the running domains, but I didn't find any 'xm save' related info in xend.log. And when dom0 started again, there was no restore process at all. Carlos, did you set anything else?

As for the reported issue: according to the xendomains description, when dom0 shuts down and XENDOMAINS_SAVE is set, it saves the running domains. But if the save fails or times out (there is a MAXWAIT limit), it forces a shutdown of those domains. I think one possibility is that the save did not succeed when dom0 shut down, which causes the restore to fail after dom0 starts again.
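
The shutdown behaviour described above (save each running domain, fall back to a forced shutdown when the save fails or times out) can be sketched as follows. This is a simplified Python model for illustration only, not the real /etc/init.d/xendomains script: stop_domains and its save and shutdown callbacks are hypothetical names, the MAXWAIT timeout is folded into the boolean result of the save callback, and treating an empty save directory as "shut down instead of saving" is an assumption.

```python
def stop_domains(domains, save_dir, save, shutdown):
    """Simplified model of the dom0 shutdown path (not the real script).

    save(name, path) -> bool: hypothetical callback, True if the save
        completed before the timeout.
    shutdown(name): hypothetical callback for the forced-shutdown fallback.
    Returns a dict mapping each domain name to "saved" or "shutdown".
    """
    outcome = {}
    for name in domains:
        # Assumption: with no save directory configured, nothing is saved.
        if save_dir and save(name, f"{save_dir}/{name}"):
            outcome[name] = "saved"
        else:
            # Save failed, timed out, or saving is disabled.
            shutdown(name)
            outcome[name] = "shutdown"
    return outcome


# Example with stand-in callbacks: "mail" fails to save, "ksdev" succeeds.
forced = []
result = stop_domains(
    ["ksdev", "mail"],
    "/var/lib/xen/save",
    save=lambda name, path: name != "mail",
    shutdown=forced.append,
)
print(result, forced)
```

The point of the model is that a domain that ends up in the "shutdown" branch may leave no usable save image for the next boot, which is the possibility raised in the comment.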

https://bugzilla.novell.com/show_bug.cgi?id=653788#c10

--- Comment #10 from Chunyan Liu <cyliu@novell.com> 2010-12-09 03:47:16 UTC ---
OK. I forgot to set XENDOMAINS_AUTO_ONLY=false. Now I can reproduce the bug. I'll investigate that.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c11

--- Comment #11 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-10 20:33:41 WET ---
Sorry for not replying earlier: took a few days leave.

Great that you can reproduce the bug, hope this clears soon. I'll be here if you need more details.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c12

--- Comment #12 from Chunyan Liu <cyliu@novell.com> 2010-12-13 09:57:38 UTC ---
Changing the following line of /etc/init.d/xendomains:

    xm restore "$dom" >/dev/null 2>&1

to

    xm restore "$dom"

works well here. You can give it a try.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c13

--- Comment #13 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-14 00:54:49 WET ---
I changed the line in xendomains and restarted the dom0; the domU still doesn't restore, with the same error in xend.log.

Also, for further tests, I copied the 11.3x32 domU disk to another test machine [a new 11.3x64 dom0, while the prod machine is an 11.2 -> 11.3 upgrade], created the domU, restarted the domU for safety, did xm save ..., and xm restore failed.

Here is the "virsh create" config for the domU domain:

<domain type='xen'>
  <name>ksdev</name>
  <uuid>34234b1b-65c2-0528-1bcc-5ce9b41402ea</uuid>
  <description>KS Development</description>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <bootloader>/usr/bin/pygrub</bootloader>
  <bootloader_args>-q</bootloader_args>
  <os>
    <type>linux</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/lib/xen/bin/qemu-dm</emulator>
    <disk type='file' device='disk'>
      <driver name='tap' type='aio'/>
      <source file='/var/lib/xen/images/ksdev/disk0.raw'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='tap' type='cdrom'/>
      <source file='/var/lib/xen/images/shared/openSUSE-11.3-DVD-i586.iso'/>
      <target dev='xvdb' bus='xen'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <mac address='00:16:36:cc:0d:e6'/>
      <source bridge='br0'/>
      <script path='/etc/xen/scriqpts/vif-bridge'/>
      <target dev='vif-1.0'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='xen'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='pt'/>
  </devices>
</domain>

https://bugzilla.novell.com/show_bug.cgi?id=653788#c14

--- Comment #14 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-14 00:58:24 WET ---
Also, I've removed the few installed things in the domU and can now upload both the domU disk and the xm save file somewhere for your testing.

disk0.raw.bz2: 584M (1.8G sparse, 20G size), save.bz2: 26M

Just say where the files can be uploaded to.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c15

--- Comment #15 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-14 01:03:53 WET ---
Hmm, while waiting for bzip2 to compress the files, I must have typed into the config in #13, resulting in at least one typo (scriqpts is wrong).

Correct domU config:

<domain type='xen'>
  <name>ksdev</name>
  <uuid>34234b1b-65c2-0528-1bcc-5ce9b41402ea</uuid>
  <description>Keysoft Development Machine</description>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <bootloader>/usr/bin/pygrub</bootloader>
  <bootloader_args>-q</bootloader_args>
  <os>
    <type>linux</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/lib/xen/bin/qemu-dm</emulator>
    <disk type='file' device='disk'>
      <driver name='tap' type='aio'/>
      <source file='/var/lib/xen/images/ksdev/disk0.raw'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='tap' type='cdrom'/>
      <source file='/var/lib/xen/images/shared/openSUSE-11.3-DVD-i586.iso'/>
      <target dev='xvdb' bus='xen'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <mac address='00:16:36:cc:0d:e6'/>
      <source bridge='br0'/>
      <script path='/etc/xen/scripts/vif-bridge'/>
      <target dev='vif-1.0'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='xen'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='pt'/>
  </devices>
</domain>

https://bugzilla.novell.com/show_bug.cgi?id=653788#c16

--- Comment #16 from Chunyan Liu <cyliu@novell.com> 2010-12-16 09:01:56 UTC ---
(In reply to comment #14)
> Also, I've removed the few installed things in the domU and can now upload both
> the domU disk and the xm save file somewhere for your testing.
> disk0.raw.bz2: 584M (1.8G sparse, 20G size), save.bz2: 26M
> Just say where the files can be uploaded to.

Well, those files are really too large. I think you can put them on a network share (for example, Dropbox) and give me the address. Then I can fetch them.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c17

--- Comment #17 from Chunyan Liu <cyliu@novell.com> 2010-12-20 06:15:27 UTC ---
With the small change in the xendomains script, I could not reproduce it any more.

I noticed this part in the config file:

    <disk type='file' device='disk'>
      <driver name='tap' type='cdrom'/>
      <source file='/var/lib/xen/images/shared/openSUSE-11.3-DVD-i586.iso'/>
      <target dev='xvdb' bus='xen'/>
      <readonly/>
    </disk>

Is there anything special about the shared folder, for example is it mounted after the OS has started? Does it work if you delete the xvdb part and leave only one disk (xvda)?

https://bugzilla.novell.com/show_bug.cgi?id=653788#c18

--- Comment #18 from Carlos Costa e Silva <carlos@keysoft.pt> 2010-12-20 23:19:33 WET ---
Nothing very special about the shared folder: it contains soft links to iso images used to create domUs. In this case, /var/lib/xen/images/shared/openSUSE-11.3-DVD-i586.iso is a soft link to the 11.3x32 dvd iso.

I removed the entry from the domU and it seems to show no relevant changes.

Now that I have a test dom0, I can do some more tests...

While trying the change (removing the xvdb cdrom), something very strange is going on:

xm start domU      -> ok
rcxendomains stop  -> domU saved
rcxendomains start -> domU restore fails (start always fails after the first stop [save])
rcxendomains stop  -> nothing
rcxendomains start -> domU restore succeeds [most of the time, otherwise it always succeeds after a few more stop/starts].

After this, I restarted the system and the domU failed to restore at startup. After five rcxendomains stop/start cycles, the domU started ok.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c19

--- Comment #19 from Chunyan Liu <cyliu@novell.com> 2011-01-27 04:15:50 UTC ---
Created an attachment (id=410675)
 --> (http://bugzilla.novell.com/attachment.cgi?id=410675)
new libxenguest.so.4.0.0

Sorry for the delay; for a long time the problem could not be reproduced. Recently, during some other testing, I found that 'xm restore GUEST_32b' fails sporadically. The log info is quite similar to yours, and restoring from the same save file sometimes succeeds and sometimes fails. According to gdb, there is a SIGBUS error in a memcpy() line.

We've built a new libxenguest.so which seems to fix that. I'm not sure if it's the same problem as yours, but you can give it a try. Just replace /usr/lib64/libxenguest.so.4.0.0 with the new libxenguest.so.4.0.0.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c20

--- Comment #20 from Carlos Costa e Silva <carlos@keysoft.pt> 2011-01-27 17:25:14 WET ---
Downloaded and installed libxenguest.so.4.0.0 to the test dom0 and it seems the problem is solved:

- before updating libxenguest, restore failed.
- updated libxenguest, did a dozen save/restores of the domU and not one failed.

After a restart, the domU also restored ok.

I'm glad the bug was finally reproducible and solved (I was wondering if there was some bad juju on my dom0's :)

Good work, thanks.
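
A run of repeated save/restore cycles like the one described above could be scripted. The following is a rough Python sketch, assuming that 'xm save <domain> <file>' and 'xm restore <file>' exit with a non-zero status on failure and that a domain lost by a failed restore can be booted again with 'xm start', as reported earlier in this bug. The names run_xm and count_failed_cycles are made up for illustration, and the runner is injectable so the loop can be tried without a Xen host.

```python
import subprocess


def run_xm(args):
    """Run an xm subcommand; True when it exits with status 0."""
    return subprocess.run(["xm", *args]).returncode == 0


def count_failed_cycles(domain, save_file, cycles, run=run_xm):
    """Save and restore `domain` `cycles` times; return the number of failed cycles."""
    failures = 0
    for _ in range(cycles):
        if not run(["save", domain, save_file]):
            failures += 1
            continue  # save failed, domain should still be running
        if not run(["restore", save_file]):
            failures += 1
            # Assumption: the domain is gone after a failed restore, so boot it again.
            run(["start", domain])
    return failures


# Dry run with a stand-in runner that makes every restore fail.
calls = []


def fake_run(args):
    calls.append(args[0])
    return args[0] != "restore"


print(count_failed_cycles("ksdev", "/tmp/ksdev.save", 3, run=fake_run))
```

On a real dom0 the call would look like count_failed_cycles("ksdev", "/var/lib/xen/save/ksdev", 12), with the path chosen here only as an example.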

https://bugzilla.novell.com/show_bug.cgi?id=653788#c21

--- Comment #21 from Carlos Costa e Silva <carlos@keysoft.pt> 2011-02-15 04:57:59 WET ---
Something is still wrong:

The tests done on 27 Jan were on the test dom0 with a test domU. I repeated the tests on the production dom0 with the same domU and everything worked.

But now, after two restarts of the production dom0, the mail domU (11.2x32 suse) doesn't restore. The symptoms are different, though:

-> before, the mail domU failed to restore and xm list showed it as stopped.
-> now the mail domU fails to restore but appears in 'b' state with 0.0 time.

The log is also different: although there's a restore error in xend.log, there's also a line with "Restore exit with rc=0", and I suppose that's why the domU appears as started with 0 time.

xend.log excerpt:

[2011-02-15 04:53:14 4176] DEBUG (XendCheckpoint:361) restore:shadow=0x0, _static_max=0x40000000, _static_min=0x0,
[2011-02-15 04:53:14 4176] DEBUG (XendCheckpoint:380) [xc_restore]: /usr/lib64/xen/bin/xc_restore 29 12 1 2 0 0 0 0
[2011-02-15 04:53:14 4176] INFO (XendCheckpoint:482) xc_domain_restore start: p2m_size = 40800
[2011-02-15 04:53:14 4176] INFO (XendCheckpoint:482) Reloading memory pages: 0%
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) ERROR Internal error: Error when reading batch size
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) ERROR Internal error: error when buffering batch, finishing
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482)
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 100%
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) Memory reloaded (30890 pages)
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 100
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 101
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 102
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 103
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 104,ctxt:0x60e020, vcpup:0x607010,dinfo->guest_width:4,ctxt.x32:2800
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) 105
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) read VCPU 0
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) Completed checkpoint load
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) Domain ready to be built.
[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) Restore exit with rc=0
[2011-02-15 04:53:31 4176] DEBUG (XendCheckpoint:453) store-mfn 753974
[2011-02-15 04:53:31 4176] DEBUG (XendCheckpoint:453) console-mfn 753973
[2011-02-15 04:53:31 4176] DEBUG (XendDomainInfo:3057) XendDomainInfo.completeRestore

https://bugzilla.novell.com/show_bug.cgi?id=653788#c22

--- Comment #22 from Chunyan Liu <cyliu@novell.com> 2011-02-15 07:43:47 UTC ---
From the log, the restore process seems to be normal up to the last line. Could you attach the whole xend.log?

https://bugzilla.novell.com/show_bug.cgi?id=653788#c23

--- Comment #23 from Carlos Costa e Silva <carlos@keysoft.pt> 2011-02-15 17:36:43 WET ---
Created an attachment (id=414211)
 --> (http://bugzilla.novell.com/attachment.cgi?id=414211)
Complete xend.log

Ah, I assumed those 'batch' error lines meant something, but looking at successful restores of the other domUs, the lines are there also.

Looking more carefully at the log, I don't see differences between a failed restore and a successful one (for the other domUs)...

Complete xend.log attached.

Notes:
The domU failing to restore is named "scalix-11.4.6" (or scalix for short).
There are now 5 domUs in this dom0, one created yesterday after the restore tests.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c24

--- Comment #24 from Chunyan Liu <cyliu@novell.com> 2011-02-16 06:06:46 UTC ---
> Ah, I assumed those 'batch' error lines meant something, but looking at
> successful restores of the other domU's the lines are there also.

Those error lines are expected: they indicate that there are no more pages to load, and the restore then goes on to the finishing part.

> Looking more carefully at the log, I don't see differences between a failed
> restore and a successful one (for the other domU's)...

I cannot see any problem in xend.log either. Is the issue reproducible?
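
Since the two 'batch' messages show up in successful restores too, one way to compare a failed restore with a successful one is to drop those lines and the timestamp/pid prefix before diffing. The following is a minimal Python sketch, assuming the xend.log line layout seen in this bug; normalize is a made-up helper name, not part of any Xen tool.

```python
import re

# "[2011-02-15 04:53:31 4176] " style prefix: timestamp followed by the xend pid.
PREFIX = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \d+\] ")

# Messages reported in this bug as expected at the end of the page stream.
EXPECTED = (
    "Error when reading batch size",
    "error when buffering batch, finishing",
)


def normalize(log_lines):
    """Drop the timestamp/pid prefix and the expected end-of-stream messages."""
    kept = []
    for line in log_lines:
        if any(text in line for text in EXPECTED):
            continue
        kept.append(PREFIX.sub("", line.rstrip("\n")))
    return kept


lines = [
    "[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) ERROR Internal error: Error when reading batch size",
    "[2011-02-15 04:53:31 4176] INFO (XendCheckpoint:482) Memory reloaded (30890 pages)",
]
print(normalize(lines))
```

Two normalized excerpts can then be passed to difflib.unified_diff from the Python standard library to highlight whatever still differs, such as domain ids or page counts.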

https://bugzilla.novell.com/show_bug.cgi?id=653788#c25

--- Comment #25 from Carlos Costa e Silva <carlos@keysoft.pt> 2011-02-16 17:21:48 WET ---
Yes.

Originally there were two x32 domUs not restoring. I tried the new lib (comment #19) with one domU and it worked, both in test and production, so I thought the problem was fixed.

It was only after the mail domU failed to restore after two dom0 reboots (it failed both times) that I did more tests. After half a dozen saves with no successful restore it seems that it's reproducible :)

https://bugzilla.novell.com/show_bug.cgi?id=653788#c26

--- Comment #26 from Jason Douglas <jdouglas@suse.com> 2013-04-17 04:15:48 MDT ---
This is a very old bug. Is there a reason it is still open? If not, can we close it out?

Thanks!

https://bugzilla.novell.com/show_bug.cgi?id=653788#c27

--- Comment #27 from Chunyan Liu <cyliu@suse.com> 2013-04-18 02:16:36 UTC ---
Couldn't reproduce locally after the change in comment 19.

https://bugzilla.novell.com/show_bug.cgi?id=653788#c28

James Fehlig <jfehlig@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |UPSTREAM

--- Comment #28 from James Fehlig <jfehlig@suse.com> 2013-04-18 06:33:11 UTC ---
Wow, openSUSE11.3? It is no longer maintained. We have Xen 4.2 now in 12.3! Even if we produced a fix, there would be no mechanism to release it.

Closing as UPSTREAM since there have been no reports of this issue on newer xen releases. Please open a new bug against 12.{1,2,3} if the problem persists.

Thanks!