[Bug 599147] dom0 crashes with hvm domUs
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Wed, 28 Apr 2010 13:07:09 +0000
  • Message-id: <20100428130709.EA46E2454D6@xxxxxxxxxxxxxxxxxxxxxx>
http://bugzilla.novell.com/show_bug.cgi?id=599147

http://bugzilla.novell.com/show_bug.cgi?id=599147#c4


Harald Koenig <koenig@xxxxxxxx> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |NEW
InfoProvider|koenig@xxxxxxxx |

--- Comment #4 from Harald Koenig <koenig@xxxxxxxx> 2010-04-28 13:07:08 UTC ---
I'll try to setup some serial console support using IPMI with SOL support.

Restoring needinfo.

ok, the good news: the serial console for xen/dom0 works via ipmi/sol!
the bad news: I did not manage to crash the machine again using one (more) hvm
client(s) running the same compile benchmark as before :-(

but anyway I got some xen messages which *might* be helpful to you to give you
a clue what's going on^H^Hwrong anyway.

the client to be tested is called "os-centos4u4" which is running 32bit centos
4u4 (os2-* clients are 64 bit, os-* are 32 bit...)

1st test (before full reboot of dom0 for a 2nd try), domU disks are on/from a
remote iscsi server:

os-centos4u4 runs as dom31 and uses all 4 CPUs (2*dual-core xeon), 1 GB ram
(phys 16 GB, 4GB left for dom0 right now).
running a large compile job with "make -j6 -l8" we see many of those msgs:

(XEN) mm.c:767:d31 Error getting mfn 7f2e (pfn 3eb28c) from L1 entry
0000000007f2e063 for dom31
(XEN) printk: 382 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 18ad6 (pfn 3bea2)
(XEN) printk: 400 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 7f7b (pfn 3eb2d9)
(XEN) printk: 283 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 6800000000000001 != exp e000000000000000) for
mfn 18aa2 (pfn 3bed6)
(XEN) printk: 276 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 31cde (pfn 2f24e)
(XEN) printk: 304 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 196f7 (pfn 3b281)
(XEN) printk: 366 messages suppressed.
(XEN) mm.c:767:d31 Error getting mfn 7f22 (pfn 3eb280) from L1 entry
0000000007f22063 for dom31
(XEN) printk: 385 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 18ad8 (pfn 3bea0)
(XEN) printk: 241 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 31cde (pfn 2f24e)
(XEN) printk: 329 messages suppressed.
(XEN) mm.c:2270:d31 Bad type (saw 2800000000000001 != exp e000000000000000) for
mfn 7f7b (pfn 3eb2d9)
(XEN) printk: 291 messages suppressed.
(XEN) mm.c:767:d31 Error getting mfn 7f1f (pfn 3eb27d) from L1 entry
0000000007f1f063 for dom31
(XEN) printk: 145 messages suppressed.
(XEN) mm.c:767:d31 Error getting mfn 7f22 (pfn 3eb280) from L1 entry
0000000007f22063 for dom31
(XEN) printk: 154 messages suppressed.
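
Since printk rate limiting hides most of these messages, a rough tally can help when comparing runs. The following is only a sketch, assuming the console output has been saved as text in the shape quoted above; the function name and the regular expressions are illustrative and not part of any Xen tool.

```python
import re
from collections import Counter

# Assumed line shapes, copied from the console output quoted above:
#   (XEN) mm.c:2270:d31 Bad type (saw ... != exp ...) for
#   (XEN) printk: 382 messages suppressed.
MSG_RE = re.compile(r"^\(XEN\) (\w+\.c):(\d+):d(\d+) ")
SUPPRESSED_RE = re.compile(r"^\(XEN\) printk: (\d+) messages suppressed\.")


def tally_xen_messages(lines):
    """Count visible messages per (source file, source line, domain id)
    and add up the counts reported as suppressed by printk."""
    seen = Counter()
    suppressed = 0
    for line in lines:
        m = MSG_RE.match(line)
        if m:
            seen[(m.group(1), int(m.group(2)), int(m.group(3)))] += 1
            continue
        s = SUPPRESSED_RE.match(line)
        if s:
            suppressed += int(s.group(1))
        # Wrapped continuation lines (e.g. "mfn 18ad6 (pfn 3bea2)") are ignored.
    return seen, suppressed


if __name__ == "__main__":
    import sys
    counts, hidden = tally_xen_messages(sys.stdin)
    for (src, lineno, dom), n in sorted(counts.items()):
        print(f"{src}:{lineno} dom{dom}: {n} visible")
    print(f"suppressed (as reported by printk): {hidden}")
```

Feeding it the saved output of "xm dmesg" on stdin prints one line per message site and domain, plus the total that printk reported as suppressed.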

I found this thread for those msgs -- but at least for me it was no real help
;-)

http://lists.xensource.com/archives/html/xen-devel/2010-04/msg00777.html


so I rebooted the whole server (dom0), all domUs shutdown/restarted.
this time the os-centos4u4 disks are local image files on the dom0 file system
(as it has been for at least 2 crashes while benchmarking sw-builds with
different xen setups).

now while running make/gcc/g++ on os-centos4u4 there are no xen msgs anymore --
and no dom0 crash so far since this morning :-(

BUT: at domU boot time I got the following xen msg (full "xm dmesg" attached...)

(XEN) mm.c:767:d6 Error getting mfn 90c47 (pfn 3623a5) from L1 entry
0000000090c47061 for dom6
(XEN) traps.c:466:d6 Unhandled invalid opcode fault/trap [#6] on VCPU 0
[ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 6 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1_18546_20-0.1.1 x86_64 debug=n Not tainted ]----
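
The L1 entry values in the "Error getting mfn" messages can be read by hand: in a 64-bit x86 page-table entry with 4 KiB pages the low 12 bits are flag bits and the frame number starts at bit 12. Below is a small sketch of that arithmetic. The function name is made up for illustration, and the decoding says nothing about why Xen rejected the mapping.

```python
PAGE_SHIFT = 12  # 4 KiB pages

# Low flag bits of an x86 page-table entry (standard architectural layout).
FLAG_BITS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    3: "PWT",
    4: "PCD",
    5: "ACCESSED",
    6: "DIRTY",
}


def decode_l1_entry(entry_hex):
    """Split a 64-bit L1 entry (hex string) into frame number and flag names."""
    entry = int(entry_hex, 16)
    # Bits 12..51 carry the frame number; bit 63 (NX) is masked off here.
    mfn = (entry >> PAGE_SHIFT) & ((1 << 40) - 1)
    flags = [name for bit, name in FLAG_BITS.items() if entry & (1 << bit)]
    return mfn, flags


if __name__ == "__main__":
    for value in ("0000000007f2e063", "0000000090c47061"):
        mfn, flags = decode_l1_entry(value)
        print(f"{value}: mfn {mfn:x}, flags {'|'.join(flags)}")
```

For 0000000007f2e063 this yields mfn 7f2e with PRESENT, RW, ACCESSED and DIRTY set, which matches the mfn printed by Xen; for 0000000090c47061 it yields mfn 90c47 with the same flags except RW.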

where dom6 is the "1st startup" of os-suse111 -- this is "xm list" after reboot
(os-suse111 got up as dom7 -- big surprise for me;)

# xm lis
Name           ID   Mem VCPUs State   Time(s)
Domain-0        0  3823     4 r-----   6139.4
os-centos3u6    1  1024     4 -b----     16.3
os-centos4u4    2  1024     4 r-----  25609.6
os-centos5      3  1024     4 -b----     33.9
os-debian40     4  1024     4 -b----     17.9
os-sles11       5  1024     4 -b----     21.7
os-suse111      7  1024     4 -b----     22.3
os2-centos3u7   8  1024     4 -b----     18.0
os2-centos4u4   9  1024     1 -b----    527.0
os2-centos5    10  1024     4 -b----     37.6
os2-debian40   11  1024     4 -b----     18.1
os2-sles11     12  1024     4 -b----     25.4
os2-suse111    13  1024     4 -b----     24.7
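
A gap in the ID column, like the missing 6 here, is an easy hint that a domain was created and destroyed again during startup. The sketch below spots such gaps, assuming the six-column "xm list" layout shown above; the helper names are illustrative only.

```python
def parse_xm_list(text):
    """Parse six-column 'xm list' output into a list of dicts.
    Assumes the layout shown above: Name ID Mem VCPUs State Time(s)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        name, dom_id, mem, vcpus, state, time_s = line.split()
        rows.append({
            "name": name,
            "id": int(dom_id),
            "mem": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time": float(time_s),
        })
    return rows


def missing_ids(rows):
    """Return domain IDs below the highest ID that are not in the list."""
    ids = {row["id"] for row in rows}
    return sorted(set(range(max(ids) + 1)) - ids)


if __name__ == "__main__":
    import sys
    domains = parse_xm_list(sys.stdin.read())
    print("missing domain IDs:", missing_ids(domains))
```

A missing ID alone does not say that a domain crashed (a plain destroy leaves the same gap), so xend.log or "xm dmesg" is still needed to confirm.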

from xend.log -- it shows that the immediate "restart" of os-suse111 as dom 7
(after dom 6 had crashed) finally worked:

[2010-04-28 11:25:41 4877] INFO (XendDomain:1175) Domain os-suse111 (6)
unpaused.
[2010-04-28 11:25:41 4877] WARNING (XendDomainInfo:1645) Domain has crashed:
name=os-suse111 id=6.
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:2446) XendDomainInfo.destroy:
domid=6
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1971) Destroying device model
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1978) Releasing devices
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1991) Removing vif/0
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:921)
XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1991) Removing console/0
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:921)
XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1991) Removing vbd/768
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:921)
XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1991) Removing vbd/832
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:921)
XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/832
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1976) No device model
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:1978) Releasing devices
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:113)
XendDomainInfo.create_from_dict({'vcpus_params': {'cap': 0, 'weight': 256},
'PV_args': 'root=/dev/hda2', 'other_config': {}, 'features': '', 'cpus': [[],
[], [], []], 'paused': 0, 'domid': 6, 'vcpu_avail': 15, 'VCPUs_live': 1,
'PV_bootloader': '/usr/lib/xen/boot/domUloader.py', 'actions_after_crash':
'restart', 'vbd_refs': ['0b5b3f14-4392-f143-66de-32f7892e2987',
'2bb547ac-9e2d-9421-f4f9-a23a239eca23'], 'PV_ramdisk': '', 'is_control_domain':
False, '_temp_ramdisk': '/var/lib/xen/tmp/ramdisk.PwU6cM', 'name_label':
'os-suse111', 'VCPUs_at_startup': 1, 'HVM_boot_params': {}, 'platform': {},
'PV_kernel': '', 'console_refs': ['989fba9e-6019-0fb2-fb3a-d76928b5e02a'],
'online_vcpus': 1, 'vif_refs': ['ac0b86d0-b951-5aae-e3d3-6ac2cd5b7edc'],
'blocked': 0, 'on_xend_stop': 'ignore', 'shutdown': 0, 'HVM_boot_policy': '',
'shutdown_reason': 3, 'VCPUs_max': 4, 'start_time': 1272446739.491657,
'memory_static_max': 2147483648, 'actions_after_shutdown': 'destroy',
'on_xend_start': 'ignore', 'crashed': 0, 'memory_dynamic_max': 1073741824,
'actions_after_suspend': '', 'is_a_template': False, 'memory_dynamic_min':
1073741824, '_temp_args': 'root=/dev/hda2', 'cpu_time': 0.000237376,
'shadow_memory': 0, 'memory_static_min': 0, 'dying': 0, 'PV_bootloader_args':
'--entry=hda2:/boot/vmlinuz-xen,/boot/initrd-xen', 'notes': {'HV_START_LOW':
4118806528, 'FEATURES':
'writable_page_tables|writable_descriptor_tables|auto_translated_physmap|pae_pgdir_above_4gb|supervisor_mode_kernel',
'VIRT_BASE': 3221225472, 'GUEST_VERSION': '2.6', 'PADDR_OFFSET': 0, 'GUEST_OS':
'linux', 'HYPERCALL_PAGE': 3222278144, 'LOADER': 'generic', 'SUSPEND_CANCEL':
1, 'PAE_MODE': 'yes', 'ENTRY': 3222274048, 'XEN_VERSION': 'xen-3.0'},
'_temp_kernel': '/var/lib/xen/tmp/kernel.IsD6mr', 'uuid':
'21309d59-c939-48e6-e4b1-9c8b8dd9d0e2', 'actions_after_reboot': 'restart',
'_temp_using_bootloader': '1', 'target': 0, 'running': 0, 'vtpm_refs': [],
'devices': {'ac0b86d0-b951-5aae-e3d3-6ac2cd5b7edc': ('vif', {'bridge': 'br0',
'mac': '00:0c:29:a6:33:18', 'devid': 0, 'model': 'rtl8139', 'uuid':
'ac0b86d0-b951-5aae-e3d3-6ac2cd5b7edc'}),
'2bb547ac-9e2d-9421-f4f9-a23a239eca23': ('vbd', {'uuid':
'2bb547ac-9e2d-9421-f4f9-a23a239eca23', 'bootable': 0, 'devid': 832, 'driver':
'paravirtualised', 'dev': 'hdb', 'uname':
'file:/etc/xen/images/os-suse111_builddisk-flat.vmdk', 'mode': 'w'}),
'0b5b3f14-4392-f143-66de-32f7892e2987': ('vbd', {'uuid':
'0b5b3f14-4392-f143-66de-32f7892e2987', 'bootable': 1, 'devid': 768, 'driver':
'paravirtualised', 'dev': 'hda', 'uname':
'file:/etc/xen/images/os-suse111-flat.vmdk', 'mode': 'w'}),
'989fba9e-6019-0fb2-fb3a-d76928b5e02a': ('console', {'other_config': {},
'protocol': 'vt100', 'uuid': '989fba9e-6019-0fb2-fb3a-d76928b5e02a',
'location': '2'})}})
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:2068)
XendDomainInfo.constructDomain
[2010-04-28 11:25:41 4877] DEBUG (balloon:151) Balloon: 493372 KiB free; need
2048; done.
[2010-04-28 11:25:41 4877] DEBUG (XendDomain:450) Adding Domain: 7
[2010-04-28 11:25:41 4877] DEBUG (XendDomainInfo:2232)
XendDomainInfo.initDomain: 7 256
[2010-04-28 11:25:41 8291] DEBUG (XendBootloader:117) Launching bootloader as
['/usr/lib/xen/boot/domUloader.py', '--args=root=/dev/hda2',
'--output=/var/run/xend/boot/xenbl.24257',
'--entry=hda2:/boot/vmlinuz-xen,/boot/initrd-xen',
'/etc/xen/images/os-suse111-flat.vmdk'].
[2010-04-28 11:25:42 4877] DEBUG (XendDomainInfo:2262)
_initDomain:shadow_memory=0x0, memory_static_max=0x80000000,
memory_static_min=0x0.
[2010-04-28 11:25:42 4877] DEBUG (balloon:151) Balloon: 1061832 KiB free; need
1057280; done.
[2010-04-28 11:25:42 4877] INFO (image:166) buildDomain os=linux dom=7 vcpus=4
[2010-04-28 11:25:42 4877] DEBUG (image:642) domid = 7
[2010-04-28 11:25:42 4877] DEBUG (image:643) memsize = 1024
[2010-04-28 11:25:42 4877] DEBUG (image:644) image =
/var/lib/xen/tmp/kernel.iQDwuu
[2010-04-28 11:25:42 4877] DEBUG (image:645) store_evtchn = 1
[2010-04-28 11:25:42 4877] DEBUG (image:646) console_evtchn = 2
[2010-04-28 11:25:42 4877] DEBUG (image:647) cmdline = root=/dev/hda2
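
The crash and the automatic restart can also be pulled out of xend.log mechanically. The following sketch assumes the record header format seen in the excerpt above and re-joins lines that were wrapped (as in this mail); the regular expressions and function names are assumptions for illustration, not part of xend.

```python
import re

# Record header as it appears in the excerpt above, e.g.
#   [2010-04-28 11:25:41 4877] WARNING (XendDomainInfo:1645) Domain has crashed:
HEADER_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<src>[\w.]+):(?P<line>\d+)\)\s*(?P<msg>.*)$"
)
CRASH_RE = re.compile(r"Domain has crashed: name=(\S+) id=(\d+)")
ADD_RE = re.compile(r"Adding Domain: (\d+)")


def join_records(lines):
    """Group header lines with their wrapped continuation lines."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = HEADER_RE.match(line)
        if m:
            records.append({"ts": m.group("ts"), "level": m.group("level"),
                            "msg": m.group("msg")})
        elif records and line.strip():
            # Continuation of the previous record: append with a single space.
            rec = records[-1]
            rec["msg"] = (rec["msg"] + " " + line.strip()).strip()
    return records


def crash_events(records):
    """Return (timestamp, kind, name, domid) tuples for crashes and new domains."""
    events = []
    for rec in records:
        crashed = CRASH_RE.search(rec["msg"])
        if crashed:
            events.append((rec["ts"], "crashed", crashed.group(1),
                           int(crashed.group(2))))
        added = ADD_RE.search(rec["msg"])
        if added:
            events.append((rec["ts"], "added", None, int(added.group(1))))
    return events


if __name__ == "__main__":
    import sys
    for event in crash_events(join_records(sys.stdin)):
        print(event)
```

On the excerpt above this reports the crash of os-suse111 with id 6, followed by domain 7 being added in the same second.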


maybe this 1st crash for the 32bit 11.1 pvm client can be a hint for my problem
in bug #599789 starting exactly this pvm domU with 11.2/11.3 ?!?
I'll see once I can test 11.2/11.3 server again now with this console log (now
knowing that before I should have looked at least into "xm dmesg" ;-)


unfortunately I'm now off for one week for a conference. likely I'll be
online sometimes and can give more data, but I won't be able (by policy -- not
for technical reasons anymore thanks to IPMI;-)) to run any tests or reboot
while being "remote"...


feel free to set to "NEEDINFO" again -- I'll report any information about
crashes as soon as they are available (but for the next week everything will run
as PVM so very likely it's all rock stable...)
