[Bug 618678] New: blkback thread hangs after unsuccessful xen domU start
http://bugzilla.novell.com/show_bug.cgi?id=618678
http://bugzilla.novell.com/show_bug.cgi?id=618678#c0

Summary: blkback thread hangs after unsuccessful xen domU start
Classification: openSUSE
Product: openSUSE 11.3
Version: Factory
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@novell.com
ReportedBy: koenig@linux.de
QAContact: qa@suse.de
Found By: ---
Blocker: ---

One of my xen domUs, which uses 2 iscsi disks, did not start up as expected. Now, trying to start it again, I get:

# xm cre -c os-centos4u4
Using config file "./os-centos4u4".
Error: Device /dev/xvdp (51952, vbd) is already connected.

Those two disks still show up with "lsscsi":

# lsscsi -t
[0:0:0:0]   disk    sata:    /dev/sda
[1:0:0:0]   disk    sata:    /dev/sdb
[6:0:1:0]   cd/dvd  ata:     /dev/sr0
[35:0:0:0]  disk    iqn.2010-04.de.science-computing:os-centos4u4-builddisk-flat.vmdk,t,0x1  /dev/sdt
[36:0:0:0]  disk    iqn.2010-04.de.science-computing:os-centos4u4-flat.vmdk,t,0x1  /dev/sdu

And there are two kernel threads for domU id #13, which does not exist (the highest domU id running is 10):

root 12345 0.0 0.0 0 0 ? S 13:42 0:00 [blkback.13.hda]
root 12346 0.0 0.0 0 0 ? S 13:42 0:00 [blkback.13.hdb]

I don't see any mappings with dmsetup or losetup:

# dmsetup ls
No devices found
# losetup -a

iscsi logout does not do anything, and login throws a "not found" error, even though the target is shown in the list of available disks:

# /sbin/iscsiadm -m node -T iscsi:iqn.2010-04.de.science-computing:os-centos4u4-flat.vmdk --logout
# /sbin/iscsiadm -m node -T iscsi:iqn.2010-04.de.science-computing:os-centos4u4-flat.vmdk --login
iscsiadm: no records found!
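One detail worth checking here (an editorial observation, not part of the original report): open-iscsi node records are keyed by the exact recorded target name, and the failing commands above pass the name with an extra "iscsi:" prefix, which by itself would produce "no records found". A small sketch that builds logout commands from the names exactly as recorded, using the node listing from this report as sample input:

```shell
# Sketch: derive logout commands from the target names exactly as they
# appear in the node database. The sample below is copied from the
# `iscsiadm -m node` output in this report; on a live system, pipe the
# real command output instead.
nodes='192.168.178.4:3260,1 iqn.2010-04.de.science-computing:os-centos4u4-flat.vmdk
192.168.178.4:3260,1 iqn.2010-04.de.science-computing:os-centos4u4-builddisk-flat.vmdk'

# Field 2 of each record is the target name; this sketch echoes the
# commands rather than running them.
printf '%s\n' "$nodes" | awk '{print "iscsiadm -m node -T " $2 " --logout"}'
```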
# /sbin/iscsiadm -m node | grep os-centos4u4
192.168.178.4:3260,1 iqn.2010-04.de.science-computing:os-centos4u4-flat.vmdk
192.168.178.4:3260,1 iqn.2010-04.de.science-computing:os-centos4u4-builddisk-flat.vmdk
#

After shutting down *all* domUs, things changed a bit, but are still very bad: now "lsscsi" does not show any virtual/iscsi disks anymore, yet *all* blkback threads still exist:

# ps uax | grep blk
root 4289 0.0 0.1 117332 7268 ? Ssl 12:00 0:00 blktapctrl
root 4818 0.0 0.0 0 0 ? S 12:00 0:00 [blkback.1.hda]
root 4819 0.0 0.0 0 0 ? S 12:00 0:00 [blkback.1.hdb]
root 5296 0.0 0.0 0 0 ? S 12:00 0:00 [blkback.3.hda]
root 5736 0.0 0.0 0 0 ? S 12:01 0:00 [blkback.4.hda]
root 5737 0.0 0.0 0 0 ? S 12:01 0:00 [blkback.4.hdb]
root 6188 0.0 0.0 0 0 ? S 12:01 0:00 [blkback.5.hda]
root 6189 0.0 0.0 0 0 ? S 12:01 0:00 [blkback.5.hdb]
root 6666 0.0 0.0 0 0 ? S 12:01 0:00 [blkback.6.hda]
root 6667 0.0 0.0 0 0 ? S 12:01 0:00 [blkback.6.hdb]
root 7637 0.0 0.0 0 0 ? S 12:02 0:00 [blkback.8.hda]
root 7638 0.0 0.0 0 0 ? S 12:02 0:00 [blkback.8.hdb]
root 8151 0.0 0.0 0 0 ? S 12:03 0:00 [blkback.9.hda]
root 8152 0.0 0.0 0 0 ? S 12:03 0:00 [blkback.9.hdb]
root 8690 0.0 0.0 0 0 ? S 12:03 0:00 [blkback.10.hda]
root 8691 0.0 0.0 0 0 ? S 12:03 0:00 [blkback.10.hdb]
root 12345 0.0 0.0 0 0 ? S 13:42 0:00 [blkback.13.hda]
root 12346 0.0 0.0 0 0 ? S 13:42 0:00 [blkback.13.hdb]

Now I'll reboot, but is there any suggestion how to correctly clean up such a mess next time, or which other information is important for further debugging? How can I get rid of those blkback.* threads?
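To compare the leftover threads against the running domains, the thread names can be parsed directly (an editorial sketch, not from the report; it assumes the [blkback.&lt;domid&gt;.&lt;dev&gt;] naming visible above):

```shell
# Sketch: extract the domU ids that still own blkback kernel threads, for
# comparison with the ids shown by `xm list`. The sample lines are copied
# from the ps output in this report; on a live dom0 pipe `ps uax` instead.
list_blkback_domids() {
    grep -o 'blkback\.[0-9]*\.' | cut -d. -f2 | sort -nu
}

sample='root 12345 0.0 0.0 0 0 ? S 13:42 0:00 [blkback.13.hda]
root 12346 0.0 0.0 0 0 ? S 13:42 0:00 [blkback.13.hdb]
root 8690 0.0 0.0 0 0 ? S 12:03 0:00 [blkback.10.hda]'

# Prints one id per line; any id missing from `xm list` is stale.
printf '%s\n' "$sample" | list_blkback_domids
```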
Even after stopping iscsi, there are 17 blkback threads, and the iscsi_tcp kernel module has a usage count of 17, so it's not possible to completely reload/restart without a reboot:

# lsmod | grep iscsi
iscsi_tcp 11666 17
libiscsi_tcp 18437 1 iscsi_tcp
libiscsi 50884 2 iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 41815 2 iscsi_tcp,libiscsi
scsi_mod 191208 7 iscsi_tcp,libiscsi,scsi_transport_iscsi,sr_mod,sg,sd_mod,libata

Some rpm versions:

# rpm -qa xen kernel-xen \*iscsi\* | sort
iscsitarget-1.4.19-2.31.x86_64
iscsitarget-kmp-default-1.4.19_k2.6.34.0_12-2.31.x86_64
iscsitarget-kmp-xen-1.4.19_k2.6.34.0_12-2.31.x86_64
kernel-xen-2.6.34-12.1.x86_64
open-iscsi-2.0.870-31.8.x86_64
xen-4.0.0_21091_05-6.3.x86_64
yast2-iscsi-client-2.19.5-1.4.noarch
yast2-iscsi-server-2.19.0-1.5.noarch

Here are the kernel msgs from that last startup:

Jun 30 13:41:57 os4 kernel: [ 6131.884391] blkback: ring-ref 8, event-channel 80, protocol 1 (x86_64-abi)
Jun 30 13:41:57 os4 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/0/51952
Jun 30 13:41:57 os4 logger: /etc/xen/scripts/block-iscsi: add XENBUS_PATH=backend/vbd/0/51952
Jun 30 13:41:57 os4 kernel: [ 6131.988562] blkback: ring-ref 8, event-channel 80, protocol 1 (x86_64-abi)
Jun 30 13:41:57 os4 kernel: [ 6132.377246] scsi34 : iSCSI Initiator over TCP/IP
Jun 30 13:41:57 os4 kernel: [ 6132.630238] scsi 34:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
Jun 30 13:41:57 os4 kernel: [ 6132.630451] sd 34:0:0:0: Attached scsi generic sg3 type 0
Jun 30 13:41:57 os4 kernel: [ 6132.630867] sd 34:0:0:0: [sdt] 23068672 512-byte logical blocks: (11.8 GB/11.0 GiB)
Jun 30 13:41:57 os4 kernel: [ 6132.631006] sd 34:0:0:0: [sdt] Write Protect is off
Jun 30 13:41:57 os4 kernel: [ 6132.631010] sd 34:0:0:0: [sdt] Mode Sense: 77 00 00 08
Jun 30 13:41:57 os4 kernel: [ 6132.631596] sd 34:0:0:0: [sdt] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Jun 30 13:41:57 os4 kernel: [ 6132.632709] sdt: sdt1 sdt2
Jun 30 13:41:57 os4 kernel: [ 6132.642926] sd 34:0:0:0: [sdt] Attached SCSI disk
Jun 30 13:41:58 os4 iscsid: connection27:0 is operational now
Jun 30 13:42:01 os4 logger: /etc/xen/scripts/block-iscsi: Writing backend/vbd/0/51952/physical-device 41:30 to xenstore.
Jun 30 13:42:01 os4 logger: /etc/xen/scripts/block-iscsi: Writing backend/vbd/0/51952/hotplug-status connected to xenstore.
Jun 30 13:42:01 os4 kernel: [ 6136.654486] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.34/linux-2.6.34/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/0/51952
Jun 30 13:42:01 os4 kernel: [ 6136.654491] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.34/linux-2.6.34/drivers/xen/blkback/cdrom.c, line=95) type:0
Jun 30 13:42:01 os4 kernel: [ 6136.669605] blkfront: xvdp: barriers enabled
Jun 30 13:42:01 os4 kernel: [ 6136.669934] xvdp: xvdp1 xvdp2
Jun 30 13:42:02 os4 kernel: [ 6137.665224] kjournald starting. Commit interval 15 seconds
Jun 30 13:42:02 os4 kernel: [ 6137.665243] EXT3-fs (dm-0): mounted filesystem with ordered data mode
Jun 30 13:42:03 os4 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/0/51952
Jun 30 13:42:03 os4 logger: /etc/xen/scripts/block-iscsi: remove XENBUS_PATH=backend/vbd/0/51952
Jun 30 13:42:03 os4 kernel: [ 6138.703708] connection27:0: detected conn error (1020)
Jun 30 13:42:04 os4 logger: /etc/xen/scripts/block: Writing backend/vbd/0/51952/hotplug-error /etc/xen/scripts/block failed; error detected. backend/vbd/0/51952/hotplug-status error to xenstore.
Jun 30 13:42:04 os4 logger: /etc/xen/scripts/block: /etc/xen/scripts/block failed; error detected.
Jun 30 13:42:04 os4 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/0/51952
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/13/768
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/13/832
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/block-iscsi: add XENBUS_PATH=backend/vbd/13/768
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/block-iscsi: add XENBUS_PATH=backend/vbd/13/832
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/13/0
Jun 30 13:42:08 os4 kernel: [ 6143.055984] device vif13.0 entered promiscuous mode
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif13.0, bridge br0.
Jun 30 13:42:08 os4 kernel: [ 6143.060799] br0: port 11(vif13.0) entering forwarding state
Jun 30 13:42:08 os4 logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/13/0/hotplug-status connected to xenstore.
Jun 30 13:42:08 os4 kernel: [ 6143.576594] scsi35 : iSCSI Initiator over TCP/IP
Jun 30 13:42:08 os4 kernel: [ 6143.578691] scsi36 : iSCSI Initiator over TCP/IP
Jun 30 13:42:09 os4 kernel: [ 6143.832288] scsi 35:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
Jun 30 13:42:09 os4 kernel: [ 6143.832500] sd 35:0:0:0: Attached scsi generic sg3 type 0
Jun 30 13:42:09 os4 kernel: [ 6143.833215] scsi 36:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
Jun 30 13:42:09 os4 kernel: [ 6143.834392] sd 36:0:0:0: Attached scsi generic sg4 type 0
Jun 30 13:42:09 os4 kernel: [ 6143.837179] sd 35:0:0:0: [sdt] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Jun 30 13:42:09 os4 kernel: [ 6143.837238] sd 35:0:0:0: [sdt] Write Protect is off
Jun 30 13:42:09 os4 kernel: [ 6143.837240] sd 35:0:0:0: [sdt] Mode Sense: 77 00 00 08
Jun 30 13:42:09 os4 kernel: [ 6143.837344] sd 35:0:0:0: [sdt] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Jun 30 13:42:09 os4 kernel: [ 6143.841716] sd 36:0:0:0: [sdu] 23068672 512-byte logical blocks: (11.8 GB/11.0 GiB)
Jun 30 13:42:09 os4 kernel: [ 6143.841800] sd 36:0:0:0: [sdu] Write Protect is off
Jun 30 13:42:09 os4 kernel: [ 6143.841803] sd 36:0:0:0: [sdu] Mode Sense: 77 00 00 08
Jun 30 13:42:09 os4 kernel: [ 6143.841944] sd 36:0:0:0: [sdu] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Jun 30 13:42:09 os4 kernel: [ 6143.842426] sdu: sdu1 sdu2
Jun 30 13:42:09 os4 kernel: [ 6143.843052] sdt:
Jun 30 13:42:09 os4 kernel: [ 6143.843582] sd 36:0:0:0: [sdu] Attached SCSI disk
Jun 30 13:42:09 os4 kernel: [ 6143.850831] sdt1
Jun 30 13:42:09 os4 kernel: [ 6143.851797] sd 35:0:0:0: [sdt] Attached SCSI disk
Jun 30 13:42:09 os4 iscsid: connection28:0 is operational now
Jun 30 13:42:09 os4 iscsid: connection29:0 is operational now
Jun 30 13:42:10 os4 avahi-daemon[3835]: Registering new address record for fe80::fcff:ffff:feff:ffff on vif13.0.*.
Jun 30 13:42:13 os4 logger: /etc/xen/scripts/block-iscsi: Writing backend/vbd/13/832/physical-device 41:30 to xenstore.
Jun 30 13:42:13 os4 logger: /etc/xen/scripts/block-iscsi: Writing backend/vbd/13/832/hotplug-status connected to xenstore.
Jun 30 13:42:13 os4 kernel: [ 6147.857183] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.34/linux-2.6.34/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/13/832
Jun 30 13:42:13 os4 kernel: [ 6147.857188] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.34/linux-2.6.34/drivers/xen/blkback/cdrom.c, line=95) type:0
Jun 30 13:42:13 os4 logger: /etc/xen/scripts/block-iscsi: Writing backend/vbd/13/768/physical-device 41:40 to xenstore.
Jun 30 13:42:13 os4 logger: /etc/xen/scripts/block-iscsi: Writing backend/vbd/13/768/hotplug-status connected to xenstore.
Jun 30 13:42:13 os4 kernel: [ 6147.868988] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.34/linux-2.6.34/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/13/768
Jun 30 13:42:13 os4 kernel: [ 6147.868994] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.34/linux-2.6.34/drivers/xen/blkback/cdrom.c, line=95) type:0
Jun 30 13:42:16 os4 kernel: [ 6151.780136] blkback: ring-ref 8, event-channel 15, protocol 2 (x86_32-abi)
Jun 30 13:42:17 os4 kernel: [ 6151.880532] alloc irq_desc for 902 on node 0
Jun 30 13:42:17 os4 kernel: [ 6151.880535] alloc kstat_irqs on node 0
Jun 30 13:42:17 os4 kernel: [ 6151.884791] blkback: ring-ref 9, event-channel 16, protocol 2 (x86_32-abi)
Jun 30 13:42:17 os4 kernel: [ 6151.992027] alloc irq_desc for 903 on node 0
Jun 30 13:42:17 os4 kernel: [ 6151.992031] alloc kstat_irqs on node 0
Jun 30 13:42:17 os4 kernel: [ 6152.128027] alloc irq_desc for 904 on node 0
Jun 30 13:42:17 os4 kernel: [ 6152.128030] alloc kstat_irqs on node 0
Jun 30 13:42:19 os4 kernel: [ 6153.980512] vif13.0: no IPv6 routers present

thanks for any idea!

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=618678#c
Jason Douglas
http://bugzilla.novell.com/show_bug.cgi?id=618678#c1
Jan Beulich
http://bugzilla.novell.com/show_bug.cgi?id=618678#c2
Harald Koenig
> Getting xenstore's view (utility xenstore-ls) on both the corresponding
> frontends' and backends' states would possibly shed some light on this - the
> driver terminates those threads when the devices get disconnected (hinting at
> some failure path [in tools or kernel] doing insufficient cleanup).

I have one full xenstore dump which I took after all domUs had been shut down. Only dom0 was still running, but xenstore still showed all the machines which had been running before.

One note about xenstore: I'm not using xenstore myself (no "xm new foo"); all domUs are started via config file, either in /etc/init.d/xendomains or with "xm create foo", so I'm a bit surprised to find all the domU info in xenstore *after* all those domUs are shut down?! I'll attach that dump.

> The kernel side code didn't change in quite a while - did you observe similar
> problems with earlier versions?

> I'm afraid it'll be difficult for us to do anything if the issue doesn't
> re-occur for you.
It does reoccur 100% for me; right now xen 4.0 with 11.3 is more or less unusable: whenever the first domU gets shut down or crashes, it's not possible anymore to start any new domU -- so far I can only reboot the dom0 :-(

Now I did a new test: no iscsi, but one single PVM domU started with one disk image as a local file. Start and shutdown works fine, and the "blkback" kernel thread vanished. But it's still not possible to start a new domU! With an identical config file I get:

# time xm cre -c os-centos5
Using config file "./os-centos5".
# Error: Device 0 (vif) could not be connected. Hotplug scripts not working.

So I again commented out the "vif=..." definition, only to get this (both times after a 100 secs timeout):

# xm cre -c os-centos5
Using config file "./os-centos5".
# Error: Device 768 (vbd) could not be connected. Hotplug scripts not working.

So I have to reboot again... The next/last "single domU" test will use a 64bit suse 11.1 domU instead of centos, just in case you want to argue... so stay tuned ;-)

Now I'm no longer sure if the remaining blkback threads are the real problem, or just one more symptom of an even bigger problem :-(
http://bugzilla.novell.com/show_bug.cgi?id=618678#c3
--- Comment #3 from Harald Koenig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c4
--- Comment #4 from Harald Koenig
> so I have to reboot again... -- next/last "single domU" test will use a 64bit
> suse 11.1 domU instead of centos, just in case you want to argue... -- so stay
> tuned ;-)
Uh, oh! I've now tried to start/stop/start/stop other VMs, and surprise: the problem does not get triggered by suse 11.1, sles 11 or centos 5 (both 32 and 64 bit PVMs). Some of them have been changed to use disk images as local files, others still use iscsi disks (as I was too lazy to change all configs ;), so at least it's not an iscsi issue ;)

Right now it looks like the problem is related to using the vmlinuz-2.6.9-55.0.12.ELxenU or vmlinuz-2.6.9-89.0.23.ELxenU kernels from centos 4u5 and 4u8; again, both 32 and 64 bit domUs seem to trigger the problem!

Next update -- even more obfuscation :-( I replaced that vmlinuz-2.6.9* kernel in a centos4u4 domU with a 2.6.18-8.el5xen kernel (32bit for the 1st test) from centos 5u0 (which I use successfully for a centos 5u0 domU); the initrd was directly copied from that working 5u0 domU. Using that 5u0 kernel/initrd, the 4u4 did not come up correctly: it crashed before it could access the rootfs (because the real 5u0 uses a plain rootfs partition while the 4u4 has its rootfs in LVM -- console log of the crash below). BUT even though it only used the same kernel/initrd, this domU start again triggers the problem which blocks any further domU start :-(

There is no additional output from "xm dmesg" when starting/stopping such a problematic domU, so no additional clue from that channel :-(

Right now I do not understand
- why centos 5u0 works fine while booting the 5u0 kernel in the 4u4 domU triggers further problems after crashing at the end of the initrd
- how to get the centos4 machines working in 11.3/xen4 :-(

Just to state it again: these PVMs run just fine with 11.1 (xen 3.3) and 11.2 (xen 3.4). Are there known issues in xen 4.0 with older xen kernel versions in the domU? Or any specific issues with centos 4u5/4u8 in xen4?
This is the diff between the config files of centos 4u4 and 5u0:

-------------------------------------------------------------------------------
< name="os-centos4u4"
< bootargs="--entry=hda1:/boot/vmlinuz-2.6.18-8.el5xen,/boot/initrd-2.6.18-8.el5xen.img.c5"
< disk=[ diskroot + 'os-centos4u4-flat.vmdk,hda,w' , diskroot + 'os-centos4u4-builddisk-flat.vmdk,hdb,w' ]
< extra="root=/dev/mapper/VolGroup00-LogVol01"
< vif=[ 'mac=00:0c:29:18:4b:5a,bridge=br0,model=rtl8139', ]
-------------------------------------------------------------------------------
> name="os-centos5"
> bootargs="--entry=hda1:/boot/vmlinuz-2.6.18-8.el5xen,/boot/initrd-2.6.18-8.el5xen.img"
> disk=[ diskroot + 'os-centos5-flat.vmdk,hda,w' ]
> extra="root=/dev/hda1"
> vif=[ 'mac=00:0c:29:4a:bf:fc,bridge=br0,model=rtl8139', ]
So there is no significant difference... An "expected" (now that I see it ;) panic when booting the 4u4 disk with the 5u0 kernel/initrd:

Started domain os-centos4u4 (id=1)
Linux version 2.6.18-8.el5xen (mockbuild@builder4.centos.org) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Thu Mar 15 21:02:53 EDT 2007
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 000000007d800000 (usable)
1280MB HIGHMEM available.
727MB LOWMEM available.
NX (Execute Disable) protection: active
ACPI in unprivileged domain disabled
Built 1 zonelists. Total pages: 514048
Kernel command line: root=/dev/mapper/VolGroup00-LogVol01
[...]
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
Error: Domain 'os-centos4u4' does not exist.
Error: Domain 'os-centos4u4' does not exist.
And here is the full xen config file for the centos4u4 vm -- still hoping that some bells will ring when you see some "unimportant" bits of information ;-)

--- 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ---
name="os-centos4u4"
memory=1000
maxmem=2000
vcpus=4
on_poweroff="destroy"
on_reboot="restart"
on_crash="restart"
localtime=0
extra="root=/dev/mapper/VolGroup00-LogVol01"
bootloader="/usr/lib/xen/boot/domUloader.py"
builder="linux"
#bootargs="--entry=hda1:/boot/vmlinuz-2.6.9-55.0.12.ELxenU,/boot/initrd-2.6.9-55.0.12.ELxenU.img"
#bootargs="--entry=hda1:/boot/vmlinuz-2.6.9-89.0.23.ELxenU,/boot/initrd-2.6.9-89.0.23.ELxenU.img"
#bootargs="--entry=hda1:/boot/vmlinuz-2.6.9-89.0.15.plus.c4xenU,/boot/initrd-2.6.9-89.0.15.plus.c4xenU.img"
bootargs="--entry=hda1:/boot/vmlinuz-2.6.18-8.el5xen,/boot/initrd-2.6.18-8.el5xen.img.c5"
diskroot='file:/etc/xen/images/'
#iskroot='iscsi:iqn.2010-04.de.science-computing:'
disk=[ diskroot + 'os-centos4u4-flat.vmdk,hda,w' , diskroot + 'os-centos4u4-builddisk-flat.vmdk,hdb,w' ]
vif=[ 'mac=00:0c:29:18:4b:5a,bridge=br0,model=rtl8139', ]
nographic=1
apic=1
acpi=1
pae=1
# serial="pty"
--- 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ---

So, enough for today; I'm running out of ideas for now :-(
http://bugzilla.novell.com/show_bug.cgi?id=618678#c5
--- Comment #5 from James Fehlig
> one note about xenstore: I'm not using xenstore myself (no "xm new foo"), all
> domUs are started via config file either in /etc/init.d/xendomains or with
> "xm create foo", so I'm a bit surprised to find all the domU info in xenstore
> *after* all those domUs are shut down ?!
I think you are confusing xenstore with xend's internal domU configuration database. The latter is where managed domU configs are stored (/var/lib/xen/domains/<domU-uuid>/config.sxp), e.g. when doing 'xm new domU-config'.

xenstore is where information is stored for _running_ domUs. Front and backend drivers rendezvous there, and the various tools (xend, qemu-dm, etc.) read, write, and watch state in xenstore. It is a database for active domUs.

That said, all xenstore information pertaining to an active domU should be removed once the domU has powered down. Lots of conditions can cause orphaned entries in the store, however. E.g. if blkbk does not fully clean up the vbd, udev is never triggered, hotplug scripts aren't invoked, and xenstore entries are not removed.
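This description suggests a simple cross-check for orphaned entries (an editorial sketch, not from the thread; the backend/vbd/&lt;domid&gt;/&lt;dev&gt; path layout is taken from the hotplug logs above, and sample data stands in for live output):

```shell
# Sketch: flag xenstore backend entries that belong to domains which are no
# longer running. On a live dom0, the paths would come from something like
# `xenstore-ls -f /local/domain/0/backend` and the id list from `xm list`;
# here both are sample data modeled on the logs in this report.
running_ids='0
10'
backend_paths='backend/vbd/10/768
backend/vbd/13/768
backend/vbd/13/832'

printf '%s\n' "$backend_paths" | while read -r path; do
    domid=$(printf '%s\n' "$path" | cut -d/ -f3)
    # Anything whose domid is not in the running list is orphaned.
    printf '%s\n' "$running_ids" | grep -qx "$domid" || echo "orphaned: $path"
done
```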
http://bugzilla.novell.com/show_bug.cgi?id=618678#c6
--- Comment #6 from James Fehlig
> name="os-centos4u4"
> memory=1000 maxmem=2000
Any luck if you set memory = maxmem?
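For reference, the suggestion corresponds to a one-line change in the domU config quoted above (the 2000 comes from the maxmem already in that config; whether it helps is exactly what's being asked):

```python
# xm domU config fragment: give the domU a fixed allocation so that memory
# and maxmem agree, per the suggestion above.
memory = 2000
maxmem = 2000
```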
http://bugzilla.novell.com/show_bug.cgi?id=618678#c7
--- Comment #7 from Harald Koenig
(In reply to comment #4)
> name="os-centos4u4"
> memory=1000 maxmem=2000
>
> Any luck if you set memory = maxmem?
I can try tomorrow, but almost all domUs are configured with the same memory setup (some old/small ones with memory=500), and the centos4u4 before was an exception with 3000/4000 for some larger build jobs; reverting to 1000/2000 didn't help either...
http://bugzilla.novell.com/show_bug.cgi?id=618678#c8
James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c
Jan Beulich
From the xenstore dump I conclude that when you took it, all backends were in 'closing' state. Unfortunately you didn't provide matching information about the left-over blkback threads: the driver switches a device to 'closing' only after having initiated termination of the respective thread, so (minus eventual bugs)
http://bugzilla.novell.com/show_bug.cgi?id=618678#c9
Jan Beulich
http://bugzilla.novell.com/show_bug.cgi?id=618678#c10
Harald Koenig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c11
James Fehlig
> can you reproduce that either (1) or (2) will lock your dom0 too?
In each of these cases, do you have a vif device that has not been released by netbk? I.e. do you have a /sys/devices/xen-backend/vif-X-Y? I think you are just hitting failure cases (different bugs) that cause netbk to not clean up properly. Once netbk is in this state, further domUs cannot be started until a dom0 reboot. As mentioned in comment #8, I'm able to get netbk into this state by doing 'xm save' or 'xm destroy' on a domU.
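The check asked for here can be scripted (an editorial sketch; the sysfs path is the one named above, made a parameter so the demonstration can run against a scratch directory):

```shell
# Sketch: list vif-<domid>-<idx> backend nodes that netbk has not released.
# On a real dom0, call with no argument to scan /sys/devices/xen-backend.
find_leftover_vifs() {
    backend_dir=${1:-/sys/devices/xen-backend}
    for d in "$backend_dir"/vif-*; do
        # With no matches the glob stays literal, so test existence first.
        [ -e "$d" ] && basename "$d"
    done
}

# Demonstration against a scratch directory mimicking a stuck vif13.0:
tmp=$(mktemp -d)
mkdir "$tmp/vif-13-0"
find_leftover_vifs "$tmp"
rm -rf "$tmp"
```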
http://bugzilla.novell.com/show_bug.cgi?id=618678#c12
--- Comment #12 from James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c13
--- Comment #13 from James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c14
--- Comment #14 from Kattiganehalli srinivasan
http://bugzilla.novell.com/show_bug.cgi?id=618678#c15
--- Comment #15 from Jan Beulich
http://bugzilla.novell.com/show_bug.cgi?id=618678#c17
--- Comment #17 from Jan Beulich
Oh, one final note. netbk dumps a lot of the following messages once it gets into this state:
Jul 2 11:51:01 xen48 kernel: [ 1417.987320] JWF (netbk_check_gop:529) Bad status -1 from copy to DOM2.
The vast majority of the paths where this error (GNTST_general_error) gets returned have an accompanying guest warning message - did you check whether you got any? That way (assuming this has nothing to do with the problem explained earlier) we could get an understanding of what's going on. (Of course, this status would get routinely returned for domains that are dying from the hypervisor's perspective, so that particular case would likely not need further investigation.)
http://bugzilla.novell.com/show_bug.cgi?id=618678#c18
--- Comment #18 from James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c22
--- Comment #22 from James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c23
--- Comment #23 from Harald Koenig
> Harald, it should fix your latest issues so we can return to the originally
> reported problem :-).
Where/when can I get an rpm for testing?
http://bugzilla.novell.com/show_bug.cgi?id=618678#c24
--- Comment #24 from Jan Beulich
http://bugzilla.novell.com/show_bug.cgi?id=618678#c25
--- Comment #25 from Stephan Kulow
http://bugzilla.novell.com/show_bug.cgi?id=618678#c29
Harald Koenig
> ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/ should have one
> pretty soon (within a day or so).
Oops, forgot to give feedback, sorry! kernel-xen-2.6.34.1-0.0.6.82d7a13.x86_64 fixed the problem; all domUs now start and stop fine (and multiple times ;).

One drawback: there was no iscsitarget-kmp-xen rpm for that kernel, so my iscsi server on that dom0 broke "silently" :-(

Any ETA for a 11.3 update kernel with this issue fixed, pleeeease....? ;-)
http://bugzilla.novell.com/show_bug.cgi?id=618678#c30
James Fehlig
> any ETA for a 11.3 update kernel with this issue fixed, pleeeease.... ? ;-)
Marcus, the xen kernel problem affecting SLE11 SP1 also affects 11.3 (actually the problem was found in 11.3 first). When can users expect an 11.3 kernel update? Thanks!
http://bugzilla.novell.com/show_bug.cgi?id=618678#c31
Marcus Meissner
http://bugzilla.novell.com/show_bug.cgi?id=618678#c33
James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c34
Harald Koenig
> Harald,
> Just to clarify, do you have any remaining issues after installing the fixed
> kernel?
No, I don't see any problems right now. That 11.3 dom0 server has now been running for 12 days with 10 domUs (4 CPUs each and sometimes quite busy ;) with no more issues so far -- other than some missing *-kmp-* packages for that testing kernel (missing the typical suse/obs comfort ;), which I had to build myself...

There might be one more iscsi race condition problem in 11.3 (similar to #623470 for 11.2, but much more unlikely if still present at all), but I can't test this one on the production server anymore and my next test server isn't fully set up yet due to other tasks. If there are any iscsi/xen problems left, I'll open a new ticket...
http://bugzilla.novell.com/show_bug.cgi?id=618678#c35
James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=618678#c36
James Fehlig
https://bugzilla.novell.com/show_bug.cgi?id=618678#c37
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=618678#c38
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=618678#c
Swamp Workflow Management