[Bug 623286] New: Major issues with xen VMs since upgrade from 11.2 to 11.3 (network lost, tools unreliable)
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c0

Summary: Major issues with xen VMs since upgrade from 11.2 to 11.3 (network lost, tools unreliable)
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@novell.com
ReportedBy: romain.pelissier@gmail.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=376631) --> (http://bugzilla.novell.com/attachment.cgi?id=376631)
Xen various logs and configs

User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6

I upgraded from 11.2 to 11.3 using zypper dup. After rebooting, I noticed that all my VMs had lost their network.

My network config is: eth0 <- vlan2 <- br2
I host some Ubuntu VMs and an openSUSE 11.2 VM.

- The first thing I noticed is that all the VMs have lost their network card.
- I start my VM using xm create <vm> and the prompt never comes back. I have to press Ctrl+C to get back to the prompt and run xm unpause <vm> to get it started.
- Every VM starts paused; I don't know why.
- To restore the network card of a VM I used:
  * virsh domxml-to-native xen-xm <vm>.xml > <vm>
  * xm delete <vm>
  * xm new <vm>
  After that I was able to configure the network on the Ubuntu VMs.
- Strangely, if I apply the same recipe to the openSUSE VMs, they crash at some point and I really don't know why.
- I have created a brand new VM (openSUSE 11.3), and although it works better, virt-manager crashes every time I try to open the console.
- xm console never brings up the console.

All openSUSE 11.2 rpms seem to have been updated to openSUSE 11.3 for xen.
All VMs configured to use the hypervisor default NIC seem problematic; things seem to go better with the Realtek NIC.

I really need some help.

Reproducible: Always

Steps to Reproduce:
1. Update from 11.2 to 11.3 with xen installed and some VMs created
2. Try to use the tools (xm create, etc.)
3. Try configuring the VMs' network

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
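The three recovery commands listed in the report (virsh domxml-to-native, xm delete, xm new) could be wrapped in a small helper along these lines. This is only a sketch: the function names, the DRY_RUN switch, and the separate arguments for the domain XML and the generated xm config file are assumptions made for illustration, not something taken from the report. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch of the re-registration steps described in the report.
# Assumption: DRY_RUN, run() and reregister_vm() are hypothetical helpers.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reregister_vm <domain-name> <libvirt-domain-xml> <xm-config-out>
reregister_vm() {
    vm="$1"; xml="$2"; cfg="$3"

    # Convert the libvirt domain XML into a native xm config file.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ virsh domxml-to-native xen-xm $xml > $cfg"
    else
        virsh domxml-to-native xen-xm "$xml" > "$cfg" || return 1
    fi

    # Drop the stale definition from xend, then register the new config.
    run xm delete "$vm" || return 1
    run xm new "$cfg"
}
```

For example, `reregister_vm guest1 guest1.xml guest1.cfg` prints the three commands for a guest named guest1 (a placeholder name); running with `DRY_RUN=0` executes them and should only be done after backing up the existing configuration.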
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c1
Charles Arnold
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c2
--- Comment #2 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c3
--- Comment #3 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c4
--- Comment #4 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c5
--- Comment #5 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c6
--- Comment #6 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c7
--- Comment #7 from Charles Arnold
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c8
--- Comment #8 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c9
Preston Millett
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c10
--- Comment #10 from Romain Pelissier
I tried reproducing the issues you are seeing in our lab. From what I can tell, the Ubuntu VMs that I created seemed to lose a NIC after upgrading, but they had 2 NICs by default after creating them using virt-manager in 11.2. I did not see any VMs starting paused, but on some it would appear that virt-manager is crashing after I start a VM. I also used some openSUSE 11.2 guests and they seemed to have retained their NIC configuration through the upgrade. What type of Ubuntu guests were you running?
Hi,

In fact, the issue is now less on the VM side, since I was able to get them running even though I had to reconfigure their network.

The major issues are:
- virt-manager crashes, so I can't really configure the VMs this way
- the xen tools do not seem to work as expected

Still:
- Possible issue with the way bridged networking is managed
- Possible issue with the hypervisor default NIC

Do you want me to open separate tickets for these problems?

Thanks
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c11
Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c12
James Fehlig
the '/sbin/udevadm info -e' command seems to always hang at this point:
... P: /devices/xen-backend/vif-1-0
Seems like more fallout from netback deadlock issue in 11.3 final. Can you try the 11.3 KOTD, which contains a fix for the deadlock?
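To check whether a command such as '/sbin/udevadm info -e' really hangs, rather than waiting on it indefinitely, one option is to run it under a time limit. The sketch below assumes GNU coreutils `timeout` is available in dom0; the function name probe_hang and the 10-second limit in the example are made up for illustration.

```shell
#!/bin/sh
# probe_hang <seconds> <command...>
# Returns 0 if the command was still running when the time limit expired
# (GNU timeout reports that as exit status 124), and 1 otherwise.
# Caveat: a command that itself exits with status 124 is indistinguishable.
probe_hang() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    [ "$?" -eq 124 ]
}

# Example (not executed here):
#   if probe_hang 10 /sbin/udevadm info -e; then
#       echo "udevadm info -e did not finish within 10 seconds"
#   fi
```

A positive result only shows that the command did not finish in time; it does not by itself confirm the netback deadlock.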
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c13
--- Comment #13 from James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c14
--- Comment #14 from Preston Millett
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c15
--- Comment #15 from Romain Pelissier
I downloaded the KOTD and it looks like it fixes the instabilities that I was seeing in virt-manager on our test machine in our lab.
Can you give me the procedure you have followed to download and use the KOTD stuff please? Thanks
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c16
--- Comment #16 from James Fehlig
Can you give me the procedure you have followed to download and use the KOTD stuff please?
Replace your kernel-xen* packages with the ones from 11.3 KOTD repo.
- rpm -qa | grep kernel-xen
- download replacement kernel-xen* packages from ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/
- rpm -Uvh kernel-xen*.rpm
- reboot
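The steps above can be sketched as a small helper that works on a directory of already downloaded packages. The function name kotd_upgrade, the DRY_RUN switch, and the idea of passing the download directory as an argument are assumptions for illustration. The download itself is left manual, and the script never reboots on its own.

```shell
#!/bin/sh
# kotd_upgrade <dir-with-downloaded-rpms>
# Assumption: the kernel-xen*.rpm files were already fetched by hand from
# the KOTD directory named above. DRY_RUN=1 (the default) only prints the
# commands; DRY_RUN=0 runs them.
kotd_upgrade() {
    dir="$1"

    # Collect the downloaded packages; an unmatched glob stays literal,
    # so the -e test below catches the "nothing downloaded" case.
    set -- "$dir"/kernel-xen*.rpm
    if [ ! -e "$1" ]; then
        echo "no kernel-xen*.rpm files in $dir" >&2
        return 1
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ rpm -qa | grep kernel-xen"
        echo "+ rpm -Uvh $*"
    else
        # Show what is installed now, then upgrade in place.
        rpm -qa | grep kernel-xen
        rpm -Uvh "$@" || return 1
    fi
    echo "Reboot to load the new kernel."
}
```

Leaving the reboot as a printed reminder is a deliberate choice here, so that running guests can be shut down cleanly first.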
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c17
Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c18
--- Comment #18 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c19
James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c20
--- Comment #20 from James Fehlig
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c21
--- Comment #21 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c22
--- Comment #22 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c23
--- Comment #23 from Romain Pelissier
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c24
Perry Dick
The summary of this bug and majority of comments are quite misleading given the problem Romain is now seeing in #18. I should have closed it and requested Romain to enter a new bug before reassigning, but I'll now leave this decision with the new owners :).
Why would this ticket be closed? The bug has not been fixed through a patch. Someone found an arcane workaround, but that hardly seems to resolve the issue for all users of 11.3. I am pretty new to opensuse, having started with 11.1. I do have to say that I find it strange that 11.1 was the last release where xen actually worked. 11.2 and 11.3 were both released with xen not working properly. I wonder if Novell doesn't want people to get this feature for free?
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c25
James Fehlig
I am pretty new to opensuse, having started with 11.1. I do have to say that I find it strange that 11.1 was the last release where xen actually worked.
11.1 is the same code base as SLES11 - and is well tested since it is the enterprise product.
11.2 and 11.3 were both released with xen not working properly. I wonder if Novell doesn't want people to get this feature for free?
But fixes have been made available. There are a lot of clever folks using openSUSE Xen in all sorts of interesting deployments. openSUSE is a community project and as such the community is expected to help harden it. Testing and bug reporting *prior* to releases is much appreciated :-). Thanks!
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c26
--- Comment #26 from Perry Dick
(In reply to comment #24)
I am pretty new to opensuse, having started with 11.1. I do have to say that I find it strange that 11.1 was the last release where xen actually worked. 11.1 is the same code base as SLES11 - and is well tested since it is the enterprise product. 11.2 and 11.3 were both released with xen not working properly. I wonder if Novell doesn't want people to get this feature for free? But fixes have been made available. There are a lot of clever folks using openSUSE Xen in all sorts of interesting deployments. openSUSE is a community project and as such the community is expected to help harden it. Testing and bug reporting *prior* to releases is much appreciated :-). Thanks!
Well, I stand corrected. Go ahead and close the ticket even though it has not been resolved. I will move on to a different flavor that provides release candidates that work, or at least provides patches before closing an open bug. My bad.
http://bugzilla.novell.com/show_bug.cgi?id=623286
http://bugzilla.novell.com/show_bug.cgi?id=623286#c27
--- Comment #27 from James Fehlig
Well, I stand corrected. Go ahead and close the ticket even though it has not been resolved.
I don't see that anyone internal is suggesting to close this bug. It has been reassigned to appropriate component for fixing.
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c28
Nick Couchman
(In reply to comment #26)
Well, I stand corrected. Go ahead and close the ticket even though it has not been resolved.
I don't see that anyone internal is suggesting to close this bug. It has been reassigned to appropriate component for fixing.
So, I understand the point about this being a community project, and that you rely on the community to fix some of the bugs. I also realize that I'm not a huge contributor in terms of code (and you would probably thank me for that if you knew my low level of programming skills :-). That said, this last comment was over two weeks ago, openSuSE 11.3 has been out for five or six weeks, and the fix for this issue has been known/available in the KOTD and OBS Kernel repositories for three to four weeks. So, the question is: why hasn't this shown up in the form of a kernel update in the openSuSE 11.3 repositories?? I'm not trying to be impatient - I'm happy to use the OBS Kernel repo or KOTD, and I very much appreciate the folks who can and do donate their time and energy to developing openSuSE, but this is a pretty major issue in a release of a Linux distribution that is considered "stable." I understand your point about the community development of this project, but if you expect folks to use openSuSE as a distribution (and, consequently, move into SLES for Enterprise-level functionality), these issues need to be responded to quickly, and four weeks for a problem of this magnitude is anything but quick. Issues like this will reflect on the entire "SUSE" product line, and that includes the Enterprise products.
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c29
--- Comment #29 from James Fehlig
why hasn't this shown up in the form of a kernel update in the openSuSE 11.3 repositories??
I see that a kernel update for 11.3 has been started but it has not yet made it to QA.
these issues need to be responded to quickly, and four weeks for a problem of this magnitude is anything but quick.
But it was responded to quickly. The kernel issue was fixed as soon as it was identified - and the fix made available to users. Granted, it wasn't through official update channel, but that's the type of support you get with the enterprise products.
Issues like this will reflect on the entire "SUSE" product line, and that includes the Enterprise products.
While I understand your frustration, it's unfortunate that the support offered for a free product reflects on the support of a paid product.
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c30
--- Comment #30 from Nick Couchman
I see that a kernel update for 11.3 has been started but it has not yet made it to QA.
these issues need to be responded to quickly, and four weeks for a problem of this magnitude is anything but quick.
But it was responded to quickly. The kernel issue was fixed as soon as it was identified - and the fix made available to users. Granted, it wasn't through official update channel, but that's the type of support you get with the enterprise products.
Yes, but it took me two days of banging my head against a wall (that my company paid for, no less) before I decided that maybe I wasn't actually doing something wrong and it may be a bug. Then I posted to the Xen mailing list and someone was kind enough to respond and point me in the direction of this bug. Had the fix that was already out there actually made it to the official update channel, I wouldn't have encountered it at all. I'm not trying to be difficult - I do understand that these things take time, this one just seemed to take a lot longer than usual. That's why it's frustrating and surprising, because it doesn't usually take this long, so, when I run into issues, I'm not in the habit of assuming it's a bug - I usually blame myself! :-)
Issues like this will reflect on the entire "SUSE" product line, and that includes the Enterprise products.
While I understand your frustration, it's unfortunate that the support offered for a free product reflects on the support of a paid product.
Well, unfortunately, that's reality. It isn't so much for folks like me - I do understand that there are differences, that Novell is going to concentrate more effort on the product that brings in revenue, etc. But, if you're trying to build a strong customer and user base for either free or paid products (it's the gateway, right - start with the free, move to the enterprise), it's going to be hard to do when folks stumble across issues like this and don't see fixes appear very quickly. I'll bet that there were some number of folks out there - maybe not a ton of them, but certainly a few - that were trying openSuSE for the first time. They downloaded and installed openSuSE 11.3, started playing with it, and then ran into one of the issues caused by this bug. Perhaps they played with it for a couple of weeks, tried updating, etc., then dropped it and went and found another distribution that worked, or went back to one they had been using. That's not just losing a user of the free product, that's losing a potential customer of the enterprise product. Like it or not, that's how it works.

I'm still a SuSE user (both free and paid versions) and have no intention of switching. I think it's a great distribution, and I appreciate all the hard work that goes into it! I'm just trying to point out the negative impact something like this can have on the product perception, and how frustrating it is for those of us who are not accustomed to these sorts of delays :-).

Anyway, I think I've hijacked this bug thread enough. Getting back to the bug - it's still listed under the GNOME component - is this really correct, or perhaps it should say kernel, instead??

-Nick
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c31
James Fehlig
Getting back to the bug - it's still listed under the GNOME component - is this really correct, or perhaps it should say kernel, instead??
As I mentioned in #20, this bug has become very confusing. There were all sorts of symptoms caused by the netback deadlock, which have been fixed. The remaining issue is described in #18. It's a problem in libgtk-vnc, hence reassignment to GNOME component. In hindsight, I should have asked reporter to open a new bug for the libgtk-vnc issue and closed this bug as a duplicate of bnc#618678 - which is where we initially discovered the netback deadlock.

Romain, Are you still seeing the issue in #18? If so, could you open a new bug against GNOME? This bug is officially too overloaded :-). Thanks!
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c32
--- Comment #32 from Romain Pelissier
(In reply to comment #30)
Getting back to the bug - it's still listed under the GNOME component - is this really correct, or perhaps it should say kernel, instead??
As I mentioned in #20, this bug has become very confusing. There were all sorts of symptoms caused by the netback deadlock, which have been fixed. The remaining issue is described in #18. It's a problem in libgtk-vnc, hence reassignment to GNOME component. In hindsight, I should have asked reporter to open a new bug for the libgtk-vnc issue and closed this bug as a duplicate of bnc#618678 - which is where we initially discovered the netback deadlock.
Romain, Are you still seeing the issue in #18? If so, could you open a new bug against GNOME? This bug is officially too overloaded :-). Thanks!
Yes, the issue is still there, but I think that this particular bug could be closed and another one opened for the GNOME team for this issue. Just let me post some additional information about this tomorrow and create the new bug before closing this bug report. Thanks
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c33
Rupert Kolb
https://bugzilla.novell.com/show_bug.cgi?id=623286
https://bugzilla.novell.com/show_bug.cgi?id=623286#c34
James Fehlig
As I understand, there is no way out of the box to use a more recent openSUSE as a xen dom0 in combination with paravirtualized domUs. The last working version is 11.1.
11.3 works fine. It shipped with a bug in netback but a kernel update fixing the issue has been released. We are also in the process of QA'ing updated xen packages for 11.3. You can test them in your environment as well http://download.opensuse.org/repositories/Virtualization:/openSUSE11.3/openS...
Am I right? This means, we have to be stuck at 11.1 when using xen, or use an other distribution: RH, Ubuntu, ... ??? 11.2 or 11.3 are useless?
No, that's not right. 11.3 works fine with updated kernel-xen in dom0. Our planned xen update for 11.3 will fix additional bugs found since 11.3 shipped (and SLES11 SP1 since they have same xen versions), further improving its stability.

BTW, I am going to close this bug now since it has become so convoluted. Please, if anyone on this bug is having further issues with an *updated* 11.3 system, open a new bug report describing the issue and we will gladly take a look. This bug has been abused enough with all sorts of unrelated issues.