[Bug 848754] New: Xen domU can't find VG after dom0 is rebooted
https://bugzilla.novell.com/show_bug.cgi?id=848754#c0

           Summary: Xen domU can't find VG after dom0 is rebooted
    Classification: openSUSE
           Product: openSUSE 12.3
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 12.3
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Xen
        AssignedTo: jdouglas@suse.com
        ReportedBy: eruby@knowledgematters.net
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0

I built a Xen server using OpenSUSE 12.3 with all of the latest patches. I can
reboot the server and it reboots just fine, no problems. Then I created several
OpenSUSE 12.3 VMs. I can reboot the VMs and they reboot just fine, no problems.
However, once I reboot the dom0 Xen server, the domU VMs will no longer boot.
The console shows 'Volume group "whatever" not found' errors.

dom0 has a single physical LUN from a SAN with LVM on it. The VG name is
"switch". We are using multipathd and there are four fibre channel paths to
each LUN.

The domUs also each use a single physical LUN from the same SAN with LVM on it.
Each VG name matches the host name, e.g. host www4 has VG "www4".

Everything works fine as long as I do not reboot dom0. Once dom0 reboots, none
of the domUs can find their volume groups. I can see the volume groups when I
type "vgdisplay" or "vgs" in dom0, but they cannot be accessed in their domU
VMs. I did verify that the blkbk module is loaded.

I have rebuilt the OpenSUSE 12.3 Xen server 4 times now and get the same
results each time. I suspect any problem this serious in 12.3 Xen would already
have an open ticket, so there's probably something in my setup that isn't
correct, or it could be something about the combination of multipathd + LVM +
Xen I'm using. If anyone has suggestions of what I should look at or what else
I should try, I'm open to them.

Separate data point: I also shut down an OpenSUSE 11.4 domU running on an
OpenSUSE 11.4 Xen server, changed its LUN mapping on the SAN so the LUN was
visible on the OpenSUSE 12.3 Xen server, then tried to boot it up on the
OpenSUSE 12.3 Xen server. I got the same error on the console that it could not
find its volume group. I shut it down, remapped the LUN back to the 11.4 Xen
server and brought it back up on the 11.4 Xen server without a problem.

Reproducible: Always

Steps to Reproduce:
1. Build OpenSUSE 12.3 Xen server
2. Build OpenSUSE 12.3 VMs
3. Reboot OpenSUSE 12.3 Xen server
4. OpenSUSE 12.3 VMs no longer boot

Actual Results: Non-working VMs

Expected Results: Working VMs

The Xen console for www4 shows:

  Will boot selected entry in 2 seconds
  [    0.172967] PCI: Fatal: No config space access function found
  [    0.176373] Unable to read sysrq code in control/sysrq
  [    1.365196] i8042: No controller found
  [    1.366095] /home/abuild/rpmbuild/BUILD/kernel-xen-3.7.10/linux-3.7/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
  doing fast boot
  Creating device nodes with udev
  Volume group "www4" not found
  Volume group "www4" not found
  Volume group "www4" not found
  Volume group "www4" not found
  Volume group "www4" not found
  Trying manual resume from /dev/www4/swap0
  Trying manual resume from /dev/www4/swap0
  Waiting for device /dev/www4/root to appear:
  Reading all physical volumes.  This may take a while...
  No volume groups found
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Volume group "www4" not found
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Volume group "www4" not found
  [this goes on for a long time.............]
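For reference, the dom0-side checks described in the report can be reproduced
with something like the following (a sketch; the module and VG names are the
ones from this report):

  # in dom0, after the reboot that breaks the guests:
  lsmod | grep blkbk    # confirm the block backend module is loaded
  vgs                   # "switch" and the guest VGs are visible here
  multipath -ll         # all four FC paths per LUN should be listed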
https://bugzilla.novell.com/show_bug.cgi?id=848754#c2

--- Comment #2 from Mike Latimer
https://bugzilla.novell.com/show_bug.cgi?id=848754#c3
--- Comment #3 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c4
--- Comment #4 from Mike Latimer
> I'm pretty sure that supportconfig only ships in the supportutils package in
> SLES. I don't see it on my OpenSUSE 12.3 server.
Sorry about that. I didn't realize this was not included in 12.3. You can download the Factory version (1.20) at the following link: http://software.opensuse.org/package/supportutils It will install and run just fine under openSUSE.
> Is there something I can run in OpenSUSE that will tell you what you want to
> know?
The problem is, there are quite a few things I would like to know. For example,
here is the immediate list:

  - /etc/lvm/lvm.conf
  - multipath -ll
  - dmsetup ls --tree
  - lsscsi
  - the xml file for a guest with this problem

I'm sure the above output will raise other questions as well; that is why the
supportconfig output is so useful. If it is not possible to get that, we can
try starting with the above.
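If supportconfig is unavailable, a minimal sketch for collecting the requested
output in one file (the guest name www4 and the /etc/xen/vm path are taken from
this thread; adjust both to match your setup):

  ( echo '== lvm.conf ==';       cat /etc/lvm/lvm.conf;
    echo '== multipath -ll =='; multipath -ll;
    echo '== dmsetup tree ==';  dmsetup ls --tree;
    echo '== lsscsi ==';        lsscsi;
    echo '== guest config =='101;  cat /etc/xen/vm/www4
  ) > /tmp/bug848754-info.txt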
https://bugzilla.novell.com/show_bug.cgi?id=848754#c5

--- Comment #5 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c6

--- Comment #6 from Mike Latimer
https://bugzilla.novell.com/show_bug.cgi?id=848754#c7
--- Comment #7 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c8
--- Comment #8 from Mike Latimer
> The domU LVs are *never* mounted in the dom0 environment and rarely migrated
> from one dom0 to another, so maybe that's why this has always worked OK.
Yes, if you are careful, you should be able to get away with this. It is usually safer to ensure dom0 never has access to those devices in the first place.
> For dir0, right now /etc/xen/vm/dir0 is using:
>
>   /dev/disk/by-id/scsi-2000b0804e0002142
>
> ...which is symlinked to ../../dm-3. I changed /etc/xen/vm/dir0 to use:
>
>   /dev/disk/by-id/dm-uuid-mpath-2000b0804e0002142
>
> ...which is also symlinked to ../../dm-3.
Right now, your udev is synchronized with mpio, so both the scsi-XXX and
dm-uuid-mpath-XXX devices should link to ../../dm-3. The problem is that the
scsi-XXX devices start out being linked to ../../sdX, and are later remapped to
../../dm-3. If LVM activates before they are remapped, you can have a problem
(i.e. LVM activates on a single path rather than on the MPIO device).
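One way to confirm this race from a running dom0 (a sketch, using the device ID
quoted above) is to check where each symlink currently resolves after letting
udev settle:

  udevadm settle   # wait for pending udev events
  readlink -f /dev/disk/by-id/scsi-2000b0804e0002142           # expect /dev/dm-3
  readlink -f /dev/disk/by-id/dm-uuid-mpath-2000b0804e0002142  # should match

If the scsi-XXX link still points at a /dev/sdX node, LVM ran before the remap
described above.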
> I still had the same issue when I tried to start dir0, so I added the domU
> /dev/disk/by-id/dm-uuid-mpath-* names to the filter list in dom0:
>
>   filter = [ "a|/dev/disk/by-id/dm-uuid-.*mpath-.*|",
>              "r|/dev/disk/by-id/dm-uuid-part2-mpath-2000b0804e0002142|",
>              "r|/dev/disk/by-id/dm-uuid-part2-mpath-2000b0804df002142|",
>              "r|/dev/disk/by-id/dm-uuid-part2-mpath-2000b0804e0002142|",
>              "r/.*/" ]
The problem with the above filter is that rules are processed in order. As the
first accept rule matches all the dm-uuid-.*mpath-.* devices, they will be
accepted and never rejected. Moving that accept rule after the reject rules
will accomplish what you need here.

BTW - I usually recommend people use MPIO aliases to name their LUNs, rather
than leave that up to LVM. If you do that, you can create a simpler LVM filter
that rejects all LUNs that start with 'vm-*' (as an example). Either way should
work, though.

I am still not sure whether the boot failure is due to LVM being activated in
dom0 or not. For the time being, please try your test again after changing the
filter. If the guest still fails to boot, are you left in a working maintenance
shell? If so, I'd like to get the output of lsscsi, pvscan, and lvscan from
that shell...
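For illustration, the same filter with the accept rule moved after the rejects,
as described above (a sketch using the two device IDs from the previous
comment):

  filter = [ "r|/dev/disk/by-id/dm-uuid-part2-mpath-2000b0804e0002142|",
             "r|/dev/disk/by-id/dm-uuid-part2-mpath-2000b0804df002142|",
             "a|/dev/disk/by-id/dm-uuid-.*mpath-.*|",
             "r/.*/" ]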
> What additional output from supportconfig do you need? What flags?
Simply running supportconfig (with no parameters) will provide everything I
need.
https://bugzilla.novell.com/show_bug.cgi?id=848754#c9
--- Comment #9 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c10
--- Comment #10 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c11
--- Comment #11 from Mike Latimer
https://bugzilla.novell.com/show_bug.cgi?id=848754#c12

--- Comment #12 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c13
--- Comment #13 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c14

--- Comment #14 from Mike Latimer
https://bugzilla.novell.com/show_bug.cgi?id=848754#c15

--- Comment #15 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c16

--- Comment #16 from Mike Latimer
> I added the features flag to /etc/multipath.conf:
>
>   devices {
>     device {
>       vendor               "Pillar"
>       product              "Axiom.*"
>       path_grouping_policy "group_by_prio"
>       path_checker         "tur"
>       hardware_handler     "0"
>       prio                 "alua"
>       rr_weight            "uniform"
>       features             "1 no_partitions"
>     }
>   }
The problem is likely the wildcard in the product field. According to the MPIO
developers, wildcards are not supported in those fields. Based on my own
testing, using wildcards there can work, but it is inconsistent - it depends on
the default entries already in the internal tables.

If changing that does not help, I should have more advice after seeing the
following (complete) details:

  - /etc/multipath.conf
  - Output of lsscsi
  - Output of multipath -t
  - Output of multipath -v3 -d
  - Output of multipath -ll

FYI - Most of the above is in a supportconfig output.
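If the wildcard does turn out to be the culprit, the fix is to use the literal
product string. For illustration, the same stanza without the wildcard (this
matches the form Mike provides later in comment #24; confirm the exact string
with lsscsi or multipath -ll first):

  devices {
    device {
      vendor               "Pillar"
      product              "Axiom 500"     # literal string, no wildcard
      path_grouping_policy "group_by_prio"
      path_checker         "tur"
      hardware_handler     "0"
      prio                 "alua"
      rr_weight            "uniform"
      features             "1 no_partitions"
    }
  }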
https://bugzilla.novell.com/show_bug.cgi?id=848754#c17
--- Comment #17 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c18
--- Comment #18 from Mike Latimer
> I got the whole "device { ... }" entry by running "multipath -t | less" and
> looking for "Pillar". There was only one entry, so I modified that. Many of
> the entries shipped with OpenSUSE include wildcards in the "product" field.
> Are they all wrong then?
No. The internal tables are allowed to use wildcard characters. It is only when adding an entry manually into /etc/multipath.conf that the wildcard characters may not work properly.
> I figured out a series of commands to type last night and got it working. I
> posted a follow-up message with my notes but don't see it here. When I go
> home tonight I'll check my computer and see if they're still on the screen.
> If not I'll have to reconstruct what I did. (I'm still logged in at home, so
> all of the commands are on my screen there.)
After getting no_partitions to work, did the VMs start properly?
> I'll also run supportconfig for you again and add a second supportconfig dump
> to this ticket.
Thanks. I'll watch for the rest of the details then.
https://bugzilla.novell.com/show_bug.cgi?id=848754#c19
--- Comment #19 from Mike Latimer
https://bugzilla.novell.com/show_bug.cgi?id=848754#c20

--- Comment #20 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c21
--- Comment #21 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c22

--- Comment #22 from Mike Latimer
> Why is it behaving differently at boot time?
The likely explanation is that you have multipath included in your initrd, so
your changes to /etc/multipath.conf have not yet been added to the
multipath.conf that is used from the initrd. You can add your updated
multipath.conf to your initrd using `mkinitrd -f multipath`. (If your system
partitions are not on a multipath device, the other option is to remove mpio
from your initrd.)

Regarding the multiple entries in /etc/multipath.conf, I would recommend that
you determine which one is actually being used. You can do this by commenting
one out and looking at the output of `multipath -v3 -d`. If the 'no_partitions'
feature is seen during the map assembly, you will know it read the entry
correctly.

Another way to investigate this is through `multipath -t`. If you look at the
full table, you really only want one device entry to match your SAN
vendor/product. When I have tested wildcards in my multipath.conf device
entries, I usually see two such entries end up in the full table, and then it
is uncertain which one will be used.
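A sketch of that verification sequence, using only the commands named above:

  mkinitrd -f multipath                       # copy the updated multipath.conf
                                              # into the initrd
  multipath -v3 -d | grep -i no_partitions    # feature seen during map assembly?
  multipath -t | grep -i -A3 pillar           # how many device entries match?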
I've used "multipath -r" before to update path PV information after enlarging a PV on the SAN, so I understand that, and -f will "flush all unused multipath device maps", but between the two of them they're resetting or fixing something that is broken after reboot.
I think the main thing that is happening here is that you are using the initrd
multipath.conf during startup, which does not have the no_partitions feature,
so the partition maps are being created. Later, after the server is up, you are
flushing and rebuilding the maps using the /etc/multipath.conf that does have
no_partitions, so the partition maps are being removed and not recreated.

Can you rebuild the initrd so the same multipath.conf is in both places, and
let me know if that helps?
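A hedged sketch of how to prove the two copies match, assuming the openSUSE
12.3 initrd is a gzipped cpio archive (adjust the initrd path to match your
installed kernel):

  cd "$(mktemp -d)"
  zcat /boot/initrd-$(uname -r) | cpio -id etc/multipath.conf
  diff etc/multipath.conf /etc/multipath.conf && echo 'initrd copy is in sync'
  mkinitrd -f multipath    # rebuild if the diff shows differences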
https://bugzilla.novell.com/show_bug.cgi?id=848754#c23

--- Comment #23 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c24
--- Comment #24 from Mike Latimer
> Oh, of course. I wasn't thinking about initrd.
>
> So I rebuilt initrd. Then rebooted.
>
> Of course now the "no_partitions" feature applies to all Pillar LUNs at boot
> time, including the dom0 boot LUN, so now it won't boot.
Sorry for the hassle. Can I please get that full supportconfig output? That would help me come up with a complete working configuration... However, I've made a few more guesses, and the rest of this comment should help.
> At this point it booted but wouldn't recognize the "switch" VG. Probably
> something to do with the changes to the lvm.conf filter.
Yes, I'm sure the filter caused that. However, in an MPIO environment it is
incorrect (and potentially damaging) to use non-MPIO paths. (Those duplicate PV
messages are a clear indication of the problem.) So, I would recommend sticking
to the original filter I recommended in comment #6.

I appreciate all the efforts, and I think we're close to a working solution.
The key points are that we need a solution that does the following:

  - multipath: no_partitions for everything except the 'switch' VG LUN
  - LVM:       a filter which only activates on MPIO partitions
  - Xen guest: a phy disk connection that utilizes the MPIO path

The following settings should allow this:

/etc/lvm/lvm.conf:

  filter = [ "a|/dev/disk/by-id/dm-uuid-.*mpath-.*|", "r/.*/" ]

/etc/multipath.conf:

  defaults {
    # If all paths are down just queue up ios (not much else to do).
    no_path_retry queue
  }
  devices {
    device {
      vendor               "Pillar"
      product              "Axiom 500"
      path_grouping_policy "group_by_prio"
      path_checker         "tur"
      hardware_handler     "0"
      prio                 "alua"
      rr_weight            "uniform"
      features             "1 no_partitions"
    }
  }
  multipaths {
    # switch
    multipath {
      wwid     2000b080452002142
      features "0"
    }
  }

Sample xen guest configuration:

  disk = [ 'phy:/dev/disk/by-id/dm-uuid-mpath-2000b0804e1002142,xvda,w' ]

In the above multipath.conf, you could also use your method from the previous
comment, but as long as no_partitions takes effect from the device setting, it
can be disabled on an individual LUN using my syntax above. With only a handful
of LUNs, either approach should work.

As we already encountered, after making changes to lvm.conf or multipath.conf,
rebuild the initrd so we have matching configurations in both places.

The above configuration really should work. If you continue to run into
problems, I will need either a supportconfig or remote access to track down the
root cause.
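After rebooting with the above in place (and in the initrd), a quick sanity
check might look like this (a sketch; the exact partition-map naming can vary
by setup):

  multipath -ll                   # 'switch' should show features=0; all other
                                  # Pillar LUNs should show '1 no_partitions'
  dmsetup ls --tree               # only the 'switch' LUN should have
                                  # partition maps as children
  ls /dev/disk/by-id/dm-uuid-part*-mpath-*    # only the system LUN's
                                              # partitions should appear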
https://bugzilla.novell.com/show_bug.cgi?id=848754#c
Mike Latimer
https://bugzilla.novell.com/show_bug.cgi?id=848754#c25

--- Comment #25 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c26

--- Comment #26 from Mike Latimer
> I changed the files the way you show them, ran mkinitrd, and rebooted.
> Everything works fine now. No duplicate PV messages when I run vgdisplay,
> only the "switch" volume appears in dom0, and all VMs boot just fine.
Excellent! I'm glad we got it all sorted out.
> Any idea what prompted the Xen team to make this change? It certainly makes
> working with Xen a whole lot more difficult. If you hadn't helped me I
> probably would have switched over to KVM.
There was no change in Xen which caused this; the real issue was MPIO creating
a partition mapping, which was then activated by LVM. It is possible some
timing or configuration setting in 11.4 prevented this issue from being seen,
but the potential for this problem does exist in that code. Also, the issue
could be seen under either Xen or KVM.

Closing as INVALID as this is a configuration issue rather than a bug.
https://bugzilla.novell.com/show_bug.cgi?id=848754#c27
--- Comment #27 from Earl Ruby
https://bugzilla.novell.com/show_bug.cgi?id=848754#c28
--- Comment #28 from Mike Latimer
> Ticket https://www.novell.com/support/kb/doc.php?id=7011590 makes a pretty
> clear case that this problem was introduced between SLES 11 SP1 and SLES 11
> SP2.
The no_partitions feature was added in June of 2008, and first seen in SLES10 SP2 and SLES11 SP1. The referenced TID fails to document the crashes and corruption that can be seen without this feature (in any version of SLES), nor does it document exactly when this feature was added to the code base. As no_partitions resolves the corruption issue, rolling back this change is not an option.
> I don't know what the equivalent version numbers are in OpenSUSE, but we've
> been using Xen with MPIO and LVM-per-VM since OpenSUSE 10.2, and it was
> working fine until we started upgrading our Xen dom0 servers from OpenSUSE
> 11.4 to 12.3.
>
> The idea that someone would need to make this many arcane configuration
> changes to get Xen working seems more like a bug than a configuration issue
> to me.
I haven't seen your earlier environment, but if the scsi-XXX devices were being
served to the VM instead of the exclusive multipath devices, it is possible
that you did not see the problem because you were bypassing the MPIO layer.
This will work, but leaves the environment at risk of failure if the active
path fails over. The same is true for your LVM configuration: if it was not
using exclusively MPIO devices, it could have been activating on a single raw
path, and problems would be encountered if that path failed.

I also don't view this as an arcane configuration requirement. Once all layers
are understood, the correct configuration looks something like:

                                           /--> Xen (phy:/...dm-uuid-mpath-*)
  LUN <== raw paths (sd*) ==> MPIO device <     (no_partitions)
                                           \--> LVM (filter = dm-uuid-mpath-*)

What the above diagram implies is that if the MPIO layer is not working, the
Xen and LVM layers would be broken. In my opinion, this is a good thing, as it
is a bigger problem if your environment works but is really only using a single
path behind the scenes. Such a configuration is broken, and can result in
corruption.

Your environment also has a couple of other tricky elements: your LVM PVs are
at the partition level rather than on the entire device, and you have one LUN
which holds a system VG on a partition. Because of this, we want your system
LUN to receive the partitions map, but the other LUNs require the no_partitions
feature (for later use with Xen). The basic multipath.conf configuration I
provided in comment #24 accomplishes this. It also has the side effect of
making the LVM filter more generic: as the partition-level maps are only
provided for the system LUN, the filter can safely scan all MPIO devices
(partitions and entire LUNs) without activating on the LUNs used by Xen.
> It would be nice if the installer either handled this for you or if someone
> rolled back the changes in SLES 11 SP2 that make these configuration changes
> necessary.
The system installer is much better at handling multipath in various
configurations. However, given that you have two types of LUNs with different
requirements (one with partitions, the rest without), I am not sure there is
any way this could be automatically determined.

The only other way to work around this issue would be to allow the partition
maps to be created and still be used by Xen. If this were the case, you'd still
have to customize your LVM filter to prevent accidental activation of the VGs
which belong to the guests. Because of these complications, it is better to
understand all layers involved and set up the environment according to your
specific requirements.