[Bug 495224] New: Imroper handling of multipath volumes?
http://bugzilla.novell.com/show_bug.cgi?id=495224

           Summary: Imroper handling of multipath volumes?
    Classification: openSUSE
           Product: openSUSE 11.1
           Version: Final
          Platform: All
        OS/Version: openSUSE 11.1
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Basesystem
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: nice@titanic.nyme.hu
         QAContact: qa@suse.de
          Found By: ---

Created an attachment (id=285908)
 --> (http://bugzilla.novell.com/attachment.cgi?id=285908)
Information about my disk system

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; hu-HU; rv:1.9.0.8) Gecko/2009032600 SUSE/3.0.8-1.1.1 Firefox/3.0.8

I have a relatively complex disk layout, with many multipathed volumes for Xen domains (see the attachment). openSUSE 10.3's bootloader tools were able to handle this situation, while SLES 10 SP2 was not, and now openSUSE 11.1 is not either.

For example, when I start YaST's bootloader configuration tool, it hangs in the "Checking boot loader..." phase. When I run mkinitrd, it creates the RAM disks fine, but then hangs FOREVER, because one of its child processes (update-bootloader) spins at 100% CPU usage indefinitely. This means that I cannot even run a kernel update, because that process will hang forever.

The only workaround is to restart the machine with the fibrechannel optics unplugged (using only the local /dev/sda disk, which contains the dom0) in order to be able to update the kernel or the bootloader configuration.

All of my fibrechannel multipath volumes (serving as disks for the domUs) are partitioned, many of them contain bootable (active) partitions, and some of them even contain installed grub bootloaders.

In addition to the yast2 modules hanging, our storage system (Sun StorageTek 6140) sends error reports while yast2 is struggling. These messages mean that the server's multipath subsystem is unnecessarily changing the paths it uses to reach the fibrechannel volumes. Moreover, we always get these error messages during server startup!

My idea is that the root of this problem is that some disk-related subsystems (maybe the yast2 partitioner, the boot loader configuration, mkinitrd, and something during startup) try to access the /dev files representing the separate paths, which is incorrect behaviour. For example, I observed that accessing a partition on a separate path leads to an error message (the device is busy or exclusively used, or something like that).

Reproducible: Always

-- 
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c1

--- Comment #1 from Tamás Németh <nice@titanic.nyme.hu> 2009-04-15 10:38:17 MDT ---
Created an attachment (id=285912)
 --> (http://bugzilla.novell.com/attachment.cgi?id=285912)
Latest yast2 logs

An example message from our storage system:

You requested the following events be forwarded to you from vw-storman

Site       : NYME
Agent      : vw-storman
Severity   : Major
Category   : 6140
Device Id  : sun_storage
Topic      : hart
Event Type : ProblemChangeEvent
Event Code : 57.64.1010
Event Date : 2009-04-15 15:46:05
Description: The following volume(s) is/are not managed by their preferred controller
             hart openSUSE111-32

Probable Cause:
One or more volumes are not currently being managed by their preferred controllers. Possible causes include:
 * The controller failed a manually initiated diagnostic test and was placed Offline.
 * The controller was manually placed Offline.
 * There are disconnected or faulty cables.
 * A Hub or Fabric switch is not functioning properly.
 * A host adapter has failed.
 * The storage array contains a defective RAID controller.

Recommended Action:
Check for and replace any failed controllers.
Check for and manually place online any offline controllers.
Check for faulty cables, replacing or reseating as needed.
Check the SAN connection including cables, switches and HBAs.
After correcting the problem, the volumes will need to be redistributed.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c2

--- Comment #2 from Tamás Németh <nice@titanic.nyme.hu> 2009-04-15 17:32:19 MDT ---
An example of trying to access a separate path to a multipath volume:

carrier4:~ # multipathd -k
multipathd> show topology
..
xentemp103-32 (3600a0b80004808d2000009884846b2cc) dm-1 SUN,CSM200_R
[size=20G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=12][active]
 \_ 7:0:0:1 sdg  8:96   [active][ready]
 \_ 5:0:1:1 sdbq 68:64  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 5:0:0:1 sdc  8:32   [active][ghost]
 \_ 7:0:1:1 sdan 66:112 [active][ghost]
..
carrier4:~ # mount -t ext3 /dev/sdg2 /mnt
mount: /dev/sdg2 already mounted or /mnt busy
carrier4:~ # mount -t ext3 /dev/sdbq2 /mnt
mount: /dev/sdbq2 already mounted or /mnt busy
carrier4:~ # mount -t ext3 /dev/sdc2 /mnt
mount: /dev/sdc2 already mounted or /mnt busy
carrier4:~ # mount -t ext3 /dev/sdan2 /mnt
mount: /dev/sdan2 already mounted or /mnt busy
carrier4:~ # mount -t ext3 /dev/mapper/xentemp103-32_part2 /mnt
carrier4:~ # mount | grep mnt
/dev/mapper/xentemp103-32_part2 on /mnt type ext3 (rw)
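Editorial sketch (not part of the original report): the session in comment #2 shows that partitions on the individual path devices are refused while the device-mapper node mounts fine. A tool that wants to avoid touching path devices could first translate a path device name into the node of the map that owns it. The Python below is only an illustration under stated assumptions: the function names path_owners and mapper_node are hypothetical, the parsed text is the topology listing from comment #2, and the "_partN" suffix is assumed from the /dev/mapper/xentemp103-32_part2 name seen there (other setups may name partitions differently).

```python
import re

# Topology listing copied from comment #2 (output of "show topology").
TOPOLOGY = r"""
xentemp103-32 (3600a0b80004808d2000009884846b2cc) dm-1 SUN,CSM200_R
[size=20G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=12][active]
 \_ 7:0:0:1 sdg  8:96   [active][ready]
 \_ 5:0:1:1 sdbq 68:64  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 5:0:0:1 sdc  8:32   [active][ghost]
 \_ 7:0:1:1 sdan 66:112 [active][ghost]
"""

# "name (wwid) dm-N ..." starts a new multipath map.
MAP_RE = re.compile(r"^(\S+) \((\w+)\) (dm-\d+)")
# "\_ H:C:T:L sdX ..." is one path belonging to the current map.
PATH_RE = re.compile(r"\\_ (\d+:\d+:\d+:\d+)\s+(sd[a-z]+)\s")
# "/dev/sdX" or "/dev/sdXN" (whole disk or partition).
DEV_RE = re.compile(r"^/dev/(sd[a-z]+)(\d*)$")


def path_owners(topology):
    """Return {path device name: multipath map name}."""
    owners = {}
    current = None
    for line in topology.splitlines():
        m = MAP_RE.match(line)
        if m:
            current = m.group(1)
            continue
        p = PATH_RE.search(line)
        if p and current is not None:
            owners[p.group(2)] = current
    return owners


def mapper_node(dev, owners):
    """Translate a path device (or partition) to its /dev/mapper node.

    Devices that are not part of any map are returned unchanged.
    """
    m = DEV_RE.match(dev)
    if not m or m.group(1) not in owners:
        return dev
    disk, part = m.groups()
    suffix = f"_part{part}" if part else ""  # assumed naming convention
    return f"/dev/mapper/{owners[disk]}{suffix}"


if __name__ == "__main__":
    owners = path_owners(TOPOLOGY)
    for dev in ("/dev/sdg2", "/dev/sdan2", "/dev/sda1"):
        print(dev, "->", mapper_node(dev, owners))
```

With the listing above, all four path devices (sdg, sdbq, sdc, sdan) resolve to the xentemp103-32 map, while a local disk such as /dev/sda1 is passed through untouched.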
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c3

--- Comment #3 from Tamás Németh <nice@titanic.nyme.hu> 2009-04-16 04:12:29 MDT ---
Note this one, too: https://bugzilla.novell.com/show_bug.cgi?id=468826
http://bugzilla.novell.com/show_bug.cgi?id=495224

Leon Wang <llwang@novell.com> changed:

           What    |Removed                                    |Added
----------------------------------------------------------------------------
                 CC|                                           |llwang@novell.com
         AssignedTo|bnc-team-screening@forge.provo.novell.com  |kernel-maintainers@forge.provo.novell.com
http://bugzilla.novell.com/show_bug.cgi?id=495224

User jeffm@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c4

Jeff Mahoney <jeffm@novell.com> changed:

           What    |Removed                                    |Added
----------------------------------------------------------------------------
           Priority|P5 - None                                  |P2 - High
                 CC|                                           |hare@novell.com
         AssignedTo|kernel-maintainers@forge.provo.novell.com  |jreidinger@novell.com

--- Comment #4 from Jeff Mahoney <jeffm@novell.com> 2009-05-17 08:17:57 MDT ---
It looks like the kernel is behaving properly, at least in the mount case, by denying access to the constituent devices. If yast2-bootloader and mkinitrd are both having problems enumerating devices, it could be a perl-Bootloader problem. I'm going to assign to the perl-Bootloader maintainer, but CC our multipath expert.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User jreidinger@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c5

Josef Reidinger <jreidinger@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #5 from Josef Reidinger <jreidinger@novell.com> 2009-05-19 02:14:35 MDT ---
Do you have user-friendly names enabled for multipath? From the topology it looks like you do, in which case this is a duplicate of bug 470109. Please try disabling user-friendly names to see whether that helps (the fix is waiting for a maintenance update).

http://bugzilla.novell.com/show_bug.cgi?id=470109
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c6

Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #6 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-19 10:26:04 MDT ---
(In reply to comment #5)
> Do you have user-friendly names enabled for multipath? From the topology it
> looks like you do, in which case this is a duplicate of bug 470109. Please
> try disabling user-friendly names to see whether that helps (the fix is
> waiting for a maintenance update).

I disabled friendly names. mkinitrd succeeded, but it took an unreasonably long time (~5 minutes). The yast bootloader module took much longer (I didn't have time to wait for it; it may have hung forever). However:

- I still get the warning messages from our storage system during system reboot and, for example, during yast2 bootloader startup: https://bugzilla.novell.com/show_bug.cgi?id=493327#c7
- /etc/init.d/boot.multipath is still unable to remove multipaths, and sometimes the kernel even does an immediate machine reset during the attempts: https://bugzilla.novell.com/show_bug.cgi?id=468826
- Some /etc/init.d startup scripts start running while the kernel is still collecting multipath data (see the image attached later).
- The YaST2 partition manager wants to manage the separate paths to multipath devices, not only the combined devices (see the image attached later).

This whole multipath system seems weird to me.

http://bugzilla.novell.com/show_bug.cgi?id=470109
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c7

--- Comment #7 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-19 10:27:04 MDT ---
Created an attachment (id=293082)
 --> (http://bugzilla.novell.com/attachment.cgi?id=293082)
boot.jpg

Some /etc/init.d startup scripts start running while the kernel is still collecting multipath data
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c8

--- Comment #8 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-19 10:27:38 MDT ---
Created an attachment (id=293083)
 --> (http://bugzilla.novell.com/attachment.cgi?id=293083)
partitioning.jpg

YaST2 partition manager wants to manage separate paths to multipath devices, not only the combined devices
http://bugzilla.novell.com/show_bug.cgi?id=495224

User jreidinger@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c9

Josef Reidinger <jreidinger@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juhliarik@novell.com
         AssignedTo|jreidinger@novell.com       |aschnell@novell.com

--- Comment #9 from Josef Reidinger <jreidinger@novell.com> 2009-05-19 10:32:25 MDT ---
OK, these are separate issues. perl-Bootloader is responsible only for the mkinitrd hang, which is tracked in bug 470109. Assigning to Arvin, who is responsible for the partitioner. I am not sure who is responsible for the other issues and hope Arvin knows. juhliarik added to CC for the yast2 bootloader hang.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User aschnell@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c10

Arvin Schnell <aschnell@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #10 from Arvin Schnell <aschnell@novell.com> 2009-05-19 10:48:40 MDT ---
What is wrong with the screenshot from comment #8?

sda is not part of any multipath device, so it is displayed with all its partitions. sdaa is included in a multipath device, and thus only the disk without partitions is shown. That is the expected behaviour.

By double-clicking on the disks and selecting the Overview you can check that sda is not used by any device and that sdaa is used by a multipath device.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c11

Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #11 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-21 08:30:48 MDT ---
(In reply to comment #10)
> What is wrong with the screenshot from comment #8?
>
> sda is not part of any multipath device, so it is displayed with all its
> partitions. sdaa is included in a multipath device, and thus only the disk
> without partitions is shown. That is the expected behaviour.
>
> By double-clicking on the disks and selecting the Overview you can check that
> sda is not used by any device and that sdaa is used by a multipath device.

OK, my /etc/multipath.conf now looks like this:

defaults {
        user_friendly_names "no"
}
blacklist {
        devnode "^sda$"
}

However, the LUN WWN / friendly name mappings are still present in /etc/multipath_bindings.

Yesterday I wanted to start yast2/partitioner, but it had still not started after about twelve hours, so I restarted the whole machine, and today yast2/partitioner was finally able to start. If you take a look at the attached image, you can see that three devices, which are otherwise paths to multipath volumes, can actually be managed! The three devices are /dev/sdce, /dev/sdcf and /dev/sdcg. They are not shown in the multipath topology; instead 'multipath -d' displays this:

carrier4:~ # multipath -d
reload: 3600a0b80004808d200000b5548931bee n/a SUN,CSM200_R
[size=600G][features=0][hwhandler=1 rdac][n/a]
\_ round-robin 0 [prio=8][undef]
 \_ 7:0:0:9  sdx  65:112 [active][ghost]
 \_ 5:0:1:9  sdbg 67:160 [active][ghost]
\_ round-robin 0 [prio=6][undef]
 \_ 5:0:0:9  sdr  65:16  [active][ready]
 \_ 7:0:1:9  sdce 69:32  [undef][ready]
reload: 3600a0b80004808d200000b70489bf7f8 n/a SUN,CSM200_R
[size=20G][features=0][hwhandler=1 rdac][n/a]
\_ round-robin 0 [prio=12][undef]
 \_ 7:0:0:10 sdaa 65:160 [active][ready]
 \_ 5:0:1:10 sdbh 67:176 [active][ready]
\_ round-robin 0 [prio=2][undef]
 \_ 5:0:0:10 sdt  65:48  [active][ghost]
 \_ 7:0:1:10 sdcf 69:48  [undef][ghost]
reload: 3600a0b80004807f8000008cb489c01b3 n/a SUN,CSM200_R
[size=300G][features=0][hwhandler=1 rdac][n/a]
\_ round-robin 0 [prio=12][undef]
 \_ 5:0:0:11 sdu  65:64  [active][ready]
 \_ 7:0:1:11 sdcg 69:64  [undef][ready]
\_ round-robin 0 [prio=2][undef]
 \_ 7:0:0:11 sdab 65:176 [active][ghost]
 \_ 5:0:1:11 sdbi 67:192 [active][ghost]

These three multipaths seem to be degraded somehow. We also use this fibrechannel storage system with Windows, without any problem.

The multipath topology also shows these affected volumes in an unusual state (consisting of only three devices instead of four):

create: 3600a0b80004808d200000b5548931bee dm-67 SUN,CSM200_R
[size=600G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=8][active]
 \_ 7:0:0:9  sdx  65:112 [active][ghost]
 \_ 5:0:1:9  sdbg 67:160 [active][ghost]
\_ round-robin 0 [prio=3][enabled]
 \_ 5:0:0:9  sdr  65:16  [active][ready]
create: 3600a0b80004808d200000b70489bf7f8 dm-70 SUN,CSM200_R
[size=20G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=12][active]
 \_ 7:0:0:10 sdaa 65:160 [active][ready]
 \_ 5:0:1:10 sdbh 67:176 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:0:10 sdt  65:48  [active][ghost]
create: 3600a0b80004807f8000008cb489c01b3 dm-79 SUN,CSM200_R
[size=300G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=6][enabled]
 \_ 5:0:0:11 sdu  65:64  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 7:0:0:11 sdab 65:176 [active][ghost]
 \_ 5:0:1:11 sdbi 67:192 [active][ghost]

For comparison, here is a volume in normal state:

3600a0b80004808d200000b8948b281a6 dm-81 SUN,CSM200_R
[size=20G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=12][active]
 \_ 7:0:0:12 sdae 65:224 [active][ready]
 \_ 5:0:1:12 sdbj 67:208 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 5:0:0:12 sdv  65:80  [active][ghost]
 \_ 7:0:1:12 sdch 69:80  [active][ghost]
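Editorial sketch (not part of the original report): the comparison in comment #11 between three-path and four-path maps can be automated. The Python below parses a topology listing and reports, for each map, which H:C:T:L path is absent. It rests on one loudly stated assumption: every host:channel:target endpoint that appears for any LUN in the listing is expected to appear for every LUN. That holds for the "normal" volume quoted above, but it is a heuristic, not how multipath-tools itself decides anything. The function names parse_paths and missing_paths are hypothetical, and the sample text is an excerpt (one degraded map and the normal one) from the comment.

```python
import re

# Excerpt from comment #11: one three-path map and the four-path reference map.
TOPOLOGY = r"""
create: 3600a0b80004808d200000b5548931bee dm-67 SUN,CSM200_R
[size=600G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=8][active]
 \_ 7:0:0:9 sdx 65:112 [active][ghost]
 \_ 5:0:1:9 sdbg 67:160 [active][ghost]
\_ round-robin 0 [prio=3][enabled]
 \_ 5:0:0:9 sdr 65:16 [active][ready]
3600a0b80004808d200000b8948b281a6 dm-81 SUN,CSM200_R
[size=20G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=12][active]
 \_ 7:0:0:12 sdae 65:224 [active][ready]
 \_ 5:0:1:12 sdbj 67:208 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 5:0:0:12 sdv 65:80 [active][ghost]
 \_ 7:0:1:12 sdch 69:80 [active][ghost]
"""

# "[action: ]wwid dm-N ..." starts a new map (no user-friendly names).
MAP_RE = re.compile(r"^(?:\w+: )?(\w+) (dm-\d+) ")
# "\_ H:C:T:L sdX ..." is one path of the current map.
PATH_RE = re.compile(r"\\_ (\d+):(\d+):(\d+):(\d+)\s+(sd[a-z]+)\s")


def parse_paths(topology):
    """Return {wwid: [(host, channel, target, lun, device), ...]}."""
    maps = {}
    current = None
    for line in topology.splitlines():
        m = MAP_RE.match(line)
        if m:
            current = maps.setdefault(m.group(1), [])
            continue
        p = PATH_RE.search(line)
        if p and current is not None:
            h, c, t, lun, dev = p.groups()
            current.append((int(h), int(c), int(t), int(lun), dev))
    return maps


def missing_paths(maps):
    """Return {wwid: [missing "H:C:T:L", ...]} for maps lacking an endpoint.

    Assumption: every (host, channel, target) seen for any map should
    exist for all maps.
    """
    endpoints = {(h, c, t) for paths in maps.values() for h, c, t, _, _ in paths}
    result = {}
    for wwid, paths in maps.items():
        if not paths:
            continue
        lun = paths[0][3]
        present = {(h, c, t) for h, c, t, _, _ in paths}
        gaps = sorted(endpoints - present)
        if gaps:
            result[wwid] = [f"{h}:{c}:{t}:{lun}" for h, c, t in gaps]
    return result


if __name__ == "__main__":
    for wwid, gaps in missing_paths(parse_paths(TOPOLOGY)).items():
        print(wwid, "is missing", ", ".join(gaps))
```

For the excerpt above this flags 7:0:1:9 on the first map, which is the address of /dev/sdce in the 'multipath -d' listing from the same comment.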
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c12

--- Comment #12 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-21 08:32:58 MDT ---
Created an attachment (id=293647)
 --> (http://bugzilla.novell.com/attachment.cgi?id=293647)
yast2 partitioner
http://bugzilla.novell.com/show_bug.cgi?id=495224

User aschnell@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c13

Arvin Schnell <aschnell@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aschnell@novell.com
         AssignedTo|aschnell@novell.com         |hare@novell.com

--- Comment #13 from Arvin Schnell <aschnell@novell.com> 2009-05-21 13:11:45 MDT ---
So there's something wrong with multipath on the system. Assigning to the maintainer of multipath-tools.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User hare@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c14

Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #14 from Hannes Reinecke <hare@novell.com> 2009-05-22 00:49:26 MDT ---
Please attach the output of 'lsscsi' and '/var/log/messages'.

Plus you'll have to ensure that the 'host type' of the storage array is set to 'Linux'. Otherwise weird things will happen.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c15

--- Comment #15 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 01:13:48 MDT ---
Created an attachment (id=293804)
 --> (http://bugzilla.novell.com/attachment.cgi?id=293804)
Output of lsscsi
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c16

--- Comment #16 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 01:14:25 MDT ---
Created an attachment (id=293805)
 --> (http://bugzilla.novell.com/attachment.cgi?id=293805)
/var/log/messages
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c17

Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #17 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 01:21:17 MDT ---
(In reply to comment #14)
> Please attach the output of 'lsscsi' and '/var/log/messages'.
>
> Plus you'll have to ensure that the 'host type' of the storage array is set to
> 'Linux'. Otherwise weird things will happen.

I've attached the files you asked for. You seem to know this Sun StorageTek 6140 (rebranded LSI Engenio 3994) storage system (the system integrator who brought it to us had no real experts ;-).

In fact, I set the host type for every Linux host to 'AIX failover' in 2008. My reason for doing this was that I thought the storage system would otherwise not let the host change paths in order to achieve round-robin functionality or failover. I thought that the host type 'Linux' was for hosts without multipath capability and with a single path to the storage system.

What will change if I set the host types to 'Linux'? Won't I experience performance degradation and lose failover capability?
http://bugzilla.novell.com/show_bug.cgi?id=495224

User hare@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c18

Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #18 from Hannes Reinecke <hare@novell.com> 2009-05-22 03:10:59 MDT ---
Quite so. The problem is indeed having a host type of 'AIX failover'.

Various OSes expect different behaviour with regard to multipathing. 'AIX failover' as selected in this case would cause all _possible_ LUNs to appear in the Linux kernel, hence the slightly weird 'lsscsi' output.

The suggestion here is to set the host type to 'Linux' (if you are not using the 'rdac' hardware handler) or 'Solaris' (if you are using the 'rdac' hardware handler). Otherwise unexpected behaviour will occur, as you've just seen.

The 'Linux' host type is a failover mode with 'AVT' (automatic volume transfer) enabled, basically a multipath failover which is capable of switching paths automatically. However, this switch-over takes some time, so the system might become quite slow occasionally.

For this reason I would suggest using 'Solaris' as host type, which is just like 'Linux' except that it doesn't use 'AVT' mode. So the storage array won't switch paths automatically and hence we won't induce any degradation. We do need the 'rdac' hardware handler here, though, as multipath has to send switch-over commands.

Does that work better?
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c19

--- Comment #19 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 03:23:32 MDT ---
(In reply to comment #18)
> The suggestion here is to set the host type to 'Linux' (if you are not using
> the 'rdac' hardware handler) or 'Solaris' (if you are using the 'rdac'
> hardware handler). Otherwise unexpected behaviour will occur as you've just
> seen.

> For this reason I would suggest using 'Solaris' as host type

> We do need the 'rdac' hardware handler here, though, as multipath has to send
> switch-over commands.

Which one of these do you suggest:

Solaris (with Traffic Manager)
Solaris (with Veritas DMP or other)
http://bugzilla.novell.com/show_bug.cgi?id=495224

User hare@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c20

--- Comment #20 from Hannes Reinecke <hare@novell.com> 2009-05-22 04:44:08 MDT ---
Hmm. I knew this was going to be a problem. Maybe we should indeed set it to Linux.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c21

--- Comment #21 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 04:53:43 MDT ---
Please read this:

http://markmail.org/message/zxnzrrteon5slkla#query:storagetek%206140%20rdac%...

It seems to me that the host type "Linux" is better for RDAC mode. Until now, I used openSUSE 10.3 on the xen.org 2.6.18 kernel, which doesn't support RDAC (AFAIK), so I chose "AIX failover" in order to activate AVT. Should I switch to "Linux" with openSUSE 11.1, which seems to support RDAC?

Anyway, I will try that NOW.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User hare@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c22

--- Comment #22 from Hannes Reinecke <hare@novell.com> 2009-05-22 05:06:35 MDT ---
Yes, please always use hosttype 'Linux'. This actually enables 'AVT', so it's the preferred choice here. And miles better than AIX in any configuration. AIX has a really weird SCSI behaviour which should be avoided here.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c23

--- Comment #23 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 15:04:03 MDT ---
(In reply to comment #22)
> Yes, please always use hosttype 'Linux'. This actually enables 'AVT', so it's
> the preferred choice here. And miles better than AIX in any configuration. AIX
> has a really weird SCSI behaviour which should be avoided here.

So far, I have been able to figure out that the 'Linux' host type disables AVT mode, which means it expects the host to do explicit failover (e.g. RDAC, which seems to be supported by openSUSE 11.1's kernel) instead of implicit failover.

'Solaris (with Veritas DMP or other)' seems to be the right choice for implicit failover on Linux (older systems). Maybe I should try that instead of 'AIX failover', because when I set the host type to 'Linux', my hosts are almost always unable to finish the boot process!
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c24

Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #24 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 18:31:51 MDT ---
Relevant material about the Linux vs. StorageTek 6140 / AVT vs. RDAC topics:

http://forums.sun.com/thread.jspa?threadID=5296322
https://www.eng.utoledo.edu/ecc/secure-docs/user/sun_docs/6130/infodoc-83529...

openSUSE 10.3 (which lacked the rdac hardware handler, at least with the xen.org 2.6.18 kernel) works with this storage (at least after modifying some SysV init scripts), but openSUSE 11.1 does not.

Finally I set the host type to 'Linux' (non-AVT, RDAC supported), but the xen kernel always(?) fails to load the system. It stalls at a certain point (see an image attached later). The default kernel also stalls sometimes (see another image attached later), but sometimes succeeds in loading the system, and then:

- There are no storage system warning messages (I think this is because RDAC informs the kernel which path to use).
- All fibrechannel path devices are recognized, thus all multipath volumes are constructed correctly.
- YaST2/partitioning works.
- The YaST2/bootloader module works.
- mkinitrd finishes its job quickly.

but:

- /etc/init.d/boot.multipath is still unable to remove multipaths during shutdown (see the reopened https://bugzilla.novell.com/show_bug.cgi?id=468826 )

So, in order to be able to use Xen I have to revert to AVT instead of RDAC. (I think the rdac subsystem in the kernel recognizes AVT mode, so no configuration change is necessary, since during booting I've seen messages like this: rdac: AVTmode detected)

To sum up, although both AVT and RDAC modes seem to be broken with openSUSE 11.1 on this storage, RDAC behaves better if it happens to finish the boot process.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c25

--- Comment #25 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 18:32:58 MDT ---
Created an attachment (id=294036)
 --> (http://bugzilla.novell.com/attachment.cgi?id=294036)
xen boot process stalled
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c26

--- Comment #26 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 18:34:58 MDT ---
Created an attachment (id=294037)
 --> (http://bugzilla.novell.com/attachment.cgi?id=294037)
default kernel boot process stalled

Don't forget that I use 64-bit openSUSE 11.1.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c27

--- Comment #27 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-22 23:26:09 MDT ---
Created an attachment (id=294044)
 --> (http://bugzilla.novell.com/attachment.cgi?id=294044)
our fibrechannel topology

You can see our SAN topology here: every host has two HBA cards, and every HBA card has two ports, but only one port per HBA is connected to the fibrechannel network. There are two fibrechannel switches, which are not connected to each other, so there are two distinct fibrechannel fabrics.
http://bugzilla.novell.com/show_bug.cgi?id=495224

User nice@titanic.nyme.hu added comment
http://bugzilla.novell.com/show_bug.cgi?id=495224#c28

--- Comment #28 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-23 01:27:46 MDT ---
One more thing: I'm willing to give you an account on one of these servers, if you need it.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User nice@titanic.nyme.hu added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c29
--- Comment #29 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-24 16:18:59 MDT ---
One more piece of information to help you analyze the problem: the entire Linux system is on a local (non-multipath, of course) hardware RAID array. All the fibrechannel multipath volumes are only for Xen domUs, which don't start automatically. Despite this, problems with multipath volumes can keep the system from finishing the boot process.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User hare@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c30
Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #30 from Hannes Reinecke <hare@novell.com> 2009-05-29 07:43:19 MDT ---
Have you tried to press 'Control-C' when the boot process stalled? Occasionally I've been seeing this, but have been unable to track it down properly.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User nice@titanic.nyme.hu added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c31
Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #31 from Tamás Németh <nice@titanic.nyme.hu> 2009-05-31 16:13:51 MDT ---
Created an attachment (id=295385) --> (http://bugzilla.novell.com/attachment.cgi?id=295385)
After pressing ctrl-c

(In reply to comment #30)
> Have you tried to press 'Control-C' when the boot process stalled?

Yes, pressing Ctrl-C let the system finish the boot process (unusually quickly), but it behaved as if some of the filesystems were missing. I was even unable to log in, because it complained about a missing service component. Please take a look at the attached image.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User hare@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c32
Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #32 from Hannes Reinecke <hare@novell.com> 2009-06-25 07:24:04 MDT ---
Please look in /dev/.udev/failed. It looks as if some events couldn't be processed before the timeout.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User nice@titanic.nyme.hu added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c33
Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #33 from Tamás Németh <nice@titanic.nyme.hu> 2009-07-03 05:07:39 MDT ---
(In reply to comment #32)
> Please look in /dev/.udev/failed. It looks as if some events couldn't be processed before the timeout.

Sorry, I'm not a udev expert, so I don't know what /dev/.udev/failed is for. All I can see there is a broken symbolic link:

carrier4:/dev/.udev/failed # ls -l /dev/.udev/failed
total 0
lrwxrwxrwx 1 root root 46 2009-07-03 13:01 \x2fdevices\x2fplatform\x2fmicrocode\x2ffirmware\x2fmicrocode -> /devices/platform/microcode/firmware/microcode
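For readers as unfamiliar with this directory as the reporter: judging from the listing above, each entry under /dev/.udev/failed is named after the device path of an event that could not be processed, with '/' written as the hex escape \x2f. A minimal Python sketch, assuming that \xNN escaping is the only transformation applied and that device paths are plain ASCII, turns such names back into readable paths. The function names decode_udev_name and failed_events are made up for illustration and are not part of udev.

```python
import os
import re
from typing import List

# Matches one hex escape such as \x2f (assumed to be the only escaping used).
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_udev_name(name: str) -> str:
    """Replace every \\xNN escape in an entry name with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def failed_events(directory: str = "/dev/.udev/failed") -> List[str]:
    """Return the decoded entry names found in the directory.

    Returns an empty list when the directory is missing or unreadable.
    """
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        return []
    return [decode_udev_name(n) for n in names]


if __name__ == "__main__":
    for devpath in failed_events():
        print(devpath)
```

Applied to the single entry in the listing, decode_udev_name yields /devices/platform/microcode/firmware/microcode, which matches the symlink target that ls prints.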
http://bugzilla.novell.com/show_bug.cgi?id=495224 User hare@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c34
Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |nice@titanic.nyme.hu

--- Comment #34 from Hannes Reinecke <hare@novell.com> 2009-07-03 05:24:20 MDT ---
... And that is exactly what I was looking for. So there was a microcode uevent which couldn't be handled. Please open another bug for this issue; it's totally unrelated to multipathing. Can we close this one then?
http://bugzilla.novell.com/show_bug.cgi?id=495224 User nice@titanic.nyme.hu added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c35
Tamás Németh <nice@titanic.nyme.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|nice@titanic.nyme.hu        |

--- Comment #35 from Tamás Németh <nice@titanic.nyme.hu> 2009-07-03 05:47:53 MDT ---
(In reply to comment #34)
> ... And that is exactly what I was looking for. So there was a microcode uevent which couldn't be handled.

Is this why multipath volumes are handled improperly?
http://bugzilla.novell.com/show_bug.cgi?id=495224 User hare@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c36
Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #36 from Hannes Reinecke <hare@novell.com> 2009-07-03 05:57:03 MDT ---
No, but it's the reason for

> Finally I set the host type to 'Linux' (non-AVT, RDAC supported), but the xen kernel always(?) fails to load the system. It stalls at a certain point (see an attached image later). The default kernel also stalls sometimes (see another attached image), but sometimes succeeds in loading the system and then:

And this doesn't have anything to do with multipathing.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User nice@titanic.nyme.hu added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c37
--- Comment #37 from Tamás Németh <nice@titanic.nyme.hu> 2009-07-03 06:07:51 MDT ---
(In reply to comment #36)
> No, but it's the reason for
>
> > Finally I set the host type to 'Linux' (non-AVT, RDAC supported), but the xen kernel always(?) fails to load the system. It stalls at a certain point (see an attached image later). The default kernel also stalls sometimes (see another attached image), but sometimes succeeds in loading the system and then:
>
> And this doesn't have anything to do with multipathing.

Do you mean that the CPU microcode is somehow influenced by the configuration of the external (fibrechannel) storage system?
http://bugzilla.novell.com/show_bug.cgi?id=495224 User hare@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c38
--- Comment #38 from Hannes Reinecke <hare@novell.com> 2009-07-03 07:09:10 MDT ---
No, I mean the issue of a system stalling during boot and any fibrechannel configuration problems are totally unrelated. Hence the former warrants a new bugzilla entry.
http://bugzilla.novell.com/show_bug.cgi?id=495224 User nice@titanic.nyme.hu added comment http://bugzilla.novell.com/show_bug.cgi?id=495224#c39
--- Comment #39 from Tamás Németh <nice@titanic.nyme.hu> 2009-07-03 07:16:38 MDT ---
(In reply to comment #38)
> No, I mean the issue of a system stalling during boot and any fibrechannel configuration problems are totally unrelated. Hence the former warrants a new bugzilla entry.

OK, I understand, but as you can read, the complete boot stall only happens with a certain (external) storage setting (and with the xen kernel). With a different storage setting even the xen kernel is able to boot the system. My problem is that multipath handling is unreliable, as described in this report and in others I have submitted.
https://bugzilla.novell.com/show_bug.cgi?id=495224 https://bugzilla.novell.com/show_bug.cgi?id=495224#c40
Hannes Reinecke <hare@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |Final

--- Comment #40 from Hannes Reinecke <hare@novell.com> 2011-03-11 08:46:17 UTC ---
Closing.