https://bugzilla.novell.com/show_bug.cgi?id=643108

https://bugzilla.novell.com/show_bug.cgi?id=643108#c0

Summary: multipathd crashes quite often in SLES11 SP1
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: SLES 11
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: nice@titanic.nyme.hu
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; hu-HU; rv:1.9.2.10) Gecko/20100914 SUSE/3.6.10-0.3.1 Firefox/3.6.10

Please note that this is a SLES11 SP1 bug, and change the product if possible.

I run the x86_64 version of SLES11 SP1 with Xen. It serves numerous virtual servers (all of them are some SuSE systems). The dom0 servers run on two different hardware platforms: Sun Blade X6250 and HP ProLiant DL180 G6. The Fibre Channel HBA cards are of different types (and even use different drivers). The storage system is a Sun StorageTek 6140 array with two controllers. The Sun blades have two FC connections and therefore 4 paths to each volume, while the HP server has one connection and 2 paths to each volume.

Since upgrading from plain SLES11 to SLES11 SP1, I have noticed that multipathd crashes quite often(!). This usually happens when I change the volume configuration on the storage array and then run rescan-scsi-bus.sh, but I have even observed a multipathd crash when just running multipathd -k and listing the topology.

A multipathd crash is sometimes a catastrophe: when the storage array migrates some of its volumes from one controller to the other and the servers can't follow it, some volumes become impossibly slow (or unreachable?), and filesystem damage threatens all my virtual servers. Surprisingly, I have observed only one filesystem corruption so far, despite having more than a hundred filesystems.

My suspicion is that the multipathd crash may not be the only multipath problem, since on several occasions restarting multipathd didn't help at all.
Multipath volumes became totally unreachable and I wasn't able to shut down the processes accessing them, so I had to reboot the dom0 entirely (including halting all the domUs). This really is a catastrophe, and there is possibly an additional kernel bug behind it.

I attach a multipathd core file and my multipath config file to this report.

Reproducible: Sometimes
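
For reference, the path counts quoted in the report (4 on the Sun blades, 2 on the HP server) can be reproduced with a small sketch. It assumes that every host FC connection reaches each of the two array controllers exactly once, which matches the numbers given but is not stated explicitly in the report. The function name expected_paths is hypothetical and used only for illustration.

```python
def expected_paths(host_fc_ports: int, array_controllers: int = 2) -> int:
    """Return the number of paths a host should see to each volume.

    Assumption (not confirmed by the report): each host FC port reaches
    every array controller through exactly one target port.
    """
    if host_fc_ports < 1 or array_controllers < 1:
        raise ValueError("port and controller counts must be positive")
    return host_fc_ports * array_controllers


if __name__ == "__main__":
    # FC connection counts per host type, as described in the report.
    hosts = (("Sun Blade X6250", 2), ("HP ProLiant DL180 G6", 1))
    for name, ports in hosts:
        print(f"{name}: {expected_paths(ports)} paths per volume")
```

Comparing these figures with the number of paths that multipathd actually lists for a volume may help to tell a lost path apart from a daemon that has stopped updating the topology.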