[Bug 818064] New: [NTAP-394811] SLES 11.3 x64 SANboot host hangs during stress IO with LPe16002 HBA and DMMP failover
https://bugzilla.novell.com/show_bug.cgi?id=818064

https://bugzilla.novell.com/show_bug.cgi?id=818064#c0

Summary: [NTAP-394811] SLES 11.3 x64 SANboot host hangs during stress IO with LPe16002 HBA and DMMP failover
Classification: openSUSE
Product: openSUSE 11.4
Version: RC 1
Platform: x86-64
OS/Version: SLES 11
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: luis.salmeron@netapp.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=537592)
 --> (http://bugzilla.novell.com/attachment.cgi?id=537592)
Host console logs

User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)

A SLES 11 SP3 RC1 x64 host installed to a SAN volume becomes unresponsive after 2-3 hours of failover testing. The error is seen when failover is caused by continuous resets of the storage array controllers, failing of switch ports, or offlining of controllers. The message "Kernel panic - not syncing: Out of memory and no killable processes..." is seen at the time of the error.

The volume storing the OS had a 100GB capacity and the server had 8GB of RAM. The host bus adapter used on this server is a 16Gb Emulex LPe16002. The lpfc driver in use is the inbox driver, version 8.3.7.10.2p. The firmware and BIOS for the HBA are 1.1.35.0 and KA6.01a12 respectively. The inbox Device Mapper Multipath, version 0.4.9-0.77.5, is in use.

The only way to release the host from its unresponsive state is to power cycle the server. The problem has been seen on a Dell PowerEdge R710 and a Dell PowerEdge R720.

The same OS/array scenario has been attempted with QLogic QLe2672 HBAs, Brocade 1860 HBAs, and Emulex LPe12002 HBAs, and the error was not reproduced.

Reproducible: Always

Steps to Reproduce:
1. Install SLES 11 SP3 RC1 x64 on a volume on a storage array through an LPe16002 HBA. During the installation, the drive should be run as a multipath device.
2. Once installed, enable DMMP and map another 64 1GB volumes from two NetApp storage arrays.
3. Create ext3 filesystems on 2 of the volumes and mount them to unused directories on the OS. Run filesystem IO to these 2 targets and raw IO to the remaining 62 volumes.
4. Reset or offline a controller from one of the storage arrays, wait 10 minutes, then reset another controller from the other storage array or bring the offlined controller back online, and wait 10 minutes. Repeat these steps while always alternating arrays and controller paths.

Actual Results:
After 2-3 hours of stressed IO, the system became unresponsive and could not be brought back up unless it was power cycled.

Expected Results:
The system should have run IO for at least 16 hours without errors while the storage arrays continued their resets or offline/online tests.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
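
Step 4 of the reproduction can be read as a fixed alternating schedule. The following is a minimal Python sketch of that reading, not the script used in the original test. The array names, controller labels, and the function `failover_schedule` are hypothetical, and only the reset variant is modeled (not offline/online). The 10-minute wait and the 16-hour target are taken from the report.

```python
from itertools import count


def failover_schedule(arrays=("array1", "array2"), controllers=("A", "B"),
                      wait_minutes=10, total_hours=16):
    """Yield (minute, array, controller) reset events.

    Hypothetical model of step 4: the array alternates on every event,
    and the controller alternates each time the same array comes up again.
    """
    total_minutes = total_hours * 60
    for step in count():
        minute = step * wait_minutes
        if minute >= total_minutes:
            break
        array = arrays[step % len(arrays)]
        controller = controllers[(step // len(arrays)) % len(controllers)]
        yield minute, array, controller


if __name__ == "__main__":
    # Print the first hour of the planned schedule.
    for minute, array, controller in failover_schedule(total_hours=1):
        print(f"t+{minute:3d} min: reset controller {controller} on {array}")
```

The generator only plans events; the actual reset would have to be issued through whatever management interface the storage array provides, which the report does not describe.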
https://bugzilla.novell.com/show_bug.cgi?id=818064

https://bugzilla.novell.com/show_bug.cgi?id=818064#c1

--- Comment #1 from Luis Salmeron <luis.salmeron@netapp.com> 2013-05-01 21:42:29 UTC ---
Created an attachment (id=537594)
 --> (http://bugzilla.novell.com/attachment.cgi?id=537594)
syslogs