[Bug 714744] New: [NetApp-LSIP200206375] SLES11SP1 LifeKeeper Node with Dell SAS HBA went into Kernel Panic following CFW upgrade

29 Aug 2011

      https://bugzilla.novell.com/show_bug.cgi?id=714744

https://bugzilla.novell.com/show_bug.cgi?id=714744#c0

           Summary: [NetApp-LSIP200206375] SLES11SP1 LifeKeeper Node with
                    Dell SAS HBA went into Kernel Panic following CFW
                    upgrade
    Classification: openSUSE
           Product: openSUSE 11.2
           Version: Final
          Platform: x86-64
        OS/Version: SLES 11
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: luis.salmeron@netapp.com
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

A LifeKeeper 7.3.0 node running SLES11 SP1 went into a kernel panic during a
controller firmware (CFW) upgrade of the array performed with the Array
Management Software. The test was run on a Dell server with a Dell SAS 7e HBA
(driver version 04.100.01.03, FW version 2.15.59) and 12 shared LUNs mapped to
the host group. The Dell CFW was used, and the inbox DMMP (version
1.02.27-8.17.20) handled failover. The kernel version is 2.6.32.12-0.7-default.

During the test, the controllers were successfully upgraded to the new CFW.
However, once the controllers had finished rebooting, the cluster node (Maul)
immediately failed all of its resources over to the other node in the setup
(vader) and entered a kernel panic. During this time, reservation conflict
errors between the kernel and the array appeared in the host logs. Once the
kernel dump was complete, the host rebooted, started LifeKeeper successfully,
and picked its resources back up on its own.

The kernel dump and host logs are available at ftp.netapp.com.
Login: anonymous
Password: your_email_address
The files are located in pub/esg/LSIP200206375/

Reproducible: Always

Steps to Reproduce:
1. Connect two SLES 11 SP1 hosts with Dell SAS 7e HBAs to an array and map 12
shared volumes to the hosts.
2. Create a LifeKeeper cluster.
3. Run I/O on a separate server using NFS.
4. Run a script that upgrades the controller firmware on the array and then
sysReboots the controllers one at a time. Continue running I/O for another 10
minutes, then repeat the process for 12 hours, alternating between upgrades
and downgrades.
Actual Results:
After the first controller comes back up from the first sysReboot and the
second controller sysReboots, one of the cluster nodes performs a kernel dump
and fails over all of its resources to the other node.

Expected Results:
The cluster nodes and the cluster resources should remain in an optimal state
throughout the upgrades and downgrades.
