[Bug 684561] New: Inbox Driver for IBM BladeCenter Qlogic 42C1830 CNA causes iSCSI session unexpected termination during straight IO with controller fail-over test.

https://bugzilla.novell.com/show_bug.cgi?id=684561

https://bugzilla.novell.com/show_bug.cgi?id=684561#c0

           Summary: Inbox Driver for IBM BladeCenter Qlogic 42C1830 CNA causes
                    iSCSI session unexpected termination during straight IO
                    with controller fail-over test.
    Classification: openSUSE
           Product: openSUSE 11.2
           Version: Final
          Platform: x86-64
        OS/Version: SLES 11
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: chris.range@lsi.com
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16

It appears that a memory flood caused by a possible memory leak in the driver
is causing the problem. The test runs just fine with Straight IO, however as
soon as controller fail-over tests begin the IO fails after approximately 30
minutes. Below are the logs and memory output of the host when the error
occurs. Logs are also attached.

Mar 30 16:41:45 kswa-lebanon kernel: 504 [RAIDarray.mpp]cCfg62BR:0:0 Path Unfailed
Mar 30 16:42:23 kswa-lebanon kernel: The following is only an harmless informational message.
Mar 30 16:42:23 kswa-lebanon kernel: Unless you get a _continuous_flood_ of these messages it means
Mar 30 16:42:23 kswa-lebanon kernel: everything is working fine. Allocations from irqs cannot be
Mar 30 16:42:23 kswa-lebanon kernel: perfectly reliable and the kernel is designed to handle that.
Mar 30 16:42:23 kswa-lebanon kernel: swapper: page allocation failure. order:0, mode:0x4120, alloc_flags:0x30 pflags:0x10200042
Mar 30 16:42:23 kswa-lebanon kernel: Call Trace:
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7850] [c0000000000125dc] show_stack+0x6c/0x198 (unreliable)
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7900] [c000000000137f78] __alloc_pages_nodemask+0x5a0/0x740
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7a60] [c0000000001713b8] alloc_pages_current+0x90/0x108
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7b00] [d000000002716054] ql_get_next_chunk+0x13c/0x320 [qlge]
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7ba0] [d0000000027162c4] ql_update_lbq+0x8c/0x340 [qlge]
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7c60] [d00000000271bb24] ql_clean_inbound_rx_ring+0x154/0x2c0 [qlge]
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7d10] [d00000000271bdfc] ql_napi_poll_msix+0x16c/0x280 [qlge]
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7dd0] [c0000000004e7dd8] net_rx_action+0x178/0x298
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7eb0] [c0000000000b64d4] __do_softirq+0x13c/0x240
Mar 30 16:42:23 kswa-lebanon kernel: [c000000001fa7f90] [c0000000000306f4] call_do_softirq+0x14/0x24
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74b8a0] [c00000000000e940] do_softirq+0xe8/0x108
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74b940] [c0000000000b619c] irq_exit+0xb4/0xe8
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74b9c0] [c00000000000e3b4] do_IRQ+0x14c/0x208
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74ba70] [c000000000004c98] hardware_interrupt_entry+0x18/0x1c
Mar 30 16:42:23 kswa-lebanon kernel: --- Exception: 501 at pseries_dedicated_idle_sleep+0x110/0x218
Mar 30 16:42:23 kswa-lebanon kernel: LR = pseries_dedicated_idle_sleep+0x124/0x218
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74bd60] [c000000000061d54] pseries_dedicated_idle_sleep+0x84/0x218 (unreliable)
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74be10] [c000000000014828] cpu_idle+0x168/0x1e0
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74bec0] [c0000000005b6e04] start_secondary+0x394/0x3dc
Mar 30 16:42:23 kswa-lebanon kernel: [c0000001df74bf90] [c0000000000082d4] start_secondary_prolog+0x10/0x14
Mar 30 16:42:23 kswa-lebanon kernel: Mem-Info:
Mar 30 16:42:23 kswa-lebanon kernel: Node 1 DMA per-cpu:
Mar 30 16:42:23 kswa-lebanon kernel: CPU 0: hi: 6, btch: 1 usd: 0
Mar 30 16:42:23 kswa-lebanon kernel: CPU 1: hi: 6, btch: 1 usd: 2
Mar 30 16:42:23 kswa-lebanon kernel: CPU 2: hi: 6, btch: 1 usd: 5
Mar 30 16:42:23 kswa-lebanon kernel: CPU 3: hi: 6, btch: 1 usd: 5
Mar 30 16:42:24 kswa-lebanon kernel: CPU 4: hi: 6, btch: 1 usd: 1
Mar 30 16:42:24 kswa-lebanon kernel: CPU 5: hi: 6, btch: 1 usd: 3
Mar 30 16:42:24 kswa-lebanon kernel: CPU 6: hi: 6, btch: 1 usd: 5
Mar 30 16:42:24 kswa-lebanon kernel: CPU 7: hi: 6, btch: 1 usd: 1
Mar 30 16:42:24 kswa-lebanon kernel: active_anon:5097 inactive_anon:736 isolated_anon:0
Mar 30 16:42:24 kswa-lebanon kernel: active_file:52746 inactive_file:52773 isolated_file:0
Mar 30 16:42:24 kswa-lebanon kernel: unevictable:159 dirty:805 writeback:8 unstable:0
Mar 30 16:42:24 kswa-lebanon kernel: free:262 slab_reclaimable:4251 slab_unreclaimable:1531
Mar 30 16:42:24 kswa-lebanon kernel: mapped:701 shmem:113 pagetables:209 bounce:0
Mar 30 16:42:24 kswa-lebanon kernel: Node 1 DMA free:16768kB min:11200kB low:13952kB high:16768kB active_anon:326208kB inactive_anon:47104kB active_file:3375744kB inactive_file:3377472kB unevictable:10176kB isolated(anon):0kB isolated(file):0kB present:7890304kB mlocked:10176kB dirty:51520kB writeback:512kB mapped:44864kB shmem:7232kB slab_reclaimable:272064kB slab_unreclaimable:97984kB kernel_stack:7680kB pagetables:13376kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
Mar 30 16:42:24 kswa-lebanon kernel: lowmem_reserve[]: 0 0 0
Mar 30 16:42:24 kswa-lebanon kernel: Node 1 DMA: 26*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 1920kB
Mar 30 16:42:24 kswa-lebanon kernel: 105739 total pagecache pages
Mar 30 16:42:24 kswa-lebanon kernel: 0 pages in swap cache
Mar 30 16:42:24 kswa-lebanon kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 30 16:42:24 kswa-lebanon kernel: Free swap = 2104384kB
Mar 30 16:42:24 kswa-lebanon kernel: Total swap = 2104384kB
Mar 30 16:42:24 kswa-lebanon kernel: 123392 pages RAM
Mar 30 16:42:24 kswa-lebanon kernel: 4849 pages reserved
Mar 30 16:42:24 kswa-lebanon kernel: 92806 pages shared
Mar 30 16:42:24 kswa-lebanon kernel: 31208 pages non-shared
Mar 30 16:42:24 kswa-lebanon kernel: qlge 0003:01:00.1: ql_get_next_chunk: page allocation failed.
Mar 30 16:42:24 kswa-lebanon kernel: qlge 0003:01:00.1: ql_update_lbq: Could not get a page chunk.

Here are the free -m outputs:

After host reboot before test:
             total       used       free     shared    buffers     cached
Mem:          7408        829       6579          0         84        325
-/+ buffers/cache:        420       6988
Swap:         2055          0       2055

During test:
             total       used       free     shared    buffers     cached
Mem:          7408       7369         39          0       3675       3175
-/+ buffers/cache:        517       6890
Swap:         2055          0       2055

Reproducible: Always

Steps to Reproduce:
1. Install OS and use inbox driver for hardware
2. Run straight IO to test connectivity (24 hour run passes)
3. Run controller fail-over script and error occurs after approximately 30
   minutes.

Actual Results:
After controller fail-over test runs for approximately 30 minutes, the IO
fails which is caused by an unexpectedly terminated ISCSI session.

Expected Results:
IO is expected to not be interrupted and ISCSI sessions are to remain
connected.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
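
As a cross-check on the free -m figures quoted in the report, the short Python sketch below recomputes the "-/+ buffers/cache" row from the "Mem:" row for both samples. The numbers are copied from the report; the dictionary layout and the helper name without_cache are illustrative assumptions, not part of any tool.

```python
# Values copied from the `free -m` output in the report (all in MB).
# SAMPLES and without_cache() are hypothetical names used only for this sketch.
SAMPLES = {
    "after reboot": {"total": 7408, "used": 829, "free": 6579,
                     "buffers": 84, "cached": 325},
    "during test": {"total": 7408, "used": 7369, "free": 39,
                    "buffers": 3675, "cached": 3175},
}


def without_cache(sample):
    """Return (used, free) in MB with buffers and page cache counted as free."""
    reclaimable = sample["buffers"] + sample["cached"]
    return sample["used"] - reclaimable, sample["free"] + reclaimable


for label, sample in SAMPLES.items():
    used, free = without_cache(sample)
    print(f"{label}: used excluding cache = {used} MB, "
          f"free including cache = {free} MB")
```

With the reported figures this gives 420 MB / 6988 MB after reboot and 519 MB / 6889 MB during the test, which agrees with the 517 / 6890 that free printed to within megabyte rounding. On these numbers, most of the growth in "used" during the test is buffers and page cache.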