[Bug 426594] New: dapltest segfaults on QS22 hardware
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c1

Summary: dapltest segfaults on QS22 hardware
Product: openSUSE 11.0
Version: Final
Platform: Other
OS/Version: All
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: bugproxy@us.ibm.com
QAContact: qa@suse.de
Found By: Third Party Developer/Partner
Partner ID: LTC 47535

---Problem Description---
dapltest fails on QS22 hardware

Contact Information = sijo.george@gmail.com

---uname output---
Linux qs22-3 2.6.25.5-1.1-ppc64 #1 SMP 2008-06-07 01:55:22 +0200 ppc64 ppc64 ppc64 GNU/Linux

Machine Type = CellBE

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Set up the InfiniBand interface on the machine.
2. Insert the required modules.
==============
qs22-3:~ # lsmod
Module                     Size  Used by
mlx4_ib                   73324  0
mlx4_core                115560  1 mlx4_ib
ib_iser                   66112  0
libiscsi                  57360  1 ib_iser
scsi_transport_iscsi      69368  2 ib_iser,libiscsi
rdma_ucm                  38248  0
rdma_cm                   61380  2 ib_iser,rdma_ucm
iw_cm                     31776  1 rdma_cm
joydev                    35680  0
ib_umad                   41648  0
ib_uverbs                 67540  1 rdma_ucm
ib_ipoib                 122400  0
ib_cm                     69140  2 rdma_cm,ib_ipoib
ib_sa                     51656  3 rdma_cm,ib_ipoib,ib_cm
ib_addr                   28432  1 rdma_cm
ip6t_LOG                  25340  8
xt_tcpudp                 21416  6
xt_pkttype                19424  3
ipt_LOG                   25180  9
xt_limit                  21596  17
binfmt_misc               34948  1
uinput                    30704  2
nf_conntrack_ipv6         42952  4
ip6t_REJECT               24248  3
ipt_REJECT                22432  3
xt_state                  20384  8
iptable_mangle            22992  0
iptable_nat               29364  0
nf_nat                    45666  1 iptable_nat
iptable_filter            22872  1
ip6table_mangle           23112  0
nf_conntrack_netbios_ns   20704  0
nf_conntrack_ipv4         34184  7 iptable_nat,nf_nat
nf_conntrack             111352  6 nf_conntrack_ipv6,xt_state,iptable_nat,nf_nat,nf_conntrack_netbios_ns,nf_conntrack_ipv4
ip_tables                 46304  3 iptable_mangle,iptable_nat,iptable_filter
ip6table_filter           22736  1
ip6_tables                47760  3 ip6t_LOG,ip6table_mangle,ip6table_filter
x_tables                  51256  11 ip6t_LOG,xt_tcpudp,xt_pkttype,ipt_LOG,xt_limit,ip6t_REJECT,ipt_REJECT,xt_state,iptable_nat,ip_tables,ip6_tables
ipv6                     438504  26 ib_ipoib,nf_conntrack_ipv6,ip6t_REJECT,ip6table_mangle
fuse                      92504  1
loop                      44740  0
dm_mod                   114744  0
sg                        71112  0
pmi                       26856  0
sd_mod                    55064  0
ib_mthca                 197224  0
ib_mad                    72720  5 mlx4_ib,ib_umad,ib_cm,ib_sa,ib_mthca
ib_core                   96880  12 mlx4_ib,ib_iser,rdma_ucm,rdma_cm,iw_cm,ib_umad,ib_uverbs,ib_ipoib,ib_cm,ib_sa,ib_mthca,ib_mad
nfs                      408808  1
lockd                    118720  1 nfs
nfs_acl                   22056  1 nfs
sunrpc                   302664  10 nfs,lockd,nfs_acl
tg3                      177980  0
mptsas                    60108  0
mptscsih                  52768  1 mptsas
mptbase                  104572  2 mptsas,mptscsih
scsi_transport_sas        66552  1 mptsas
scsi_mod                 246448  8 ib_iser,libiscsi,scsi_transport_iscsi,sg,sd_mod,mptsas,mptscsih,scsi_transport_sas
==============
3. Start the server to run dapltest.

qs22-3:~ # dapltest -T S -d -D OpenIB-cma
Server_Cmd.debug: 1
Server_Cmd.dapl_name: OpenIB-cma
CMA: unable to open RDMA device
Segmentation fault
qs22-3:~ #

---Network Component Data---

*Additional Instructions for sijo.george@gmail.com:
-Post a private note with access information to the machine that the bug is occurring on.

Machine Access info:
qs22-3.ltc.austin.ibm.com - root/cellrules

=Comment: #1=================================================
Sijo George <sijo.george@in.ibm.com> - 2008-08-28 05:06 EDT

The server is running openSUSE 11.

=Comment: #2=================================================
Anoop V. Chakkalakkal <anoop.vijayan@in.ibm.com> - 2008-08-29 09:16 EDT

Core was generated by `dapltest'.
Program terminated with signal 11, Segmentation fault.
[New process 22486]
#0  0x0ff741d4 in ibv_close_device () from /usr/lib/libibverbs.so.1
(gdb) bt
#0  0x0ff741d4 in ibv_close_device () from /usr/lib/libibverbs.so.1
#1  0x0fd6aab0 in ?? () from /usr/lib/librdmacm.so.1
#2  0x0fd6ac88 in ?? () from /usr/lib/librdmacm.so.1
#3  0x0fd6d688 in rdma_create_event_channel () from /usr/lib/librdmacm.so.1
#4  0x0fda1dc4 in ?? () from /usr/lib/libdaplcma.so.1
#5  0x0fd9b37c in ?? () from /usr/lib/libdaplcma.so.1
#6  0x0ff9d71c in dat_ia_openv () from /usr/lib/libdat.so.1
#7  0x100126b8 in ?? ()
#8  0x10006894 in ?? ()
#9  0x1001b840 in ?? ()
#10 0x100018cc in ?? ()
#11 0x0fe0eb44 in ?? () from /lib/libc.so.6
#12 0x0fe0ece4 in __libc_start_main () from /lib/libc.so.6
#13 0x00000000 in ?? ()

=Comment: #3=================================================
Anoop V. Chakkalakkal <anoop.vijayan@in.ibm.com> - 2008-08-29 09:18 EDT

The source rpms for the associated packages are available at
http://download.opensuse.org/distribution/11.0/repo/src-oss/suse/src/

- Thanks
Sijo

=Comment: #4=================================================
Anoop V. Chakkalakkal <anoop.vijayan@in.ibm.com> - 2008-08-31 05:48 EDT

The segfault occurs in libibverbs-1.1.1/src/device.c because context is NULL:

int __ibv_close_device(struct ibv_context *context)
{
        int async_fd = context->async_fd;    <======
        int cmd_fd   = context->cmd_fd;
        int cq_fd    = -1;

The NULL check is missing in librdmacm-1.0.7/src/cma.c:

static void ucma_cleanup(void)
{
        if (cma_dev_cnt) {
                while (cma_dev_cnt)
                        ibv_close_device(cma_dev_array[--cma_dev_cnt].verbs);    <======

=Comment: #5=================================================
Anoop V. Chakkalakkal <anoop.vijayan@in.ibm.com> - 2008-08-31 05:51 EDT

librdmacm-cma.patch

Please rebuild librdmacm-1.0.7 with the patch and provide the results. Thanks!

=Comment: #6=================================================
Sijo George <sijo.george@in.ibm.com> - 2008-09-09 03:22 EDT

Anoop:
I am currently verifying the patch. I will update the status once I am done.

Regards:
Sijo

=Comment: #7=================================================
Sijo George <sijo.george@in.ibm.com> - 2008-09-09 04:35 EDT

Anoop:
The patch doesn't seem to work.

qs22-3:/usr/src/packages/RPMS/ppc64 # dapltest -T S -d -D OpenIB-cma
Server_Cmd.debug: 1
Server_Cmd.dapl_name: OpenIB-cma
CMA: unable to open RDMA device
Segmentation fault

The patched rpm can be found at /usr/src/packages/RPMS/ppc64 on the qs22-3 machine.

Regards
Sijo

=Comment: #8=================================================
Sijo George <sijo.george@in.ibm.com> - 2008-09-10 02:17 EDT

Hi Anoop:
I have verified the patch again; there was an issue with the package I tested earlier. The patch seems to be working.
==============================
qs22-3:/usr/src/packages/RPMS/ppc64 # dapltest -T S -d -D OpenIB-cma
Server_Cmd.debug: 1
Server_Cmd.dapl_name: OpenIB-cma
CMA: unable to open RDMA device
DT_cs_Server: Could not open OpenIB-cma (DAT_INTERNAL_ERROR )
DT_cs_Server: Waiting for clients to all go away...
DT_cs_Server: Cleaning up ...
DT_cs_Server (OpenIB-cma): Exiting.
==========================
You can go ahead and close this bug now.

Thanks:
Sijo

=Comment: #9=================================================
Anoop V. Chakkalakkal <anoop.vijayan@in.ibm.com> - 2008-09-15 02:45 EDT

Patch posted to community -
http://lists.openfabrics.org/pipermail/general/2008-August/053930.html

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
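The failure mode described in comment #4 of the report (a cleanup loop handing a NULL context to a close routine that dereferences it unconditionally) can be illustrated with a small self-contained sketch. This is not the attached librdmacm-cma.patch, whose contents are not reproduced in this thread, and it does not use libibverbs or librdmacm. All names below (mock_context, mock_cma_device, mock_close_device, mock_cleanup, closed_count) are hypothetical stand-ins, and the guard shown is only one plausible way to add the missing check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ibv_context. */
struct mock_context {
    int async_fd;
    int cmd_fd;
};

/* Hypothetical stand-in for an entry of cma_dev_array. */
struct mock_cma_device {
    struct mock_context *verbs;   /* NULL if the device was never opened */
};

/* Counts how many contexts were actually closed. */
int closed_count = 0;

/*
 * Like the __ibv_close_device excerpt in the report, this reads fields of
 * its argument without checking it, so callers must never pass NULL.
 */
int mock_close_device(struct mock_context *context)
{
    assert(context != NULL);
    int async_fd = context->async_fd;
    int cmd_fd = context->cmd_fd;
    (void)async_fd;
    (void)cmd_fd;
    free(context);
    closed_count++;
    return 0;
}

/*
 * Cleanup loop with the NULL guard added (assumed shape of a fix):
 * entries whose context was never opened are skipped instead of closed.
 */
void mock_cleanup(struct mock_cma_device *devs, int *dev_cnt)
{
    while (*dev_cnt) {
        struct mock_context *ctx = devs[--(*dev_cnt)].verbs;
        if (ctx)
            mock_close_device(ctx);
        devs[*dev_cnt].verbs = NULL;
    }
}
```

Calling mock_cleanup on an array in which some entries have a NULL verbs pointer closes only the entries that were opened and leaves the device count at zero. Without the `if (ctx)` guard, the same call would reach the unconditional dereference in the close routine.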
https://bugzilla.novell.com/show_bug.cgi?id=426594

LTC BugProxy <bugproxy@us.ibm.com> changed:

What       |Removed                                    |Added
----------------------------------------------------------------------------
URL        |                                           |http://
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c1

--- Comment #1 from LTC BugProxy <bugproxy@us.ibm.com> 2008-09-16 06:51:01 MDT ---
Created an attachment (id=239821)
--> (https://bugzilla.novell.com/attachment.cgi?id=239821)
librdmacm-cma.patch
https://bugzilla.novell.com/show_bug.cgi?id=426594

Andreas Jaeger <aj@novell.com> changed:

What       |Removed                                    |Added
----------------------------------------------------------------------------
AssignedTo |bnc-team-screening@forge.provo.novell.com  |jjolly@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c2

--- Comment #2 from LTC BugProxy <bugproxy@us.ibm.com> 2008-10-02 09:32:43 MDT ---
------- Comment From lxie@us.ibm.com 2008-10-02 11:18 EDT-------
Sijo,
We can't close the bug until we verify the fix in the Distro.
https://bugzilla.novell.com/show_bug.cgi?id=426594

John Jolly <jjolly@novell.com> changed:

What       |Removed                                    |Added
----------------------------------------------------------------------------
Priority   |P5 - None                                  |P1 - Urgent
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c3

--- Comment #3 from LTC BugProxy <bugproxy@us.ibm.com> 2008-10-14 14:15:32 MDT ---
------- Comment From lxie@us.ibm.com 2008-10-02 11:18 EDT-------
https://bugzilla.novell.com/show_bug.cgi?id=426594

User jjolly@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c4

John Jolly <jjolly@novell.com> changed:

What       |Removed                                    |Added
----------------------------------------------------------------------------
Status     |NEW                                        |RESOLVED
Resolution |                                           |FIXED

--- Comment #4 from John Jolly <jjolly@novell.com> 2008-10-15 10:09:40 MDT ---
Patch added to package with 1.4rc2 sources. Submitted to head.
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c5

--- Comment #5 from LTC BugProxy <bugproxy@us.ibm.com> 2008-10-30 08:02:47 MDT ---
------- Comment From lxie@us.ibm.com 2008-10-02 11:18 EDT-------
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c6

--- Comment #6 from LTC BugProxy <bugproxy@us.ibm.com> 2009-01-08 06:22:38 MST ---
------- Comment From sijo.george@in.ibm.com 2009-01-08 08:17 EDT-------
Anoop:
Sorry for the delay; I was on vacation.
I have run dapltest on RC1. The test executes fine without any errors. You may close this bug.

Regards:
Sijo
https://bugzilla.novell.com/show_bug.cgi?id=426594

User bugproxy@us.ibm.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=426594#c7

--- Comment #7 from LTC BugProxy <bugproxy@us.ibm.com> 2009-01-08 23:51:28 MST ---
------- Comment From anoop.vijayan@in.ibm.com 2009-01-09 01:48 EDT-------
Sijo,
Thanks for verifying this. Closing on IBM side...