[Bug 816054] New: running MPI benchmarks over Infiniband causes SRQ_LIMIT messages to flood dmesg
https://bugzilla.novell.com/show_bug.cgi?id=816054
https://bugzilla.novell.com/show_bug.cgi?id=816054#c0

Summary: running MPI benchmarks over Infiniband causes SRQ_LIMIT messages to flood dmesg
Classification: openSUSE
Product: openSUSE 12.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.3
Status: NEW
Severity: Minor
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: rick@microway.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=535931)
--> (http://bugzilla.novell.com/attachment.cgi?id=535931)
SRQ_LIMIT messages should be debug only

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.24) Gecko/20111101 SUSE/3.6.24-0.2.1 Firefox/3.6.24

Running the NAS 3.2 benchmarks on a small cluster using ConnectX-3 cards causes messages like this one to show up in dmesg:

mlx4_core 0000:01:00.0: mlx4_eq_int: MLX4_EVENT_TYPE_SRQ_LIMIT

The messages appear anywhere from a few seconds to about 400 seconds apart, depending on exactly which tests I run.

I have already reported this bug at the openfabrics bugzilla as well. They have identified the issue, and I created a patch to fix it. Please patch future kernel versions with this fix.

http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2427

I've attached the simple patch that resolves this problem.

Reproducible: Always

Steps to Reproduce:
1. Run the MPI benchmarks from NAS with openmpi over IB.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
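To put numbers on the "few seconds to about 400 seconds" spacing described in the report, something like the following could be run over captured dmesg output. It is only a sketch: the function name `srq_limit_intervals` is made up for illustration, and it assumes dmesg lines carry the usual `[seconds.microseconds]` timestamp prefix, which the report does not show and which may be disabled on some systems.

```python
import re

# Assumed line shape: "[  123.456789] mlx4_core ...: MLX4_EVENT_TYPE_SRQ_LIMIT".
# The timestamp prefix is an assumption; the message text comes from the report.
SRQ_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*MLX4_EVENT_TYPE_SRQ_LIMIT")


def srq_limit_intervals(lines):
    """Return the gaps, in seconds, between consecutive SRQ_LIMIT messages."""
    stamps = []
    for line in lines:
        match = SRQ_LINE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]
```

Passing it the lines of a saved `dmesg` capture returns a list of gaps, which makes it easy to compare how noisy different benchmark runs are before and after applying the patch.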
https://bugzilla.novell.com/show_bug.cgi?id=816054
https://bugzilla.novell.com/show_bug.cgi?id=816054#c1

--- Comment #1 from Richard Warner <rick@microway.com> 2013-05-21 15:11:53 UTC ---

A new kernel update was released without this fix added.

Slightly off-topic: is there somewhere else I should be posting this stuff? This bugzilla seems like the appropriate official place to post bugs and fixes, but I've had numerous bug reports, including fixes, be ignored time and time again.
https://bugzilla.novell.com/show_bug.cgi?id=816054
https://bugzilla.novell.com/show_bug.cgi?id=816054#c

Jeff Mahoney <jeffm@suse.com> changed:

What       |Removed                                    |Added
----------------------------------------------------------------------------
AssignedTo |kernel-maintainers@forge.provo.novell.com  |jjolly@suse.com
https://bugzilla.novell.com/show_bug.cgi?id=816054
https://bugzilla.novell.com/show_bug.cgi?id=816054#c2

John Jolly <jjolly@suse.com> changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Status     |NEW      |RESOLVED
Resolution |         |FIXED

--- Comment #2 from John Jolly <jjolly@suse.com> 2014-09-08 14:59:47 UTC ---

Both the openSUSE kernel and the Enterprise kernel currently have the mlx4_dbg fix implemented. I am closing this bug as fixed. Please verify.