[opensuse] System lockup with OpenMPI on OpenSUSE 11.2
Hi!

I am running an application based on OpenMPI and GotoBLAS on OpenSUSE 11.2. Most of the time, when I run the application with one process per core, I get the following dump and the system locks up hard:

Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.049549] ------------[ cut here ]------------
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.049613] invalid opcode: 0000 [#1] PREEMPT SMP
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.049549] ------------[ cut here ]------------
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.049613] invalid opcode: 0000 [#1] PREEMPT SMP
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.049638] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.049638] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.050232] Stack:
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.050232] Stack:
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.050329] Call Trace:
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.050329] Call Trace:
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.050357] Code: 84 00 00 00 00 00 4c 89 ff e8 38 23 4b e1 41 83 84 24 58 03 00 00 01 45 01 ac 24 5c 03 00 00 4c 89 ff e8 4f 27 4b e1 31 c0 eb b0 <0f> 0b eb fe 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 55
Message from syslogd@gpu-dev03 at Jul 19 16:07:10 ... kernel:[ 275.050357] Code: 84 00 00 00 00 00 4c 89 ff e8 38 23 4b e1 41 83 84 24 58 03 00 00 01 45 01 ac 24 5c 03 00 00 4c 89 ff e8 4f 27 4b e1 31 c0 eb b0 <0f> 0b eb fe 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 55

Does anybody recognize the problem, and maybe know a workaround?
I'll try to move to 11.3 tomorrow, but that will break some other third-party software that also has to run on the system, so for me that would only be a last resort.

Regards,
Matthias
Participants (1): Matthias Bach