Mailinglist Archive: opensuse-factory (188 mails)

[opensuse-factory] Re: [opensuse-science] openMPI mixup in Tumbleweed/Leap 15.x


On 12/9/18 4:18 PM, Todd Rme wrote:
On Sat, Dec 8, 2018 at 2:53 PM Stefan Brüns
<stefan.bruens@xxxxxxxxxxxxxx> wrote:
Hi,

I went through a few packages which have an openMPI dependency or support, and
found it quite mixed up:

Currently, we have openmpi(1), openmpi2 and openmpi3 in Leap and TW. While
openmpi3 is currently unused, openmpi1 and openmpi2 are both used, with
similar frequency:

https://build.opensuse.org/package/binary/openSUSE:Factory/openmpi2:standard/standard/x86_64/openmpi2-libs-2.1.5-2.1.x86_64.rpm
https://build.opensuse.org/package/binary/openSUSE:Factory/openmpi:standard/standard/x86_64/openmpi-libs-1.10.7-21.1.x86_64.rpm

Several programs will end up implicitly linking to both versions, as
libnetcdf and hdf5 use openmpi1 and boost_mpi uses openmpi2. One example is
vtk.

As both libraries (libmpi.so.12 and libmpi.so.20) largely export the same
symbols, this is mayhem waiting to happen.
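
To check whether a given binary is affected, something along these lines could
be used. This is only a minimal sketch: it assumes the usual `ldd` output
format ("soname => path (address)"), and the helper names `libmpi_sonames` and
`check_binary` are made up for illustration. Since `ldd` also lists transitive
dependencies, a binary that pulls in both libraries indirectly should show up
as well.

```python
import re
import subprocess
import sys

# Matches sonames such as "libmpi.so.12" or "libmpi.so.20" in ldd output.
# It deliberately does not match related libraries like "libmpi_cxx.so.1".
_LIBMPI_RE = re.compile(r"\blibmpi\.so\.(\d+)\b")


def libmpi_sonames(ldd_output):
    """Return the distinct libmpi sonames in ldd output, sorted by major number."""
    majors = {int(m.group(1)) for m in _LIBMPI_RE.finditer(ldd_output)}
    return [f"libmpi.so.{major}" for major in sorted(majors)]


def check_binary(path):
    """Run ldd on `path` (hypothetical helper) and return the libmpi sonames found."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return libmpi_sonames(result.stdout)


if __name__ == "__main__":
    # Usage: python check_mpi.py /path/to/binary [more binaries...]
    for binary in sys.argv[1:]:
        found = check_binary(binary)
        status = "MIXED" if len(found) > 1 else "ok"
        print(f"{binary}: {status} {found}")
```

Any binary reported as MIXED loads more than one libmpi major version and is a
candidate for the symbol clash described above.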

For SLE, different MPI versions/implementations are supported using the HPC
modules, but for Leap/TW, we should obviously stick with *one* single
canonical version.

Question now, which version to choose?

Apparently, openmpi2 does not work on all architectures (PPC, PPC64BE) [1],
and is not supported by some software packages [2].

Are there any drawbacks to using openmpi1 everywhere in TW/Leap 15.x?

I have opened a bug report:
https://bugzilla.opensuse.org/show_bug.cgi?id=1118861

Kind regards,

Stefan


[1] "Stay with openmpi(1) also on PPC", boost, 2018-10-01,
https://build.opensuse.org/request/show/639401
[2] "Cntk packages do not support OpenMPI 2+",
https://github.com/Microsoft/CNTK/issues/3197

No matter what we pick, I think it would be a good idea to do what we
do with, say, gcc and llvm/clang, where we have separate "openmpi1",
"openmpi2", and "openmpi3" packages, and have the "openmpi" package
refer to the default version. This would make it easy to change
default versions in the future, or set default versions on a
per-architecture basis.

As for openmpi 1 vs openmpi 2, the problem with openmpi 1 is that it
is unmaintained [1]. The current version of openmpi is actually
version 4. So using openmpi 1 as the default comes with all the
problems associated with unmaintained software, especially
network-oriented software. openmpi 2 also adds support for MPI 3.x
features.

openmpi 2 is supposed to support PPC. If it doesn't, that is probably
a bug that should be reported upstream. Unfortunately, the linked
request doesn't explain what the problem is.

It was disabled for ppc64be in v2.1.2 but reenabled in v2.1.4.
See: https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982

My two cents on the MPI version pick:
- openmpi1 has been unmaintained for over a year now. It is also deprecated in
SLES/Leap 15, although still available.
We know there are some issues, especially regarding the latest RDMA hardware.
IMHO this should be dropped completely from Factory.
- openmpi2 is the new "default" for SLES15. It seems to work well and is old
enough to be stable.
- openmpi3 was not picked for SLES15 as it was very recently released at the
time and still pretty unstable, even running the testsuite it came with. We
decided not to ship it.
It might be mature enough to be a good candidate.
- openmpi4 is just barely out. I haven't gotten around to testing it yet, but my best
guess is that it will be similar to openmpi3 when it came out: working, but with
lots of instability and issues (usually on some non-x86_64 architectures).
I think it is too early to use it, although it should be packaged and available
in Factory.

TL;DR: I think openmpi2 and 3 are good candidates. openmpi2 has my preference
because it means we can stay more in sync with SLES and Leap 15 (which do not
have openmpi3).

Regarding the rest of the discussion, I've replied in BZ#1118861.

Nicolas


