[opensuse-factory] openMPI mixup in Tumbleweed/Leap 15.x
Hi,

I went through a few packages which have an openMPI dependency or support, and found it quite mixed up:

Currently, we have openmpi(1), openmpi2 and openmpi3 in Leap and TW. While openmpi3 is currently unused, openmpi1 and openmpi2 are both used, with similar frequency:

https://build.opensuse.org/package/binary/openSUSE:Factory/openmpi2:standard... standard/x86_64/openmpi2-libs-2.1.5-2.1.x86_64.rpm
https://build.opensuse.org/package/binary/openSUSE:Factory/openmpi:standard/ standard/x86_64/openmpi-libs-1.10.7-21.1.x86_64.rpm

Several programs will end up with implicitly linking to both versions, as libnetcdf and hdf5 use openmpi1 and boost_mpi uses openmpi2. One example is vtk.

As both libraries (libmpi.so.12 and libmpi.so.20) export the same symbols for large parts, this is mayhem waiting to happen.

For SLE, different MPI versions/implementations are supported using the HPC modules, but for Leap/TW, we should obviously stick with *one* single canonical version.

Question now, which version to choose?

Apparently, openmpi2 does not work on all architectures (PPC, PPC64BE) [1], and is not supported by some software packages [2].

Are there any drawbacks for using openmpi1 everywhere in TW/Leap 15.x?

I have opened a bug report: https://bugzilla.opensuse.org/show_bug.cgi?id=1118861

Kind regards,

Stefan

[1] "Stay with openmpi(1) also on PPC", boost, 2018-10-01, https://build.opensuse.org/request/show/639401
[2] "Cntk packages do not support OpenMPI 2+", https://github.com/Microsoft/CNTK/issues/3197

--
Stefan Brüns / Bergstraße 21 / 52062 Aachen
home: +49 241 53809034 mobile: +49 151 50412019
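One way to check whether a given binary is affected by the mixup described above is to look at its resolved shared libraries and count the distinct libmpi sonames. The following is a minimal sketch, assuming `ldd`-style text as input; the function name `libmpi_sonames` and the sample output (paths, addresses, library versions) are hypothetical and only serve as illustration.

```python
import re

# Matches sonames such as libmpi.so.12 or libmpi.so.20 at the start of an ldd line.
_LIBMPI = re.compile(r"^\s*(libmpi\.so\.\d+)\b")


def libmpi_sonames(ldd_output):
    """Return the distinct libmpi sonames found in ldd-style output, sorted."""
    found = set()
    for line in ldd_output.splitlines():
        match = _LIBMPI.match(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Hypothetical ldd output, for illustration only; paths and addresses are made up.
SAMPLE = """\
    libboost_mpi.so.1.68.0 => /usr/lib64/libboost_mpi.so.1.68.0 (0x00007f0000000000)
    libhdf5.so.101 => /usr/lib64/mpi/gcc/openmpi/lib64/libhdf5.so.101 (0x00007f0000100000)
    libmpi.so.20 => /usr/lib64/mpi/gcc/openmpi2/lib64/libmpi.so.20 (0x00007f0000200000)
    libmpi.so.12 => /usr/lib64/mpi/gcc/openmpi/lib64/libmpi.so.12 (0x00007f0000300000)
"""

if __name__ == "__main__":
    sonames = libmpi_sonames(SAMPLE)
    print(sonames)
    if len(sonames) > 1:
        print("conflict: more than one libmpi soname would be loaded")
```

In practice the input would come from running `ldd` on the binary in question; more than one entry in the result indicates the situation described in the message.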
For what it's worth, I use `mpi-selector` to set a system-wide MPI implementation and then switch between them. This keeps MPI libraries from clashing when I build software from source.

I'm not sure if the openSUSE packages use this method, but it works for packages I build myself really well. Are you building hdf5 and/or netcdf yourself?

Cheers,
Chris

On Dec-08-18, Stefan Brüns wrote:
> [...]
On Sunday, 9 December 2018 11:02:19 CET Chris Coutinho wrote:
> For what it's worth, I use `mpi-selector` to set a system-wide MPI implementation and then switch between them. This keeps MPI libraries from clashing when I build software from source.
>
> I'm not sure if the openSUSE packages use this method, but it works for packages I build myself really well. Are you building hdf5 and/or netcdf yourself?
>
> Cheers, Chris
Sorry, but you have not understood the problem at hand:

Currently, e.g. boost is built with openmpi2, and references the openmpi2 soname (libmpi.so.20). HDF5 is built with openmpi1, and references the openmpi1 soname (libmpi.so.12).

After building, the sonames are fixed in the libraries, and while you can switch e.g. between mvapich2 and openmpi1 (both use libmpi.so.12 for the soname) at runtime, you can *not* switch between openmpi1 and openmpi2.

When a program uses both hdf5 and boost, it indirectly links *both* libmpi.so.12 and libmpi.so.20. Both libraries export the same symbols, and how the dynamic linker resolves these symbols is unspecified.

This is not about mpi-selector or similar mechanisms. This is about building packages which are part of the distribution.

Kind regards,

Stefan
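The indirect double linking described in this reply can be sketched as a walk over recorded library dependencies. This is a rough illustration, not a description of how the build service checks packages: the `NEEDED` table is hypothetical data loosely modelled on the thread (boost_mpi built against openmpi2, hdf5 and netcdf against openmpi1), and the unversioned library names are simplifications.

```python
from collections import deque

# Hypothetical direct dependencies (DT_NEEDED-style entries), for illustration only.
NEEDED = {
    "vtk": ["libboost_mpi.so", "libhdf5.so", "libnetcdf.so"],
    "libboost_mpi.so": ["libmpi.so.20"],
    "libhdf5.so": ["libmpi.so.12"],
    "libnetcdf.so": ["libhdf5.so", "libmpi.so.12"],
    "libmpi.so.12": [],
    "libmpi.so.20": [],
}


def closure(root, needed):
    """Breadth-first walk over direct dependencies; returns everything reachable."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in needed.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def mpi_sonames(root, needed):
    """Return the sorted libmpi sonames pulled in, directly or indirectly, by root."""
    return sorted(d for d in closure(root, needed) if d.startswith("libmpi.so."))


if __name__ == "__main__":
    print(mpi_sonames("vtk", NEEDED))
    print(mpi_sonames("libhdf5.so", NEEDED))
```

With this table, `vtk` never names an MPI library itself, yet its closure contains both sonames, whereas `libhdf5.so` alone only reaches one. Because the sonames are recorded at build time, a runtime selector cannot change this result.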
On Sat, Dec 8, 2018 at 2:53 PM Stefan Brüns <stefan.bruens@rwth-aachen.de> wrote:
> [...]
No matter what we pick, I think it would be a good idea to do what we do with, say, gcc and llvm/clang, where we have separate "openmpi1", "openmpi2", and "openmpi3" packages, and have the "openmpi" package refer to the default version. This would make it easy to change default versions in the future, or set default versions on a per-architecture basis.

As for openmpi 1 vs openmpi 2, the problem with openmpi 1 is that it is unmaintained [1]. The current version of openmpi is actually version 4. So using openmpi 1 as the default comes with all the problems associated with unmaintained software, especially network-oriented software. openmpi 2 also adds support for MPI 3.x features.

openmpi 2 is supposed to support PPC. If it doesn't, that is probably a bug that should be reported upstream. Unfortunately the linked request doesn't explain what the problem is.

[1] See left sidebar here: https://www.open-mpi.org/software/ompi/v4.0/

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
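The idea of an unversioned "openmpi" name that points at a default, with optional per-architecture overrides, can be modelled as a simple lookup. This is only a sketch of the concept: the chosen default, the architecture keys and the function name `default_openmpi` are hypothetical, and the thread does not settle on any of these values.

```python
# Hypothetical values, for illustration only; nothing here reflects an actual decision.
DEFAULT_FLAVOR = "openmpi2"
PER_ARCH_OVERRIDE = {
    "ppc": "openmpi1",
    "ppc64": "openmpi1",
}


def default_openmpi(arch):
    """Return the versioned package the unversioned 'openmpi' name would refer to."""
    return PER_ARCH_OVERRIDE.get(arch, DEFAULT_FLAVOR)


if __name__ == "__main__":
    for arch in ("x86_64", "ppc64"):
        print(arch, "->", default_openmpi(arch))
```

Changing the distribution-wide default then means editing one value, and dependent packages that build against the unversioned name follow automatically after a rebuild.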
Since openmpi is more or less an HPC package, why don't we apply the same strategy with modules as in SLE? That would avoid duplicated work and give extra testing to SLE.

Alin
On 12/9/18 4:18 PM, Todd Rme wrote:
> [...]
> openmpi 2 is supposed to support PPC. If it doesn't, that is probably a bug that should be reported upstream. Unfortunately the linked request doesn't explain what the problem is.
It was disabled for ppc64be in v2.1.2 but re-enabled in v2.1.4. See: https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982

My two cents on the MPI version pick:
- openmpi1 has been unmaintained for over a year now. It is also deprecated in SLES/Leap 15, although still available. We know there are some issues, especially regarding the latest RDMA hardware. IMHO this should be dropped completely from Factory.
- openmpi2 is the new "default" for SLES15. It seems to work well and is old enough to be stable.
- openmpi3 was not picked for SLES15 as it was very recently released at the time and still pretty unstable, even when running the testsuite it came with. We decided not to ship it. It might be mature enough by now to be a good candidate.
- openmpi4 is just barely out. I haven't got around to testing it yet, but my best guess is that it will be similar to openmpi3 when it came out: working, but with lots of instability and issues (usually on some non-x86_64 architectures). I think it is too early to use it, although it should be packaged and available in Factory.

TL;DR: I think openmpi2 and 3 are good candidates. openmpi2 has my preference because it means we can stay more in sync with SLES and Leap 15 (which do not have openmpi3).

Regarding the rest of the discussion, I've replied in BZ#1118861.

Nicolas
participants (5)
- Alin Marin Elena
- Chris Coutinho
- Nicolas Morey-Chaisemartin
- Stefan Brüns
- Todd Rme