AMD AOMP (ROCM) Compute stack for Tumbleweed
Hi,

I've been working on creating openSUSE packages for AMD's ROCm stack for some time. It consists of components for building and running compute workloads on AMD GPUs. One key component is a ROCm-specific version of LLVM. This makes things tricky to distribute in the normal fashion, since we need to include an extra version of LLVM that does not collide with the rest of the system. AMD's solution has been to put the entire stack in /opt/rocm.

AMD has also released a specific build of the stack called AOMP. The focus is on OpenMP, but it also includes support for OpenCL, HIP, CUDA and Fortran. This is normally installed in /usr/lib/aomp-%{version}. AOMP also includes build scripts for building the entire stack, which is very helpful.

What I've done is create an AOMP package for Tumbleweed, which is available at [1]. It tries to mimic what AMD has done for the package they provide for SLE15-SP1. I'm no expert on packaging, so there are probably improvements to be made.

Is this something that can be included in Tumbleweed? I know the optimal solution would be to have all the ROCm-specific parts upstreamed and included in the official LLVM package, and then provide separate packages with normal installation paths. Unfortunately we're not there yet, and this would be a stop-gap solution until then.

Any comments are welcome.

Thanks
Patrik Jakobsson

[1] https://build.opensuse.org/package/show/science:GPU:ROCm/aomp
Currently, I am using the official ROCm zypper repository provided by AMD: https://repo.radeon.com/rocm/zyp/zypper/

This is similar to how Nvidia distributes its UNIX drivers on openSUSE. Is there a reason we want to replicate this package in OSS?

Best,
Xu

--
Xu Zhao nuk.zhao@utoronto.ca
On Tue, Apr 20, 2021 at 11:10:11AM -0400, Xu Zhao wrote:
> Currently, I am using the official ROCM zypper repository provided by AMD: https://repo.radeon.com/rocm/zyp/zypper/
>
> This way is similar to how Nvidia distributes nvidia UNIX drivers on openSUSE. Is there a reason we want to replicate this package in OSS?

Sorry for the late reply. I only just received this email, so there must have been some issue with delivery.

The packages you're linking to are, AFAIK, built for SLE15-SP2. I would like to have packages built against Factory and included in Tumbleweed.

Thanks
Patrik
(sending to the list as well...)

Hey, thanks for trying to package ROCm. I've tried running the AMD-provided ROCm package for SLES on Tumbleweed in the past, but the experience was frustrating.

As for your package, I'm not sure what the correct location of OpenCL loaders is. Mesa and pocl put theirs in "/usr/share/OpenCL/vendors", yours puts it in "/etc/OpenCL/vendors". The "clinfo" tool seems to only search in "/etc/OpenCL/vendors", so by default it doesn't even find Mesa's or pocl's. clinfo also doesn't list AOMP, which is in /etc/OpenCL/vendors already. I tested darktable and Blender, and neither could find the AOMP OpenCL loader either...

regards
On Tue, Apr 20, 2021 at 10:39:53PM +0200, Maximilian Trummer wrote:
> Hey, thanks for trying to package ROCm. I've tried running the AMD-provided ROCm package for SLES on Tumbleweed in the past, but the experience was frustrating.
>
> As for your package, I'm not sure what the correct location of OpenCL loaders is. Mesa and pocl put theirs in "/usr/share/OpenCL/vendors",

I did this. See boo#1173005

> yours puts it in "/etc/OpenCL/vendors".
>
> The "clinfo" tool seems to only search in "/etc/OpenCL/vendors", so by default it doesn't even find Mesa's or pocl.

I'm pretty sure I've tested with clinfo. I also just verified with strace on current TW that clinfo first checks /etc/OpenCL/vendors and then finds /usr/share/OpenCL/vendors/mesa.icd. Maybe you tried with the libOpenCL of the nvidia driver.

> clinfo also doesn't list AOMP which is in /etc/OpenCL/vendors already. I tested darktable and Blender, and they both couldn't find the AOMP OpenCL loader either...

Thanks,
Stefan

Public Key available
------------------------------------------------------
Stefan Dirsch (Res. & Dev.)   SUSE Software Solutions Germany GmbH
Tel: 0911-740 53 0            Maxfeldstraße 5
FAX: 0911-740 53 479          D-90409 Nürnberg
http://www.suse.de            Germany
----------------------------------------------------------------
(HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
----------------------------------------------------------------
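The strace check Stefan describes could be reproduced roughly as follows. This is only a sketch: it assumes strace is installed, the function names are made up for illustration, and "clinfo" is simply the client mentioned in the thread.

```shell
#!/bin/sh
# Sketch: show which OpenCL ICD vendor paths a program probes at startup.
# Function names are hypothetical; strace must be installed to run the trace.

# Keep only the lines of an strace log (read from stdin) that touch an
# OpenCL vendors directory. "|| true" avoids a failing status on no match.
filter_icd_probes() {
    grep -E 'OpenCL/vendors' || true
}

# Trace the file-open syscalls of a command (following child processes)
# and print only the ICD-related lookups. strace writes to stderr, so
# stderr is sent into the pipe and the program's own stdout is discarded.
trace_icd_probes() {
    strace -f -e trace=open,openat "$@" 2>&1 >/dev/null | filter_icd_probes
}

# Usage (not run automatically):
#   trace_icd_probes clinfo
```

If a bundled libOpenCL is being picked up instead of the system one, /usr/share/OpenCL/vendors would presumably not appear in the output at all.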
On Tue, Apr 20, 2021 at 10:39:53PM +0200, Maximilian Trummer wrote:
> As for your package, I'm not sure what the correct location of OpenCL loaders is. Mesa and pocl put theirs in "/usr/share/OpenCL/vendors", yours puts it in "/etc/OpenCL/vendors". The "clinfo" tool seems to only search in "/etc/OpenCL/vendors", so by default it doesn't even find Mesa's or pocl. clinfo also doesn't list AOMP which is in /etc/OpenCL/vendors already. I tested darktable and Blender, and they both couldn't find the AOMP OpenCL loader either...

Hi, thanks for having a look.

Since the entire stack gets installed into /usr/lib/aomp_11.12-0, you need to add the library paths to /etc/ld.so.conf manually. E.g.:

/usr/lib/aomp_11.12-0/lib
/usr/lib/aomp_11.12-0/lib64

I know this is not a great solution, but it's what I have at the moment. It might be worth moving just the OpenCL lib to the normal lib paths. That would give OpenCL support out of the box, but the rest of the libs would still need to be added manually.

The ICD can certainly be moved to /usr/share/OpenCL/vendors.

-Patrik
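As a variation on editing /etc/ld.so.conf by hand, the two AOMP library paths could go into a drop-in file. This is a minimal sketch, assuming /etc/ld.so.conf includes /etc/ld.so.conf.d/*.conf (common, but worth checking on the target system); the file name aomp.conf and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: write a linker drop-in listing the AOMP library directories.
# Assumption: /etc/ld.so.conf contains "include /etc/ld.so.conf.d/*.conf".
# The file name "aomp.conf" is made up for this example.

# Arguments: <aomp prefix> <conf dir>; both fall back to illustrative defaults.
write_aomp_ldconf() {
    prefix=${1:-/usr/lib/aomp_11.12-0}
    confdir=${2:-/etc/ld.so.conf.d}
    # One directory per line, matching the two paths given in the thread.
    printf '%s\n' "$prefix/lib" "$prefix/lib64" > "$confdir/aomp.conf"
}

# Usage as root (not run automatically); ldconfig rebuilds the linker cache:
#   write_aomp_ldconf && ldconfig
```

A drop-in keeps the change easy to remove again, which may suit a stop-gap package better than modifying the main configuration file.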
On Wed, Apr 21, 2021 at 10:09:51AM +0200, Patrik Jakobsson wrote:
> The ICD can certainly be moved to /usr/share/OpenCL/vendors.

It's not such a big issue, since /etc/OpenCL/vendors is checked first, then /usr/share/OpenCL/vendors.

Thanks,
Stefan
On Wed, Apr 21, 2021 at 10:32:16AM +0200, Stefan Dirsch wrote:
> > The ICD can certainly be moved to /usr/share/OpenCL/vendors.
>
> It's not such a big issue since /etc/OpenCL/vendors is checked first, then /usr/share/OpenCL/vendors.

Hmm, I just tried moving it to /usr/share/OpenCL/vendors, but that didn't work. Not sure why. I'll have a look at the strace output.

-Patrik
On Wed, Apr 21, 2021 at 10:38:32AM +0200, Patrik Jakobsson wrote:
> > > The ICD can certainly be moved to /usr/share/OpenCL/vendors.
> >
> > It's not such a big issue since /etc/OpenCL/vendors is checked first, then /usr/share/OpenCL/vendors.
>
> Hmm I just tried moving it to /usr/share/OpenCL/vendors but that didn't work. Not sure why. I'll have a look at the strace output.

Are you sure you've tested on TW? libOpenCL doesn't search in /usr/share/OpenCL/vendors yet on Leap.

CU,
Stefan
On Wed, Apr 21, 2021 at 10:47:43AM +0200, Stefan Dirsch wrote:
> Are you sure you've tested on TW? libOpenCL doesn't search in /usr/share/OpenCL/vendors yet on Leap.

Yes, this is on TW.

I think I found the problem. The AOMP stack also compiles its own libOpenCL, which does not have your path fix. So it looks like the AOMP OpenCL support cannot live alongside other OpenCL implementations. Bummer.

All of this gets solved when AMD upstreams enough of their code to LLVM. The question is what to do in the meantime. OpenCL is not the main purpose of this package, so perhaps I should just remove the OpenCL support altogether?

-Patrik
On Wed, Apr 21, 2021 at 11:09:12AM +0200, Patrik Jakobsson wrote:
> I think I found the problem. The AOMP stack also compiles its own libOpenCL which does not have your path fix. So it looks like the AOMP OpenCL support cannot live alongside other OpenCL implementations. Bummer.
>
> All of this gets solved when AMD upstreams enough of their code to LLVM. The question is what to do in the meantime. OpenCL is not the main purpose of this package so perhaps I should just remove the OpenCL support altogether?

Either this, or use update-alternatives. We already support the libOpenCL of the ocl-icd package and the one that comes with the nvidia driver that way. Search for update-alternatives in the ocl-icd specfile.

Which priority to use for AMD? For NVIDIA we use 100. We could use the same for AMD; probably nobody will have both drivers installed. The libOpenCL of ocl-icd uses priority = 50.

Hope this helps.

CU,
Stefan
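To make the update-alternatives suggestion concrete, here is a rough dry-run sketch. It only prints the command instead of running it. The link path, alternative name and AOMP library location are assumptions for illustration and are not taken from the ocl-icd specfile, which should be consulted for the real values; only the priorities (50 for ocl-icd, 100 for NVIDIA) come from the thread.

```shell
#!/bin/sh
# Sketch: print (do not execute) an update-alternatives registration for a
# libOpenCL provider. The link "/usr/lib64/libOpenCL.so.1" and the name
# "libOpenCL.so.1" are hypothetical; check the ocl-icd specfile for the
# values actually used on openSUSE.

# Arguments: <path to provider's libOpenCL.so.1> <priority>
alt_install_cmd() {
    path=$1
    priority=$2   # thread: ocl-icd uses 50, NVIDIA uses 100
    # General form: update-alternatives --install <link> <name> <path> <priority>
    printf 'update-alternatives --install %s %s %s %s\n' \
        /usr/lib64/libOpenCL.so.1 libOpenCL.so.1 "$path" "$priority"
}

# Example: an assumed AOMP library location with the priority proposed above.
alt_install_cmd /usr/lib/aomp_11.12-0/lib/libOpenCL.so.1 100
```

In a specfile, a call like this would typically sit in a %post scriptlet with a matching removal on uninstall; the higher priority makes the vendor library win over ocl-icd in automatic mode.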
On Wed, Apr 21, 2021 at 11:41:08AM +0200, Stefan Dirsch wrote:
> Either this or use update-alternatives. We already support libOpenCL of ocl-icd package and the one that comes with nvidia driver that way. Search for update-alternatives in ocl-icd specfile. Which priority to use for AMD? For NVIDIA we use 100. We could use the same for AMD. Probably nobody will have both drivers installed. libOpenCL of ocl-icd uses priority = 50.

Good idea, I'll give it a try.
On 20.04.21 at 11:23, Patrik Jakobsson wrote:
> Is this something that can be included in Tumbleweed? I know the optimal solution would be to have all the ROCm specific parts upstreamed and included in the official LLVM package and then provide separate packages with normal installation paths. Unfortunately we're not there yet and this would be a stop-gap solution until then.

It wouldn't be unprecedented: julia is another package that comes with a bundled LLVM. So I guess we can live with it.

Do you happen to know what they patched? Is it around the AMDGPU backend, or OpenMP, or the build system? It always seemed to me that AMD are pretty active upstream, but maybe they are churning out patches so fast that upstream can't keep up...

Best regards,
Aaron
On Fri, 23 Apr 2021, Aaron Puchert wrote:
> Do you happen to know what they patched? Is it around the AMDGPU backend, or OpenMP, or the build system? It always seemed to me that AMD are pretty active upstream, but maybe they are churning out patches so fast that upstream can't keep up...

I think the main issue is that they're basing their "releases" on some random LLVM revision on the development branch rather than on LLVM releases. So technically, "upgrading" to the next LLVM release once it comes out would be possible, but it's likely not worth the hassle.

Richard.

--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
On Fri, 2021-04-23 at 11:09 +0200, Richard Biener wrote:
> I think the main issue is they're basing their "releases" upon some random LLVM revision on the development branch rather than on LLVM releases. So technically "upgrading" to the next LLVM release once it comes out would be possible but it's likely not worth the hassle.

We just moved Tumbleweed to LLVM 12, which is about as new as it gets for the time being. (It will be in Snapshot 0422, once it's ready to be released.)

Cheers,
Dominique
On Fri, Apr 23, 2021 at 01:12:47AM +0200, Aaron Puchert wrote:
> It wouldn't be unprecedented: julia is another package that comes with a bundled LLVM. So I guess we can live with it.

Thanks for the pointer, I'll have a look at Julia.

> Do you happen to know what they patched? Is it around the AMDGPU backend, or OpenMP, or the build system? It always seemed to me that AMD are pretty active upstream, but maybe they are churning out patches so fast that upstream can't keep up...

I think it's a bit all over the place, but mostly amdgpu, openmp and hip. That said, development seems to have slowed down on their GitHub project. Perhaps they are now more active on the upstream project. I'll have to have a look.

-Patrik
On Friday, 23 April 2021, 14:45:59 CEST, Patrik Jakobsson wrote:
> > It wouldn't be unprecedented: julia is another package that comes with a bundled LLVM. So I guess we can live with it.
>
> Thanks for the pointer, I'll have a look at Julia.

Let me know when you have something stable in TW; I will try to make use of it in Blender.

Cheers,
Pete
participants (8)
- Aaron Puchert
- Dominique Leuenberger / DimStar
- Hans-Peter Jansen
- Maximilian Trummer
- Patrik Jakobsson
- Richard Biener
- Stefan Dirsch
- Xu Zhao