According to the official website, only NVIDIA GPUs using CUDA on Linux are offered tier 1 support; AMD GPUs using ROCm are only offered tier 3 support. However, the AMDGPU.jl repository does not have any warnings in its documentation, which suggests it should be usable. So what is the current status of GPU support for the different GPU vendors in Julia?
It's unclear what the logic is there. You should really just try it and see whether your workload works.
I think it's safe to say CUDA.jl still has the best support, but the difference may not matter for your specific use case.
I was rather shocked to see that statement on the JuliaLang website.
Looking at juliagpu.org:
AMDGPU.jl offers similar capabilities for AMD GPUs running on the ROCm stack. While somewhat less developed than CUDA.jl, it provides solid integration with key vendor libraries such as rocBLAS, rocFFT, rocSOLVER, and rocSPARSE. The package can be slightly more challenging to install and configure due to how ROCm is distributed, but it has been successfully used in various applications and libraries.
Maybe you can say a bit more about your application. Are you at the stage of choosing a platform?
I suppose I had best decloak as a principal engineer at AMD.
PS: the link on julialang.org should be pointing to AMD ROCm - JuliaGPU.
If you use AMDGPU.jl, I'm shocked that you're shocked.
Tier 3 is pretty reflective of my personal experience with AMDGPU.jl. I almost always need to access my desktop machine with an Nvidia GPU when I want to test something, since AMDGPU.jl is broken more often than not on recent Julia releases if one is using a reasonably up-to-date kernel and other stack.
For an example of what I mean, I just tried it out, and calls like `AMDGPU.rand(10)` or `AMDGPU.versioninfo()` both crash my Julia session with a segfault. Someone reported this three weeks ago here and has not had any support, help, or suggestions (let alone bug fixes).
I'm sure it does work sometimes and on some configurations of machines (I've even successfully gotten it working on my own machine a few times). But the description of Tier 3 is just:
Tier 3: Julia may or may not build. If it does, it is unlikely to pass tests. Binaries may be available in some cases. When they are, they should be considered experimental. Ongoing support is dependent on community efforts.
So "it sometimes working for some people, but with no guarantees it'll work for you or pass tests" is definitely an accurate description of my experience.
I should have said "disappointed".
AMDGPU.jl only tests against officially supported Linux distributions (Ubuntu/Debian, to be specific), and on those we have very solid CI that has been passing all the tests for a long time now.
That said, we only have 2 CI machines with Navi 3 GPUs, so other distributions and GPUs may vary.
Other Linux distributions have custom build recipes that do not always match the official ones and may conflict with Julia, and we don't have the resources to test against all possible builds.
For an example of what I mean, I just tried it out, and calls like `AMDGPU.rand(10)` or `AMDGPU.versioninfo()` both crash my Julia session with a segfault. Someone reported this three weeks ago here and has not had any support, help, or suggestions (let alone bug fixes).
The original author has since resolved his issues, with a message on Slack saying that reinstalling the libraries fixed it for him:
it is working after switching from RHEL packages to Fedora ones
These tiers are generally descriptive of the current state of affairs rather than any indication of how we'd like things to be: obviously we'd like support to be as good as possible everywhere. All that's really needed to increase the tier of some platform is for someone to fix builds, fix tests, and help make sure we have CI resources to keep it that way. Often this last item is the real blocker. We are, for what it's worth, happy to have physical hardware running in a data center (or a closet somewhere) for this purpose, and a small amount of dedicated physical hardware goes a long way.
Sure, and it's totally fine for AMDGPU.jl to (severely) limit the Linux distros it targets and to have limited CI configurations and all that (even if I wish it weren't so limited).
I'm just saying that this sort of thing is indicative of why this wouldn't be labelled "Tier 1 support" on Julia's own website, which as far as I understand is a very high bar and leaves a lot of "liability" on the Julia devs themselves.
(IMO, the support-tier labelling of these GPU packages is kind of a weird thing for the Julia install page to do at all, but that's a different conversation.)
I'm now preparing for a new project, which is supposed to run on both CPUs and GPUs, and it would be a great bonus if it could support different GPUs.
My previous project, which did not need to support GPUs, was developed in Julia, and generally speaking I'm quite pleased with the development process. Thus, I tend to continue using Julia for my new project. I've also considered that even if Julia itself does not perform well on some GPUs, I can write the problematic parts in C and call the library from Julia.
Thus, it is really disappointing that even Julia itself is supported so poorly on AMD GPUs, and that a simple call like `rand` or even `versioninfo` will crash it. It is good news that at least Ubuntu and Debian are supported well, but support for CentOS is very important in my use case.
That being said, I'm now seriously considering developing the new project in C or Fortran. The project is large and may take years to develop, and I'm not sure whether the support will be better before it is completed.
I've checked the posts again carefully, and I must say that I have limited knowledge of GPUs. It seems that AMD itself has limited support for some distributions? Besides, what breaks Julia is in fact methods in AMDGPU.jl, so maybe it is still possible for me to write the CPU-related parts in Julia and the GPU-related parts in C, and call the C libraries from Julia. Will it work?
but support for CentOS is very important in my use case
CentOS itself has limited support from AMD, so you should just test first to see whether it works.
The latest ROCm release to support it is 6.1 (6.4 being the latest version).
Julia itself is supported so poorly on AMD GPUs, and a simple call like `rand` or even `versioninfo` will crash it
If the ROCm libraries were built incorrectly, they will crash on the first API call.
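Because a broken ROCm build segfaults on the first API call, it can be worth running a cheap smoke test before committing to a long run. A minimal sketch (assumes AMDGPU.jl is installed; `AMDGPU.functional()` is the package's check for a working ROCm setup):

```julia
using AMDGPU

if AMDGPU.functional()
    a = AMDGPU.rand(Float32, 10)   # first real API call into ROCm
    println("ROCm looks usable, sum of test array: ", sum(a))
else
    @warn "AMDGPU.jl loaded, but ROCm is not functional on this system"
end
```

If even this tiny script segfaults rather than taking the `@warn` branch, the problem is almost certainly in the installed ROCm libraries, not in your own code.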
I have built fairly complex Julia projects that target different GPUs in a backend-agnostic way and have great performance on both AMD and Nvidia GPUs:
- GitHub - JuliaNeuralGraphics/GaussianSplatting.jl: Gaussian Splatting in pure Julia
- GitHub - pxl-th/NNop.jl: Flash Attention & friends in pure Julia
- GitHub - JuliaNeuralGraphics/Nerf.jl
The same goes for other projects that also target multiple GPUs.
Yes.
You definitely can write the CPU parts in Julia, and even target specific GPU backends without being affected by issues from the other GPU backends. And even if you do encounter issues, opening a bug report may help.
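For the "CPU parts in Julia, GPU parts in C" route, the glue is Julia's built-in `@ccall`. A minimal sketch, where the library name `libmykernels.so` and the exported `saxpy` function are hypothetical stand-ins for whatever your C/HIP code provides:

```julia
# Hypothetical shared library built from your C/HIP sources, exporting:
#   void saxpy(int n, float a, const float *x, float *y);
const libmykernels = "libmykernels.so"

function saxpy!(y::Vector{Float32}, a::Float32, x::Vector{Float32})
    @assert length(x) == length(y)
    # Julia arrays are passed to C as raw pointers; the C side can
    # do the GPU work internally and write the result back into y.
    @ccall libmykernels.saxpy(length(x)::Cint, a::Cfloat,
                              x::Ptr{Cfloat}, y::Ptr{Cfloat})::Cvoid
    return y
end
```

The library is only resolved when `saxpy!` is first called, so the Julia side loads fine even on machines without the GPU library installed.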
@jsjie, do you know about KernelAbstractions.jl? It lets you write one GPU kernel and have it work on Nvidia, Apple, AMD, or Intel GPUs, depending on what GPU the user has (or run multithreaded on the CPU).
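To make that concrete, here is a minimal sketch of a backend-agnostic kernel with KernelAbstractions.jl. It is launched on the `CPU()` backend so it runs anywhere; on a GPU machine you would swap in the vendor backend (e.g. `ROCBackend()` from AMDGPU.jl or `CUDABackend()` from CUDA.jl) without changing the kernel itself:

```julia
using KernelAbstractions

# One kernel definition that works on every supported backend.
@kernel function saxpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

backend = CPU()  # swap for ROCBackend() / CUDABackend() on a GPU
x = KernelAbstractions.ones(backend, Float32, 1024)
y = KernelAbstractions.zeros(backend, Float32, 1024)

# Instantiate the kernel for the backend, then launch over the array.
saxpy_kernel!(backend)(y, 2.0f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```

The vendor-specific choice is isolated to the single `backend` line, which is what makes "one codebase, many GPUs" workable in practice.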
To be clear, it's not especially poorly supported. It will for the most part either "just work" or "just fail", depending on how your system is set up and how ROCm was installed.
Hopefully AMD will eventually realize the value of supporting more distros, but since CentOS was completely discontinued, I wouldn't expect that particular distro to ever get active support work. Maybe you can get it to work with an Ubuntu Docker container or chroot, though. No idea.
I mean, this might work, but I don't really see the point. If AMDGPU.jl doesn't work on your machine, the C libraries are also pretty unlikely to work on your machine. It's really a systemic issue with AMD GPU compute, not the specific instantiation in Julia.
At Computex, AMD promised support for the Fedora, OpenSUSE, Ubuntu, and EPEL repos.
The Fedora support is the one in the Fedora repos. The OpenSUSE support is probably this, but it is still experimental. The Ubuntu support most likely means that ROCm will eventually come to the default Ubuntu repos.
I understood that the main intent is that in the future you can install ROCm with just `sudo apt/dnf/zypper install rocm`.
For installing ROCm, I use the amdgpu-install script. It uses the package manager.
It also sets up /etc/alternatives, which I am no great fan of. But you don't have to use that mechanism.
The ROCm Runfile Installer may be useful for Julia package management.
The ROCm Runfile Installer includes these features:
- An optional easy-to-use user interface for configuring the installation
- An optional command line interface for the installation
- Offline ROCm and AMDGPU driver installation (requires the prior installation of dependencies)
- Packageless ROCm and AMDGPU driver install without native package management
- A single self-contained installer for all ROCm and AMDGPU driver software
- Configurable installation location for the ROCm install