According to the official website, only NVIDIA GPUs using CUDA on Linux are offered tier 1 support; AMD GPUs using ROCm are only offered tier 3 support. However, the AMDGPU.jl repository does not have any warnings in its documentation, which suggests it should be usable. So what is the current status of GPU support for the different GPU vendors in Julia?
It's unclear what the logic is there. You should really just try it and see whether your workload works.
I think it's safe to say CUDA.jl still has the best support, but the difference may not matter for your specific use case.
I was rather shocked to see that statement on the JuliaLang website.
Looking at juliagpu.org:
AMDGPU.jl offers similar capabilities for AMD GPUs running on the ROCm stack. While somewhat less developed than CUDA.jl, it provides solid integration with key vendor libraries such as rocBLAS, rocFFT, rocSOLVER, and rocSPARSE. The package can be slightly more challenging to install and configure due to how ROCm is distributed, but it has been successfully used in various applications and libraries.
Maybe you can say a bit more about your application. Are you at the stage of choosing a platform?
I suppose I had best decloak as a principal engineer at AMD.
PS: the link on julialang.org should be pointing to AMD ROCm - JuliaGPU.
If you use AMDGPU.jl, I'm shocked that you're shocked.
Tier 3 is pretty reflective of my personal experience with AMDGPU.jl. I almost always need to access my desktop machine with an Nvidia GPU when I want to test something, since AMDGPU.jl is broken more often than not on recent Julia releases if one is using a reasonably up-to-date kernel and other stack.
For an example of what I mean, I just tried it out, and calls like `AMDGPU.rand(10)` or `AMDGPU.versioninfo()` both crash my Julia session with a segfault. Someone reported this three weeks ago here and has not had any support, help, or suggestions (let alone bug fixes).
I'm sure it does work sometimes and on some configurations of machines (I've even successfully gotten it working on my own machine a few times). But the description of Tier 3 is just:
Tier 3: Julia may or may not build. If it does, it is unlikely to pass tests. Binaries may be available in some cases. When they are, they should be considered experimental. Ongoing support is dependent on community efforts.
So "it sometimes working for some people, but with no guarantees it'll work for you or pass tests" is definitely an accurate description of my experience.
I should have said "disappointed".
AMDGPU.jl only tests against officially supported Linux distributions (Ubuntu/Debian, to be specific), and on those we have very solid CI that has been passing all the tests for a long time now.
That said, we only have 2 CI machines with Navi 3 GPUs, so other distributions and GPUs may vary.
Other Linux distributions have custom build recipes that do not always match the official ones and may conflict with Julia, and we don't have the resources to test against all possible builds.
For an example of what I mean, I just tried it out, and calls like `AMDGPU.rand(10)` or `AMDGPU.versioninfo()` both crash my Julia session with a segfault. Someone reported this three weeks ago here and has not had any support, help, or suggestions (let alone bug fixes).
The original author has since resolved his issues, with a message on Slack saying that reinstalling the libraries fixed it for him:
it is working after switching from RHEL packages to Fedora ones
These tiers are generally descriptive of the current state of affairs rather than any indication of how we'd like things to be: obviously we'd like support to be as good as possible everywhere. All that's really needed to increase the tier of some platform is for someone to fix builds, fix tests, and help make sure we have CI resources to keep it that way. Often this last item is the real blocker. We are, for what it's worth, happy to have physical hardware running in a data center (or a closet somewhere) for this purpose, and a small amount of dedicated physical hardware goes a long way.
Sure, and it's totally fine for AMDGPU.jl to (severely) limit the Linux distros it targets and to have limited CI configurations and all that (even if I wish it weren't so limited).
I'm just saying that this sort of thing is indicative of why this wouldn't be labelled "Tier 1 support" on Julia's own website, which as far as I understand is a very high bar and leaves a lot of "liability" on the Julia devs themselves.
(IMO, the support-tier labelling of these GPU packages is kind of a weird thing for the Julia install page to do at all, but that's a different conversation.)
I'm now preparing for a new project, which is supposed to run on both CPUs and GPUs, and it would be a great bonus if it could support different GPUs.
My previous project, which did not need to support GPUs, was developed in Julia, and generally speaking I'm quite pleased with the development process. Thus, I tend to continue using Julia for my new project. I've also considered that even if Julia itself does not perform well on some GPUs, I can write the problematic parts in C and call the library from Julia.
Thus, it is really disappointing that even Julia itself is supported so poorly on AMD GPUs, and that a simple call like `rand` or even `versioninfo` will crash it. It is good news that at least Ubuntu and Debian are supported well, but support for CentOS is very important in my use case.
That being said, I'm now seriously considering developing the new project in C or Fortran. The project is large and may take years to develop, and I'm not sure whether the support will be better before it is completed.
I've checked the posts again carefully, and I must say that I have limited knowledge of GPUs. It seems that AMD itself has limited support for some distributions? Besides, what breaks Julia is in fact methods in AMDGPU.jl, so maybe it is still possible for me to write the CPU-related parts in Julia and the GPU-related parts in C, and call the C libraries from Julia. Will it work?
but support for CentOS is very important in my use case
CentOS itself has limited support from AMD, so you should just test first to see whether it works.
The latest ROCm release to support it is 6.1 (6.4 being the latest version).
Julia itself is supported so poorly on AMD GPUs, and a simple call like `rand` or even `versioninfo` will crash it
If the ROCm libraries were built incorrectly, they will crash on the first API call.
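Because a broken ROCm build segfaults on the first API call, it can be worth running a cheap smoke test before committing to a long run. A minimal sketch (assumes AMDGPU.jl is installed; `AMDGPU.functional()` is the package's check for a working ROCm setup):

```julia
using AMDGPU

if AMDGPU.functional()
    a = AMDGPU.rand(Float32, 10)   # first real API call into ROCm
    println("ROCm looks usable, sum of test array: ", sum(a))
else
    @warn "AMDGPU.jl loaded, but ROCm is not functional on this system"
end
```

If even this tiny script segfaults rather than taking the `@warn` branch, the problem is almost certainly in the installed ROCm libraries, not in your own code.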
I have built fairly complex Julia projects that target different GPUs in a backend-agnostic way and have great performance on both AMD and Nvidia GPUs:
- GitHub - JuliaNeuralGraphics/GaussianSplatting.jl: Gaussian Splatting in pure Julia
- GitHub - pxl-th/NNop.jl: Flash Attention & friends in pure Julia
- GitHub - JuliaNeuralGraphics/Nerf.jl
The same goes for other projects that also target multiple GPUs.
Yes.
You definitely can write the CPU parts in Julia, and even target specific GPU backends without being affected by issues from the other GPU backends. And even if you do encounter issues, opening a bug report may help.
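For the "CPU parts in Julia, GPU parts in C" route, the glue is Julia's built-in `@ccall`. A minimal sketch, where the library name `libmykernels.so` and the exported `saxpy` function are hypothetical stand-ins for whatever your C/HIP code provides:

```julia
# Hypothetical shared library built from your C/HIP sources, exporting:
#   void saxpy(int n, float a, const float *x, float *y);
const libmykernels = "libmykernels.so"

function saxpy!(y::Vector{Float32}, a::Float32, x::Vector{Float32})
    @assert length(x) == length(y)
    # Julia arrays are passed to C as raw pointers; the C side can
    # do the GPU work internally and write the result back into y.
    @ccall libmykernels.saxpy(length(x)::Cint, a::Cfloat,
                              x::Ptr{Cfloat}, y::Ptr{Cfloat})::Cvoid
    return y
end
```

The library is only resolved when `saxpy!` is first called, so the Julia side loads fine even on machines without the GPU library installed.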
@jsjie, do you know about KernelAbstractions.jl? It lets you write one GPU kernel and have it work on Nvidia, Apple, AMD, or Intel GPUs, depending on what GPU the user has (or run multithreaded on the CPU).
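To make that concrete, here is a minimal sketch of a backend-agnostic kernel with KernelAbstractions.jl. It is launched on the `CPU()` backend so it runs anywhere; on a GPU machine you would swap in the vendor backend (e.g. `ROCBackend()` from AMDGPU.jl or `CUDABackend()` from CUDA.jl) without changing the kernel itself:

```julia
using KernelAbstractions

# One kernel definition that works on every supported backend.
@kernel function saxpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

backend = CPU()  # swap for ROCBackend() / CUDABackend() on a GPU
x = KernelAbstractions.ones(backend, Float32, 1024)
y = KernelAbstractions.zeros(backend, Float32, 1024)

# Instantiate the kernel for the backend, then launch over the array.
saxpy_kernel!(backend)(y, 2.0f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```

The vendor-specific choice is isolated to the single `backend` line, which is what makes "one codebase, many GPUs" workable in practice.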
To be clear, it's not especially poorly supported. It will for the most part either "just work" or "just fail", depending on how your system is set up and how ROCm was installed.
Hopefully AMD will eventually realize the value of supporting more distros, but since CentOS was completely discontinued, I wouldn't expect that particular distro to ever get active support work. Maybe you can get it to work with an Ubuntu Docker container or chroot, though. No idea.
I mean, this might work, but I don't really see the point. If AMDGPU.jl doesn't work on your machine, the C libraries are also pretty unlikely to work on your machine. It's really a systemic issue with AMD GPU compute, not the specific instantiation in Julia.
At Computex, AMD promised support for the Fedora, OpenSUSE, Ubuntu, and EPEL repos.
The Fedora support is the one in the Fedora repos. The OpenSUSE support is probably this, but it is still experimental. The Ubuntu support most likely means that ROCm will eventually come to the default Ubuntu repos.
I understood that the main intent is that in the future you can install ROCm with just `sudo apt/dnf/zypper install rocm`.
For installing ROCm, I use the amdgpu-install script. It uses the package manager.
It also sets up /etc/alternatives, which I am no great fan of. But you don't have to use that mechanism.
The ROCm Runfile Installer may be useful for Julia package management.
The ROCm Runfile Installer includes these features:
- An optional easy-to-use user interface for configuring the installation
- An optional command line interface for the installation
- Offline ROCm and AMDGPU driver installation (requires the prior installation of dependencies)
- Packageless ROCm and AMDGPU driver install without native package management
- A single self-contained installer for all ROCm and AMDGPU driver software
- Configurable installation location for the ROCm install