AMDGPU.jl has made such amazing progress over the last year!

I tried AMDGPU.jl last year for my Radeon 580 GPU, and it was a chore to set everything up, even for a competent sysadmin. I tried it again today, and I am amazed at how great it is! I have a moderately recent Linux kernel (an updated Ubuntu LTS) and did not need to install anything else on my system. I simply did ] add AMDGPU. All runtime dependencies and ROCm libraries were automatically downloaded by Julia and installed in a way that does not affect the rest of my system. I did not touch sudo at all. No reading of obtuse ROCm install instructions, no adding of PPAs, no clumsy apt-get commands.
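For the curious, the whole thing boiled down to roughly this (a sketch from memory; exact output will differ on your machine):

julia> using Pkg; Pkg.add("AMDGPU")        # the same as typing ] add AMDGPU at the REPL

julia> using AMDGPU

julia> a = ROCArray(ones(Float32, 1024));  # upload a host array to the GPU

julia> b = a .+ 2f0;                       # broadcasting compiles and launches a GPU kernel

julia> Array(b)[1]                         # copy the result back to the host
3.0f0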

@jpsamaroo , thank you and all other contributors for the amazing work you have done! If anything, the documentation should be louder about how easy it is to set up (on first read I still thought I needed to do manual installs).

25 Likes

Thanks so much for the kind words! AMDGPU.jl has indeed come a long way since last year, and I expect it to pick up steam as the DoE’s Frontier supercomputer comes online this year (in fact, we’ve already validated that AMDGPU.jl works on the MI250s in the Crusher test system).

Thanks should also go to @vchuravy for his recent PRs, which include getting all the GPUArrays tests passing on master (super helpful to prevent regressions). We also have a number of other users who’ve submitted bug reports and PRs, all of which are invaluable in getting AMDGPU.jl closer to being a rock-solid GPU computing library.

Regarding a seamless usage experience: I’ve been working on getting JLLs for all the ROCm components built in Yggdrasil, and am currently working on rocBLAS and rocFFT. Once those are in place, I would like to tackle MIOpen for ML/DL support (the AMD equivalent of cuDNN).
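To give a sense of what that unlocks, here is a rough sketch of the kind of call that rocBLAS would back once the JLL is wired up (assuming the current GPUArrays-style array interface; how the call resolves depends on whether the library is found):

julia> using AMDGPU

julia> A = ROCArray(rand(Float32, 512, 512));

julia> B = ROCArray(rand(Float32, 512, 512));

julia> C = A * B;   # matrix multiply on the GPU, intended to hit rocBLAS gemm when available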

We’ve also gotten access to AMDGPU CI resources at MIT, which I’m still working on setting up. These have MI50 cards in them, which are in line with what modern HPC clusters have installed. We’re also chasing some leads on getting CI on MI100 and MI200 cards.

I’m looking forward to an awesome year of development! If you or anyone reading this wants to get involved, please don’t hesitate to reach out!

22 Likes

And for people who want to use the newest desktop-class Radeon GPUs, things are looking promising: see W5X00 and W6X00 series support in ROCm 5.X GFX 1x00 Navi1 GFX 2x00 "big" navi · Issue #1617 · RadeonOpenCompute/ROCm · GitHub. (AMD has been notoriously slow to support non-datacenter devices, but I am hopeful this is changing.)

2 Likes

How well does it work for numerical computation, compared to CUDA?

Theoretically, they’re equivalently powerful (maybe minus support for fancy tensor cores, because AMD doesn’t have GPUs with those). Practically, CUDA.jl is further ahead and far more mature and well-tested, but we are catching up (thanks in no small part to @maleadt himself sharing code in GPUCompiler/GPUArrays). I would say that if you already have the hardware, then try it out with your code and file issues if it doesn’t perform admirably.
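As a rough illustration of what the shared GPUArrays/GPUCompiler stack buys you: the same high-level code runs on either backend, and only the array type changes. This is just a sketch:

julia> using AMDGPU                             # `using CUDA` on NVIDIA hardware

julia> xs = ROCArray(rand(Float32, 10_000));    # CuArray(...) with CUDA.jl

julia> ys = sin.(xs) .+ 1f0;                    # the broadcast fuses into a single GPU kernel

julia> sum(ys)                                  # mapreduce also runs on the device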

4 Likes

After reading this I decided to try it with my new 6600 XT (you can tell I really want team red to win!), and I found that it doesn’t work on Julia 1.7+ yet (details in the issue I linked).

I’m not complaining; I can only guess at the difficulty of building GPU libraries, and I really appreciate all the work done by the AMDGPU team. I just thought I’d leave a comment in case someone else tries it and finds it not working on 1.7+.

2 Likes

I posted a link to "LB Julia to 1.7, upgrade to ROCm 4.2" by jpsamaroo · Pull Request #187 · JuliaGPU/AMDGPU.jl · GitHub in that issue; that PR is our branch for 1.7+ support. Please give it a try!
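If it helps, trying out a PR branch is just a matter of pointing Pkg at it. The rev below is a placeholder; grab the actual branch name from the PR page:

julia> using Pkg

julia> Pkg.add(PackageSpec(url="https://github.com/JuliaGPU/AMDGPU.jl", rev="branch-name-from-the-PR"))  # rev is a placeholder

julia> Pkg.build("AMDGPU")   # then restart Julia and re-run the tests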

2 Likes
Test Summary: | Pass  Total
AMDGPU        |   43     43
     Testing AMDGPU tests passed 

nice, although still:

┌ Error: Exception while generating log record in module Main at /home/akako/.julia/dev/AMDGPU/test/runtests.jl:26
│   exception =
│    UndefRefError: access to undefined reference

but I guess this is normal until things are fully fixed. Great work, thanks!

AMDGPU has way more tests than that! Make sure you did ] build AMDGPU successfully, and that AMDGPU.agents() shows at least one GPU agent. It’s possible that your GPU wasn’t detected, so only the pointer and CPU-side tests ran.
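A quick way to check (output will vary by system):

julia> using Pkg; Pkg.build("AMDGPU")   # then restart Julia so the fresh build is picked up

julia> using AMDGPU

julia> AMDGPU.agents()                  # should include at least one GPU agent; if only a CPU agent shows up, the card wasn't detected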

Yeah, this is what happened. Even though the build is successful, I get:

┌ Warning: ld.lld was not found, compilation functionality will be unavailable.
│ Please run Pkg.build("AMDGPU") and reload AMDGPU.
│ Reason: ld.lld executable not found
└ @ AMDGPU ~/.julia/dev/AMDGPU/src/AMDGPU.jl:207
┌ Warning: ROCm-Device-Libs were not found, device intrinsics will be unavailable.
│ Please run Pkg.build("AMDGPU") and reload AMDGPU.
│ Reason: unknown
└ @ AMDGPU ~/.julia/dev/AMDGPU/src/AMDGPU.jl:226
┌ Warning: rocRAND failed to load, RNG functionality will be unavailable.
│ Please run Pkg.build("AMDGPU") and reload AMDGPU.
│ Reason: false
└ @ AMDGPU ~/.julia/dev/AMDGPU/src/AMDGPU.jl:235

Maybe we can continue the discussion on GitHub.