AMDGPU.jl has made such amazing progress over the last year!

I tried AMDGPU.jl last year with my Radeon RX 580 GPU, and it was a chore to set everything up, even for a competent sysadmin. I tried it again today, and I am amazed at how great it is! I have a moderately recent Linux kernel (an updated Ubuntu LTS) and did not need to install anything else on my system. I simply did ] add AMDGPU. All runtime dependencies and ROCm libraries were automatically downloaded by Julia and installed in a way that does not affect the rest of my system. I did not touch sudo at all. No reading of obtuse ROCm install instructions, no adding of PPAs, no clumsy apt-get commands.

@jpsamaroo, thank you and all the other contributors for the amazing work you have done! If anything, the documentation should say more loudly how easy it is to set up (on first read, I still thought I needed to do manual installs).

Thanks so much for the kind words! AMDGPU.jl has indeed come a long way since last year, and I expect it to pick up steam as the DoE’s Frontier supercomputer comes online this year (in fact, we’ve already validated that AMDGPU.jl works on the MI250s in the Crusher test system).

Thanks should also go to @vchuravy for his recent PRs, which include getting all the GPUArrays tests passing on master (super helpful to prevent regressions). We also have a number of other users who’ve submitted bug reports and PRs, all of which are invaluable in getting AMDGPU.jl closer to being a rock-solid GPU computing library.

Regarding a seamless usage experience: I’ve been working on getting JLLs for all the ROCm components built in Yggdrasil, and am currently tackling rocBLAS and rocFFT. Once those are working, I would like to tackle MIOpen for ML/DL support (the equivalent of cuDNN).

We’ve also gotten access to AMDGPU CI resources at MIT, which I’m still working on setting up. These have MI50 cards in them, which are in line with what modern HPC clusters have installed. We’re also chasing some leads on getting CI on MI100 and MI200 cards.

I’m looking forward to an awesome year of development! If you or anyone reading this wants to get involved, please don’t hesitate to reach out!

And for people who want to use the newest desktop-class Radeon GPUs, things are looking promising: https://github.com/RadeonOpenCompute/ROCm/issues/1617 (AMD has been notoriously slow to support non-datacenter devices, but I am hopeful this is changing).

How well does it work for numerical computation, compared to CUDA?

Theoretically, they’re equivalently powerful (maybe minus support for fancy tensor cores, because AMD doesn’t have GPUs with those). Practically, CUDA.jl is further ahead and far more mature and well-tested, but we are catching up (thanks in no small part to @maleadt himself sharing code in GPUCompiler/GPUArrays). I would say that if you already have the hardware, then try it out with your code and file issues if it doesn’t perform admirably.

After reading this, I decided to try it with my new RX 6600 XT (you can tell I really want team red to win!), and I found:
https://github.com/JuliaGPU/AMDGPU.jl/issues/184

I’m not complaining; I can only guess at the difficulty of making GPU libraries, and I really appreciate all the work done by the AMDGPU team. I just thought I’d leave a comment in case anyone else tries it and finds that it doesn’t work on Julia 1.7+.

I posted a link to https://github.com/JuliaGPU/AMDGPU.jl/pull/187 in that issue, which is our branch for 1.7+ support. Please give that a try!

Test Summary: | Pass  Total
AMDGPU        |   43     43
     Testing AMDGPU tests passed 

Nice, although I still get:

┌ Error: Exception while generating log record in module Main at /home/akako/.julia/dev/AMDGPU/test/runtests.jl:26
│   exception =
│    UndefRefError: access to undefined reference

but I guess this is expected until it’s fully fixed. Great work, thanks!

AMDGPU has way more tests than that! Make sure you did ] build AMDGPU successfully, and that AMDGPU.agents() shows at least one GPU agent. It’s possible that your GPU wasn’t detected, so only the pointer and CPU-side tests ran.

Yeah, that’s what happened.

Even though the build is successful, I get:

┌ Warning: ld.lld was not found, compilation functionality will be unavailable.
│ Please run Pkg.build("AMDGPU") and reload AMDGPU.
│ Reason: ld.lld executable not found
└ @ AMDGPU ~/.julia/dev/AMDGPU/src/AMDGPU.jl:207
┌ Warning: ROCm-Device-Libs were not found, device intrinsics will be unavailable.
│ Please run Pkg.build("AMDGPU") and reload AMDGPU.
│ Reason: unknown
└ @ AMDGPU ~/.julia/dev/AMDGPU/src/AMDGPU.jl:226
┌ Warning: rocRAND failed to load, RNG functionality will be unavailable.
│ Please run Pkg.build("AMDGPU") and reload AMDGPU.
│ Reason: false
└ @ AMDGPU ~/.julia/dev/AMDGPU/src/AMDGPU.jl:235

Maybe we can continue the discussion on GitHub.

@jpsamaroo
I wanted to test on an AMD GPU, but short of purchasing one just for this purpose, I’m struggling to find any cloud provider offering a supported GPU.
No one is offering RX 6000 cards in the cloud, and the only offering I found was an MI100 on LeaderGPU’s cloud, but that specific instance is never listed among the available ones.
Apart from AAC, which I think is reserved for big research projects, are you aware of any other options?

PS: Is the Vega 56 still working on ROCm 5.2.1?
Apart from the cloud, I would also prefer a local resource for toy testing before moving to the more powerful one.

If you’re interested in developing AMDGPU.jl then I might be able to provide access to my personal server, which has Vega 56 cards. If you’re just trying to give AMDGPU.jl a spin, then I’m unfortunately not aware of a good cloud resource that is known to work properly with AMDGPU.jl.

PS: Is the Vega 56 still working on ROCm 5.2.1?

I would assume so, but I haven’t personally tested it.

I have found a Vega 56 at a very reasonable price (I’m interested in testing AMDGPU.jl on a personal learning project to evaluate its use in an upcoming commercial R&D one), but I would prefer not to purchase it if it won’t work at all on the latest ROCm (we have no other use for this card in our company).
I would be glad if you could find the time to run a couple of ML tests of your choice on it, so I know whether it might work.

According to the ROCm documentation of supported hardware, the following GPUs are officially supported: AMD Documentation - Portal

Note: this is what is officially supported, not necessarily every GPU that works with ROCm. For example, the Arch Linux ROCm packages are also built for the Polaris targets (gfx8**).

Regarding the Vega 56 specifically, I know people who have used TensorFlow with the ROCm backend on it. See the following discussion:

You can probably find more reports of people using the Vega 56 with ROCm if you search that repo or some of the AMD ROCm repos.

My goal is to use Flux. Searching this forum, I didn’t find anyone using it with a Vega 56 and ROCm 5.x, hence my post here.
I will also try looking in the ROCm repo.

As far as I am aware, Flux does not care which GPU you use, only which ROCm version your GPU supports. Since the Vega 56 seems to support ROCm 5.x, if Flux supports ROCm 5.x, then Flux should work on the Vega 56.

The question of GPU <=> ROCm version is an AMD question, and the question of ROCm <=> Flux is a Julia AMDGPU.jl/Flux.jl question.

To find more about if Flux works with ROCm 5.x see the following thread: Flux with AMD GPU(s)?