Flux with AMD GPU(s)?

Has anyone used Flux with an AMD GPU? I will be involved in a project that likely will be using the LUMI supercomputer in Finland which is based on AMD.

3 Likes

Not sure about Flux in particular, but AMD GPU support has been making good progress recently as far as I understand, see:

@jpsamaroo can probably be more specific.

5 Likes

Hi @johnbb ! Flux should work with AMDGPU.jl, although many features (like CNNs or softmax) don’t work yet because we haven’t hooked up the necessary functions from ROCm’s MIOpen library. That should be pretty easy to wire up, though, so if you want to take this on, please let me know!
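For anyone wanting to try this today, a minimal sketch of what "Flux on AMDGPU.jl" looks like, sticking to dense layers to avoid the MIOpen-backed ops mentioned above (the `fmap`-based conversion is one way to move parameters over; treat the details as assumptions, not the blessed API):

```julia
# Sketch: run a small Flux model on an AMD GPU via AMDGPU.jl.
# Assumes a working ROCm install; MIOpen-backed ops (conv, softmax)
# may not be hooked up yet, as noted above.
using Flux, AMDGPU

# A plain dense network avoids the missing MIOpen-backed layers.
model = Chain(Dense(10 => 32, relu), Dense(32 => 2))

# Move the parameters to the GPU as ROCArrays.
gpu_model = fmap(x -> x isa Array ? ROCArray(x) : x, model)
x = ROCArray(rand(Float32, 10, 16))  # batch of 16 samples

y = gpu_model(x)  # forward pass on the AMD GPU
```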

6 Likes

Pinging @luraess because he’s actively working on Julia + AMD GPUs and also is testing on LUMI if I’m not mistaken.

1 Like

Julian, do you maybe have a Github issue with a short list of what steps you think are required… beyond just trying each Flux test/example and seeing what’s missing/broken, then examining the CUDA.jl equivalent? Thanks!

2 Likes

Indeed, doing some early access tests with AMDGPU.jl, MPI.jl and ImplicitGlobalGrid.jl on LUMI. The ROCm stack is functional and accessible from AMDGPU. Currently testing with Julia v1.8.0-rc3. “Classical” HPC though so nothing done with Flux (yet).

2 Likes

In addition to AMDGPU.jl providing the necessary bindings, we’ll want to create an AMD equivalent for NNlib. Once that’s in place, Flux models should just work™. Maybe we ought to create this repo and use the NNlib interface as “the list” to track all the missing pieces top down?

7 Likes

BTW while our 3D+ML team at AMD is using Julia and AMDGPU.jl, we’re not heavy users of Flux and NNlib yet… we write our ML kernels primarily using KernelAbstractions.jl. So while I wish we could address this Flux+AMDGPU limitation, we can’t prioritize it right now. But our team will be hiring 3 more research engineers soon. If anyone who reads this is interested in joining our team and supporting this use case, please message me. :slight_smile:
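To illustrate the KernelAbstractions.jl approach mentioned above, here is a small vendor-portable kernel sketch (names follow the KernelAbstractions documented API; the `ROCBackend()` swap assumes AMDGPU.jl is installed and functional):

```julia
# A portable axpy kernel: the same code runs on CPU or, with
# AMDGPU.jl loaded, on an AMD GPU by passing ROCBackend() for CPU().
using KernelAbstractions

@kernel function axpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

backend = CPU()  # swap for ROCBackend() on AMD hardware
x = ones(Float32, 1024)
y = zeros(Float32, 1024)
axpy!(backend)(y, 2.0f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```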

12 Likes

Using the NNlib interface as our list of missing features sounds good; I’m not actively focusing on Flux-based ML right now, though, so I’ll let one of you create NNlibAMDGPU.jl (or NNlibROCm.jl, etc.). Feel free to also add comments to Implement Neural Network primitives · Issue #11 · JuliaGPU/AMDGPU.jl · GitHub.

1 Like

Thanks for the response everyone. I likely don’t have the skills to contribute to an AMD NNlib, unfortunately, apart from making tests and being a keen user (through Flux). As it seems now, I will not have access to LUMI/AMD GPUs before well into 2023. It would be great if we somehow could use Flux (or similar) on AMD GPUs within the next year or two.

Given the current status, would it make sense to start working on Flux with CUDA and then switch to AMD once things are more complete?
How many changes would be needed to port the code from the NVIDIA to the AMD backend (limiting ourselves, of course, to ROCm-supported components)? The idea is to use NVIDIA as the primary backend and just cross-test on AMD from time to time, to know when it is ready for a switch.

Yes, in my case, I already have working models in Flux with single NVIDIA GPUs. In the aforementioned project, a (sub)project under the European Destination Earth programme, I have/had no influence on the choice of HPC/compute resources, as my task is a relatively minor one. Besides, I am possibly the only one using Julia, but of course eager to demonstrate that Julia/Flux is a viable alternative to TensorFlow and PyTorch, in particular since I don’t know Python. I guess I will mostly do the development and testing on my own computer (as often is the case), but in the end I need to have code running on AMD hardware.

I will do the same.
I’m wondering whether there is any tool to check Julia code and highlight ROCm-unsupported features.
Something like HIPify.

Unsupported features on the device side are usually reported as an error by GPUCompiler during compilation. There’s also some work going on to integrate JET with GPUCompiler to get a better idea of why code fails to compile, but this has some issues in upstream Julia that need resolving first.
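As a hypothetical illustration of the kind of device-side failure GPUCompiler reports at compile time, the kernel below needs dynamic allocation, which typically raises an error during GPU compilation (the exact error type and message vary across GPUCompiler/AMDGPU versions, so treat this as a sketch):

```julia
# Sketch: an AMDGPU.jl kernel that fails GPU compilation, not at runtime.
using AMDGPU

function bad_kernel!(out)
    i = AMDGPU.workitemIdx().x
    # `string` requires dynamic allocation, which is unsupported on the device,
    # so GPUCompiler rejects this kernel when it tries to compile it.
    out[i] = length(string(i))
    return
end

out = AMDGPU.zeros(Int, 8)
@roc groupsize=8 bad_kernel!(out)  # errors during compilation with an IR validation failure
```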

2 Likes

Returning to this topic: I have the chance to get an RX 6700 XT next week, my only goal being to test Julia ML-related workloads on AMD hardware before testing on Instinct hardware.
Has anything changed on this front?
@claforte @luraess @pxl-th may I kindly ask whether your team has made any progress regarding NNlib/Flux usage? If not, are there any plans for this?

1 Like

Hi. There’s been some progress, specifically:

So for some initial support, the only thing left to be done is to make changes to Flux.jl so that users can select which backend to use.
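For reference, backend selection of this kind eventually landed in Flux as a Preferences.jl-backed switch; a sketch of what it looks like (the `gpu_backend!` name is from newer Flux releases, so treat the exact API as an assumption on older versions):

```julia
# Sketch: select the AMD backend so that `gpu` moves data to ROCArrays.
using Flux, AMDGPU

Flux.gpu_backend!("AMDGPU")  # persisted as a preference; restart Julia afterwards

model = Chain(Dense(4 => 8, relu), Dense(8 => 1)) |> gpu
x = rand(Float32, 4, 32) |> gpu
y = model(x)  # runs on the AMD GPU
```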

Other things still need work.

6 Likes

Maybe @jpsamaroo @dhairyagandhi96 know about the plans regarding this?
I was looking at https://github.com/FluxML/Flux.jl/pull/1566

We all know that ML and GPGPU support for AMD-based GPUs is much weaker than for NVIDIA and CUDA.

Might it be an opportunity for Julia?
Namely, what if Julia were at the frontline of supporting non-NVIDIA GPUs for ML?
AMD and Intel GPUs are much cheaper, come with more memory, and have fewer artificial limitations.
So promoting this niche might create an opportunity to get a hold on the market.

It requires a more synchronized development effort to make it work.
On Linux there are ROCm and oneAPI, and on Windows one might use DirectML.

Many people would be happy to drop NVIDIA, so it might generate momentum. It is both a risk and an opportunity.

1 Like

PyTorch has supported AMD GPUs for more than a year.

To be honest, I don’t think this is the differentiation factor Julia needs to work on.
Improving here will certainly be needed to gain broader adoption, though (also considering that some top supercomputers do have AMD GPUs).

ML support for AMDGPU.jl has greatly improved thanks to efforts by @pxl-th and his colleagues. What we need now are more people with AMD GPUs to test things out, file issues when things are broken or badly performing (and PRs if possible), and more examples of how to use AMDGPU.jl for ML. Once we have enough people helping out, then it’ll be easier to keep up with the rest of the ML ecosystem and make Julia a top-notch competitor.

3 Likes