Has anyone used Flux with an AMD GPU? I will be involved in a project that will likely use the LUMI supercomputer in Finland, which is based on AMD hardware.
Not sure about Flux in particular, but AMD GPU support has been making good progress recently as far as I understand, see:
@jpsamaroo can probably be more specific.
Hi @johnbb ! Flux should work with AMDGPU.jl, although many features (like CNNs or softmax) don’t work yet because we haven’t hooked up the necessary functions from ROCm’s MIOpen library. That should be pretty easy to wire up, though, so if you want to take this on, please let me know!
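To make the current split concrete, here is a rough sketch (whether each call succeeds depends on your AMDGPU.jl version and setup): generic array operations on `ROCArray` already work through rocBLAS and broadcast kernels, while the NNlib primitives that need MIOpen are the missing piece.

```julia
# Rough sketch: generic array math on ROCArray works via rocBLAS and
# broadcast kernels, while MIOpen-backed NNlib primitives do not yet.
using AMDGPU

x = ROCArray(rand(Float32, 10))      # copy a host vector to the GPU
W = ROCArray(rand(Float32, 5, 10))   # copy a host matrix to the GPU

y = W * x        # GEMM through rocBLAS: expected to work
z = tanh.(y)     # broadcast kernel: expected to work

# By contrast, NNlib.conv / NNlib.softmax on ROCArrays should error
# until the MIOpen functions are hooked up.
```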
Pinging @luraess because he’s actively working on Julia + AMD GPUs and also is testing on LUMI if I’m not mistaken.
Julian, do you maybe have a GitHub issue with a short list of the steps you think are required, beyond just trying each Flux test/example, seeing what's missing/broken, and then examining the CUDA.jl equivalent? Thanks!
Indeed, doing some early access tests with AMDGPU.jl, MPI.jl and ImplicitGlobalGrid.jl on LUMI. The ROCm stack is functional and accessible from AMDGPU. Currently testing with Julia v1.8.0-rc3. “Classical” HPC though so nothing done with Flux (yet).
In addition to AMDGPU.jl providing the necessary bindings, we’ll want to create an AMD equivalent of NNlib. Once that’s in place, Flux models should just work™. Maybe we ought to create this repo and use the NNlib interface as “the list” to track all the missing pieces top down?
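As a sketch of what such a package would contain (the structure and method body here are purely illustrative; the real thing would call into MIOpen through AMDGPU.jl rather than use this fallback):

```julia
# Illustrative only: an "NNlibAMDGPU"-style package would add ROCArray
# methods for each NNlib primitive. A MIOpen-backed softmax would replace
# this broadcast-based stand-in, which relies only on ops that already
# work on ROCArray today.
using NNlib, AMDGPU

function NNlib.softmax(x::ROCArray; dims = 1)
    m = maximum(x; dims = dims)   # subtract the max for numerical stability
    e = exp.(x .- m)
    e ./ sum(e; dims = dims)
end
```

Walking through each method NNlib exports and marking it done/missing would then give exactly the top-down checklist you describe.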
BTW while our 3D+ML team at AMD is using Julia and AMDGPU.jl, we’re not heavy users of Flux and NNlib yet… we write our ML kernels primarily using KernelAbstractions.jl. So while I wish we could address this Flux+AMDGPU limitation, we can’t prioritize it right now. But our team will be hiring 3 more research engineers soon. If anyone who reads this is interested in joining our team and supporting this use case, please message me.
Using the NNlib interface as our list of missing features sounds good; I’m not actively focusing on Flux-based ML right now, though, so I’ll let one of you create NNlibAMDGPU.jl (or NNlibROCm.jl, etc.). Feel free to also add comments to Implement Neural Network primitives · Issue #11 · JuliaGPU/AMDGPU.jl · GitHub.
Thanks for the response everyone. I likely don’t have the skills to contribute to an AMD NNlib, unfortunately, apart from making tests and being a keen user (through Flux). As it seems now, I will not have access to LUMI/AMD GPUs before well into 2023. It would be great if we somehow could use Flux (or similar) on AMD GPUs within the next year or two.
Given the current status, would it make sense to start working on Flux with CUDA and then switch to AMD once things are more complete?
How many changes would be needed (limiting, of course, to ROCm-supported components) to port the code from the NVIDIA backend to AMD? The idea is to use NVIDIA as the primary backend and just cross-test on AMD from time to time, to know when it is ready for a switch.
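For what it’s worth, one way to keep the model code itself backend-agnostic is to funnel all device movement through Flux’s `gpu`/`cpu` helpers, so that switching backends ideally comes down to which GPU package you load. Whether `gpu` actually routes to AMDGPU depends on that support landing, so treat this as the intended pattern rather than something guaranteed to work on AMD today:

```julia
# Keep device movement behind Flux's `gpu`/`cpu` helpers so the model code
# never mentions a specific backend. Which backend `gpu` targets depends on
# the GPU package loaded (and, for AMD, on Flux gaining AMDGPU support).
# With no GPU package loaded, `gpu` is a no-op and everything runs on CPU.
using Flux
# using CUDA     # NVIDIA backend
# using AMDGPU   # AMD backend, once Flux supports it

model = Chain(Dense(10 => 32, relu), Dense(32 => 1)) |> gpu
x = rand(Float32, 10, 64) |> gpu   # a batch of 64 inputs
ŷ = model(x)                       # identical call on either backend
```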
Yes, in my case, I already have working models in Flux on single NVIDIA GPUs. In the aforementioned project, a (sub)project under the European Destination Earth programme, I have/had no influence on the choice of HPC/compute resources, as my task is a relatively minor one. Besides, I am possibly the only one using Julia, but I am of course eager to demonstrate that Julia/Flux is a viable alternative to TensorFlow and PyTorch, in particular since I don’t know Python. I guess I will mostly do the development and testing on my own computer (as is often the case), but in the end I need to have code running on AMD hardware.
I will do the same.
I wonder whether there is any tool to check Julia code and highlight ROCm-unsupported features.
Something like HIPify.
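I’m not aware of a HIPify-style static checker for Julia. The closest cheap approximation I can think of is a runtime probe that tries each operation on a small `ROCArray` and records what throws (the op list below is just an illustration; you would extend it with whatever your models use):

```julia
# Crude runtime probe: try each op on a small ROCArray and report failures.
# Not a static checker like HIPify -- it only covers the ops you list.
using AMDGPU, NNlib

probes = [
    "matmul"  => x -> ROCArray(rand(Float32, 4, 16)) * x,
    "softmax" => x -> NNlib.softmax(x),
    "conv"    => x -> NNlib.conv(reshape(x, 4, 4, 1, 1),
                                 ROCArray(rand(Float32, 3, 3, 1, 1))),
]

for (name, f) in probes
    try
        f(ROCArray(rand(Float32, 16)))
        println("$name: OK")
    catch err
        println("$name: unsupported ($(typeof(err)))")
    end
end
```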