I can order that 6700, even if I doubt I will be able to help much, unless the community is so thin that it would make some difference (happy to).
My doubt is: if Flux is so far not able to select AMDGPU at all, it will be of no use.
Flux support is basically there, we (probably) just need to wire up NNlib with NNlibROC.jl as a weak dependency (new in Julia 1.9, should be present in RC2) so that loading Flux and AMDGPU causes the right support code to be loaded (@ToucheSir is working on Re-integrate NNlibCUDA as a package extension by ToucheSir · Pull Request #445 · FluxML/NNlib.jl · GitHub, which is the equivalent work for NNlibCUDA.jl).
FYI, @pxl-th and I test on RX 6700s, so we can reproduce any bugs or performance issues you might find (and we've already squashed quite a few Navi2-related bugs in just the past 2 months). So far, we haven't found any show-stopping bugs specific to Navi2, so it's probably a good purchase if you're looking for a non-Instinct AMD card.
OK, I expect to have it next week; let's see how I can help.
To add onto what Julian said, the biggest missing piece for running common DL models on AMDGPU is wrapping MIOpen functions for pooling. Once that's done, my hope is we'll have the NNlib CUDA extension working and integrating NNlibROC should be trivial. No guarantee anything will be bug-free, but given I also have a 6700 XT there is plenty of desire to get this tested and working soon.
It doesn't support conda and is limited to Linux (as it is based on ROCm).
The idea was giving something which is not achievable in other frameworks/languages. It might be a herculean effort, but it should also bring the ecosystem to the front, as a leader and not just chasing others.
It doesn't support conda
With Julia there are now artifacts for most of the ROCm stack, including MIOpen, so you can just ]add MIOpen_jll and start using it.
I never manually install any of the ROCm packages and rely only on artifacts (same for CUDA).
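For example, assuming a Julia 1.9 session with network access, the artifact route looks something like the sketch below (the libMIOpen_path name follows the usual JLL convention, but I have not verified it for this particular package, so treat it as an assumption):

```julia
using Pkg
Pkg.add("MIOpen_jll")   # equivalent to `]add MIOpen_jll` in the REPL

using MIOpen_jll
# JLL packages expose the path of the bundled library; for an
# artifact-based install it should point into ~/.julia/artifacts/
# rather than a system-wide ROCm installation.
println(MIOpen_jll.libMIOpen_path)
```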
Just to say it's all shifted to the beginning of next year. In the meantime I will learn Flux so I will be able to contribute a bit more.
Hello everyone,
Apologies for the (necro?)bump, but it seems that this is the most recent discussion there has been on Flux for AMD GPUs via ROCm. I own an RX 6600 and I've been following AMDGPU.jl for a while now (thanks to this thread; I currently moved to Julia 1.9 to further give it some exercise), so if I may politely ask: how might I go about setting up a stack to test ROCm with?
I have a couple of projects I would like to try working through purely in Julia, but the only question mark is whether I can use Flux with AMDGPU.jl or not. One of the specific questions is: would Flux automatically understand that I'm working with AMD silicon, such that all I need to use is something like @cuda (similar to PyTorch)?
I would love to try to test this as much as possible and I read that @jpsamaroo has been testing Flux on RX 6700s, but I'm unsure about how to set up a usage pipeline for myself. Thank you in advance for your time!
Edit: I have Julia 1.9, AMDGPU.jl, Flux.jl and a ROCm stack installed and working on Arch Linux 6.1.9. Should I also perhaps ask the same question as an issue over at Flux's GitHub repository?
We're currently at the "integrating NNlibROC" step mentioned above. If you want to follow this progress, check out Add AMDGPU extension by pxl-th · Pull Request #470 · FluxML/NNlib.jl · GitHub and any future linked issues/PRs there.
By default AMDGPU.jl will install and use its own ROCm 5.3 stack (which is compatible with Julia 1.9), so you don't even need to install it manually. You can verify it with:
AMDGPU.versioninfo()
And I suspect that you need to launch Julia with the HSA_OVERRIDE_GFX_VERSION=10.3.0 env variable, like so:
HSA_OVERRIDE_GFX_VERSION=10.3.0 julia --threads=auto --project=.
This is because ROCm for Navi 2 only supports Navi 21 (gfx1030), but yours is Navi 23 (gfx1032).
That said, all Navi 2 GPUs have an identical ISA, so you shouldn't notice a difference (I guess).
And as @ToucheSir said, AMDGPU support for Flux is being actively developed and I hope it won't take long.
Indeed you are correct:
- using AMDGPU; AMDGPU.versioninfo() does indeed detect my GPU and CPU correctly, though I can't know for sure whether it's using its own ROCm stack or the one I installed (5.4).
- Using the HSA override HSA_OVERRIDE_GFX_VERSION=10.3.0 is necessary and works perfectly fine even though my GPU is gfx1032. FWIW, I also use this for PyTorch.
I have yet to try training a network on my card (and if I understand this correctly it shouldn't be too dissimilar to PyTorch's approach for ROCm), but I have to say I am quite amazed at how quickly not only AMDGPU.jl progressed, but also its integration with Flux, thanks to the hard work and hours put in by you kind people. I don't think it's that far off either; I look forward to testing a lot of the upcoming releases and helping out where I can.
AMDGPU.versioninfo probably could stand to show whether the artifacts (JLLs) or the system install are being used; hence versioninfo: Indicate if using JLLs or System by jpsamaroo · Pull Request #381 · JuliaGPU/AMDGPU.jl · GitHub. If you see paths that point to /home/myusername/.julia/artifacts/..., you're using JLL-provided ROCm artifacts; otherwise, you're using a system install of ROCm.
Once AMDGPU support is merged into NNlib, we'll need to also look at some way of moving Flux's models to the GPU (which is trivial to implement, we just need a function to do it).
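As a rough illustration of what such a function could look like (this is only a sketch under stated assumptions, not the eventual Flux API; it assumes Flux's re-exported fmap from Functors.jl and AMDGPU's ROCArray type, and the hypothetical helper name roc is mine):

```julia
using Flux, AMDGPU

# Hypothetical helper: walk the model structure and replace every
# array leaf with a ROCArray, moving all parameters onto the AMD GPU.
roc(m) = Flux.fmap(x -> x isa AbstractArray ? ROCArray(x) : x, m)

model = Chain(Dense(4 => 8, relu), Dense(8 => 2))
gpu_model = roc(model)                     # weights now live on the GPU
y = gpu_model(ROCArray(rand(Float32, 4)))  # forward pass on the device
```

This mirrors how Flux's existing gpu function works for CUDA: a structure-preserving map over the model's parameter arrays.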
Apologies for the very late reply; yes, it seems that I'm using JLL-provided artefacts for AMDGPU, since all of the paths for (most) ROCm requirements point to ~/.julia/artifacts/....
I say "most" because some of them also report as missing (rocSOLVER, rocALUTION, rocFFT and MIOpen), and from what I've seen across the AMDGPU and NNlib repos, I understand that these are the ones that are currently being worked on, so in short: everything is working as expected!
MIOpen is available as an artifact, but is not installed by default.
You can ]add MIOpen_jll to install it, but you probably need to ]dev AMDGPU and execute that command from the .julia/dev/AMDGPU directory to trigger the detection (for now).
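Putting those steps together, assuming the default development path of ~/.julia/dev (the Pkg commands are standard, but the detection behaviour is only as described above and may change):

```julia
# In the Julia REPL's Pkg mode (press `]`):
#   dev AMDGPU          # clones AMDGPU.jl into ~/.julia/dev/AMDGPU
#
# Then, from a shell:
#   cd ~/.julia/dev/AMDGPU
#   julia --project=.
#
# and in that session's Pkg mode:
#   add MIOpen_jll      # installs the MIOpen artifact; running it from
#                       # the AMDGPU directory triggers the detection
using AMDGPU
AMDGPU.versioninfo()    # MIOpen should now be reported as found
```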
You can track this PR which will add initial support for AMDGPU in Flux.jl.
Oops, I seem to have missed this earlier comment of yours explaining the same thing. As an aside, it's awesome to see the recent PR under Flux's repo directly, and also to see that the changes in pull #470 have been merged! Just fantastic!