So far in my quest to get deep learning working in Julia, I have found FastAI to have multiple dependency issues (see my previous posts), and Flux itself currently requires a downgrade of AMDGPU to v0.6.x, so I ran into issues there as well.
But progress is being made!
In the case of Lux, I only needed to downgrade from v0.8 to v0.7.6 and was able to use the GPU to train a model.* However, it is much slower than on the CPU. In the tutorial (presumably run on NVIDIA/CUDA), they report ~6 s for the first epoch and ~0.4 s for each of the remaining ones.
I ran that code in two fresh REPL sessions, commenting out only the using LuxAMDGPU line for the CPU trial. In case someone wants to try it, here is the actual script used: using Zygote, ComponentArrays, Lux, SciMLSensitivity, Optimisers, Ordinar - Pastebin.com
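For completeness, the environment setup amounted to roughly the following (a minimal sketch, not my exact command history; the version pin is just the one that resolved for me):

using Pkg
Pkg.add(name="Lux", version="0.7.6")   # Lux v0.8 did not work for me; v0.7.6 did
Pkg.add("LuxAMDGPU")                   # trigger package for the AMD/ROCm backend
# The script starts with `using LuxAMDGPU`; for the CPU trial I simply
# commented that line out and let it fall back to the CPU.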
CPU:
julia> train(NeuralODE)
┌ Warning: No functional GPU backend found! Defaulting to CPU.
│
│ 1. If no GPU is available, nothing needs to be done.
│ 2. If GPU is available, load the corresponding trigger package.
│     a. LuxCUDA.jl for NVIDIA CUDA Support!
│     b. LuxAMDGPU.jl for AMD GPU ROCM Support!
│     c. Metal.jl for Apple Metal GPU Support!
└ @ LuxDeviceUtils ~/.julia/packages/LuxDeviceUtils/Dee3d/src/LuxDeviceUtils.jl:158
[1/9] Time 1.33s Training Accuracy: 50.96% Test Accuracy: 43.33%
[2/9] Time 0.1s Training Accuracy: 69.63% Test Accuracy: 66.0%
[3/9] Time 0.08s Training Accuracy: 77.93% Test Accuracy: 71.33%
[4/9] Time 0.08s Training Accuracy: 80.74% Test Accuracy: 76.67%
[5/9] Time 0.08s Training Accuracy: 82.52% Test Accuracy: 78.0%
[6/9] Time 0.09s Training Accuracy: 84.07% Test Accuracy: 78.67%
[7/9] Time 0.08s Training Accuracy: 85.33% Test Accuracy: 80.67%
[8/9] Time 0.08s Training Accuracy: 86.59% Test Accuracy: 81.33%
[9/9] Time 0.09s Training Accuracy: 87.7% Test Accuracy: 82.0%
GPU:
julia> train(NeuralODE)
[1/9] Time 2.88s Training Accuracy: 50.96% Test Accuracy: 43.33%
[2/9] Time 1.38s Training Accuracy: 69.63% Test Accuracy: 66.0%
[3/9] Time 2.12s Training Accuracy: 77.93% Test Accuracy: 71.33%
[4/9] Time 1.87s Training Accuracy: 80.74% Test Accuracy: 76.67%
[5/9] Time 2.14s Training Accuracy: 82.52% Test Accuracy: 78.0%
[6/9] Time 2.21s Training Accuracy: 84.07% Test Accuracy: 78.67%
[7/9] Time 4.31s Training Accuracy: 85.33% Test Accuracy: 80.67%
[8/9] Time 2.83s Training Accuracy: 86.59% Test Accuracy: 81.33%
[9/9] Time 2.78s Training Accuracy: 87.7% Test Accuracy: 82.0%
Can anyone verify whether this is simply a bad model for benchmarking GPU vs. CPU?
When monitoring GPU usage, I did see that very little time was spent actually doing computations (utilization was very “spiky”). Also, when I reran the GPU model in the same Julia session, VRAM usage kept growing and it got slower and slower.
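In case it helps others reproduce this, a minimal sanity check I plan to try next (not part of the pastebin script; the matrix size n is an assumption for illustration) is timing an isolated matrix multiply on CPU vs. GPU, since at this problem size kernel-launch overhead may simply dominate:

using AMDGPU, BenchmarkTools

n = 64                                   # assumed size, on the order of the NeuralODE MLP layers
A = rand(Float32, n, n); B = rand(Float32, n, n)
dA = ROCArray(A); dB = ROCArray(B)

@btime $A * $B                           # CPU matmul
@btime begin                             # GPU matmul, including launch/sync overhead
    $dA * $dB
    AMDGPU.synchronize()
end
# Stale device arrays from previous runs are only freed on garbage collection,
# so calling GC.gc(true) between reruns might relieve the growing VRAM usage
# (an assumption on my part; I have not verified it).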
From looking at the repos, I'm starting to suspect I decided to try Julia just a few months before the ML ecosystem worked out the AMDGPU kinks.
*Also, I decided to try the 1.10 release candidate since, for some reason, it ships a newer LLVM than the 1.9.4 I got from juliaup. Here is the current versioninfo:
julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × AMD Ryzen Threadripper 2990WX 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
Threads: 11 on 64 virtual cores
julia> AMDGPU.versioninfo()
ROCm provided by: system
[+] HSA Runtime v1.1.0
@ /opt/rocm-5.7.1/lib/libhsa-runtime64.so
[+] ld.lld
@ /opt/rocm/llvm/bin/ld.lld
[+] ROCm-Device-Libs
@ /home/user1/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode
[+] HIP Runtime v5.7.31921
@ /opt/rocm-5.7.1/lib/libamdhip64.so
[+] rocBLAS v3.1.0
@ /opt/rocm-5.7.1/lib/librocblas.so
[+] rocSOLVER v3.23.0
@ /opt/rocm-5.7.1/lib/librocsolver.so
[+] rocALUTION
@ /opt/rocm-5.7.1/lib/librocalution.so
[+] rocSPARSE
@ /opt/rocm-5.7.1/lib/librocsparse.so.0
[+] rocRAND v2.10.5
@ /opt/rocm-5.7.1/lib/librocrand.so
[+] rocFFT v1.0.21
@ /opt/rocm-5.7.1/lib/librocfft.so
[+] MIOpen v2.20.0
@ /opt/rocm-5.7.1/lib/libMIOpen.so
HIP Devices [2]
1. HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc+:xnack-)
2. HIPDevice(name="Radeon RX 580 Series", id=2, gcn_arch=gfx803)