Reactant unable to find GPU on Colab, but CUDA.jl can

Hi guys, when using the Julia backend on Google Colab I currently can't get Reactant to find any NVIDIA GPUs, even with a GPU runtime selected. E.g. when I select an A100 or an L4 GPU and then try to run:

Reactant.set_default_backend("gpu")

I get the following error:

No GPU client found

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] client(backend::String)
   @ Reactant.XLA ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:82
 [3] set_default_backend
   @ ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:104 [inlined]
 [4] set_default_backend(backend::String)
   @ Reactant ~/.julia/packages/Reactant/cTiTU/src/Reactant.jl:293
 [5] top-level scope
   @ In[15]:1

I’m wondering if anyone else is facing a similar issue?

As a comparison, I checked to see if at least CUDA.jl is able to find the GPU:

using Pkg
Pkg.add("CUDA")
using CUDA

println("Attempting to check CUDA functionality...")
if CUDA.functional()
    println("SUCCESS: CUDA.jl is functional and a GPU is available!")
    println("CUDA versioninfo():")
    CUDA.versioninfo()
    println("\nGPU name: ", CUDA.name(CuDevice(0)))
else
    println("FAILURE: CUDA.jl is NOT functional or no GPU is available through CUDA.jl.")
    try
        CUDA.versioninfo()
    catch e
        println("Error calling CUDA.versioninfo(): ", e)
    end
    println("Please ensure a GPU is allocated in Colab (Runtime > Change runtime type > GPU).")
end

println("\nNow, attempting to check Reactant.jl again (after the CUDA.jl check)...")
using Reactant
try
    Reactant.set_default_backend("gpu")
    println("SUCCESS: Reactant.set_default_backend(\"gpu\") did NOT error this time.")
    println("Reactant default backend: ", Reactant.get_default_backend())
catch e
    println("FAILURE: Reactant.set_default_backend(\"gpu\") still errored.")
    println("Error: ", e)
end

Here is the output of that diagnostic:

Attempting to check CUDA functionality...
SUCCESS: CUDA.jl is functional and a GPU is available!
CUDA versioninfo():
CUDA runtime 12.5, local installation
CUDA driver 12.9
NVIDIA driver 550.54.15

CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+550.54.15

Julia packages:
- CUDA: 5.8.0
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.10.9
- LLVM: 15.0.7

Preferences:
- CUDA_Runtime_jll.version: 12.5.1
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA L4 (sm_89, 21.976 GiB / 22.494 GiB available)

GPU name: NVIDIA L4

Now, attempting to check Reactant.jl again (after the CUDA.jl check)...
FAILURE: Reactant.set_default_backend("gpu") still errored.
Error: ErrorException("No GPU client found")

A similar problem occurs when I run a different GPU, e.g. an A100. I'd like to make sure this isn't some easily-fixed issue on my end before I submit an issue.

Can you first follow Configuration | Reactant.jl and post the output here?
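In other words, roughly the following checks (a minimal sketch of what that page asks for; the guide itself has the full details):

using Reactant_jll
# Was a Reactant_jll artifact installed for this host at all?
Reactant_jll.is_available()
# Which platform/artifact was selected? The gpu= and cuda_version= tags are the interesting part.
Reactant_jll.host_platform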

Sure thing. So Reactant_jll.is_available() does evaluate to true. When I run Reactant_jll.host_platform, this is what I get:

Linux x86_64 {cuda_version=none, cxxstring_abi=cxx11, gpu=none, julia_version=1.10.9, libc=glibc, libgfortran_version=5.0.0, libstdcxx_version=3.4.30, mode=opt}

When I switch to verbose output by following the configuration guide and running this afterwards:

using Pkg
# Delete the existing Reactant_jll precompile cache so the package gets precompiled again
rm(joinpath(Base.DEPOT_PATH[1], "compiled", "v$(VERSION.major).$(VERSION.minor)", "Reactant_jll"); recursive=true, force=true)
# Enable debug logging for Reactant_jll and trigger re-precompilation
ENV["JULIA_DEBUG"] = "Reactant_jll";
Pkg.add("Reactant_jll")

I get the following output:

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Precompiling packages...
   2137.5 ms  ✓ Reactant_jll
  1 dependency successfully precompiled in 4 seconds. 458 already precompiled.
  1 dependency precompiled but a different version is currently loaded. Restart julia to access the new version
  1 dependency had output during precompilation:
┌ Reactant_jll
│  ┌ Debug: Detected CUDA Driver version 12.4.0
│  └ @ Reactant_jll ~/.julia/packages/Reactant_jll/ygsaO/.pkg/platform_augmentation.jl:60
│  ┌ Debug: Adding include dependency on /usr/lib64-nvidia/libcuda.so.1
│  └ @ Reactant_jll ~/.julia/packages/Reactant_jll/ygsaO/.pkg/platform_augmentation.jl:108
└  

That explains why the GPU couldn't be found: the Reactant build you initially got was built without CUDA support.

Now I’m confused: libcuda was found correctly this time? Can you access the GPU now?

Despite that last message, Reactant.set_default_backend("gpu") still returns with:

AssertionError("Could not find registered platform with name: \"cuda\". Available platform names are: ")
No GPU client found

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] client(backend::String)
   @ Reactant.XLA ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:82
 [3] set_default_backend
   @ ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:104 [inlined]
 [4] set_default_backend(backend::String)
   @ Reactant ~/.julia/packages/Reactant/cTiTU/src/Reactant.jl:293
 [5] top-level scope
   @ In[7]:2

Since the Reactant build I get is the one without CUDA support, perhaps I should find a way to manually update the Julia installation on the Colab runtime when I start it?

Ok, I just tested in Colab myself. You need to restart the session after that, and then it should work.

My understanding is that Reactant comes preinstalled in the default environment, but it was built with a non-GPU runtime (either CPU-only or CPU+TPU), so the default precompiled pkgimage has no GPU support; you need to force Reactant_jll to be re-precompiled and then restart the session.
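Putting the steps from this thread together, the workaround is roughly the following (a sketch, not an official recipe; run the first part in one cell, then restart the Colab session before the last part):

using Pkg
# 1. Delete the stale (CPU-only) precompile cache for Reactant_jll
rm(joinpath(Base.DEPOT_PATH[1], "compiled", "v$(VERSION.major).$(VERSION.minor)", "Reactant_jll"); recursive=true, force=true)
# 2. Re-precompile Reactant_jll; with a GPU runtime attached it should now detect libcuda
#    and pick a CUDA-enabled artifact (JULIA_DEBUG shows the detection messages)
ENV["JULIA_DEBUG"] = "Reactant_jll"
Pkg.add("Reactant_jll")

# 3. After restarting the session (Runtime > Restart session), the GPU client should be found:
using Reactant
Reactant.set_default_backend("gpu")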