Reactant unable to find GPU on Colab, but CUDA.jl can

Hi all, when using the Julia runtime on Google Colab, I haven't been able to get Reactant to find any Nvidia GPUs, even when I select a GPU runtime. E.g. when I select an A100 or L4 GPU and then try to run:

Reactant.set_default_backend("gpu")

I get the following error:

No GPU client found

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] client(backend::String)
   @ Reactant.XLA ~/.julia/packages/Reactant/cbTiTU/src/xla/XLA.jl:82
 [3] set_default_backend
   @ ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:104 [inlined]
 [4] set_default_backend(backend::String)
   @ Reactant ~/.julia/packages/Reactant/cTiTU/src/Reactant.jl:293
 [5] top-level scope
   @ In[15]:1

I’m wondering if anyone else is facing a similar issue?

As a comparison, I checked to see if at least CUDA.jl is able to find the GPU:

using Pkg
Pkg.add("CUDA")
using CUDA

println("Attempting to check CUDA functionality...")
if CUDA.functional()
    println("SUCCESS: CUDA.jl is functional and a GPU is available!")
    println("CUDA versioninfo():")
    CUDA.versioninfo()
    println("\nGPU name: ", CUDA.name(CuDevice(0)))
else
    println("FAILURE: CUDA.jl is NOT functional or no GPU is available through CUDA.jl.")
    try
        CUDA.versioninfo()
    catch e
        println("Error calling CUDA.versioninfo(): ", e)
    end
    println("Please ensure a GPU is allocated in Colab (Runtime > Change runtime type > GPU).")
end

println("\nNow, attempting to check Reactant.jl again (after the CUDA.jl check)...")
using Reactant
try
    Reactant.set_default_backend("gpu")
    println("SUCCESS: Reactant.set_default_backend(\"gpu\") did NOT error this time.")
    println("Reactant default backend: ", Reactant.get_default_backend())
catch e
    println("FAILURE: Reactant.set_default_backend(\"gpu\") still errored.")
    println("Error: ", e)
end

Here is the output of that diagnostic:

Attempting to check CUDA functionality...
SUCCESS: CUDA.jl is functional and a GPU is available!
CUDA versioninfo():
CUDA runtime 12.5, local installation
CUDA driver 12.9
NVIDIA driver 550.54.15

CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+550.54.15

Julia packages:
- CUDA: 5.8.0
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.10.9
- LLVM: 15.0.7

Preferences:
- CUDA_Runtime_jll.version: 12.5.1
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA L4 (sm_89, 21.976 GiB / 22.494 GiB available)

GPU name: NVIDIA L4

Now, attempting to check Reactant.jl again (after the CUDA.jl check)...
FAILURE: Reactant.set_default_backend("gpu") still errored.
Error: ErrorException("No GPU client found")





A similar problem occurs when I use a different GPU, e.g. an A100. I'd like to make sure this isn't some easily-fixed issue on my end before I submit an issue on the Reactant.jl repo.

Can you first follow Configuration | Reactant.jl and post the output here?

Sure thing. So Reactant_jll.is_available() does evaluate to true. When I run Reactant_jll.host_platform, this is what I get:

Linux x86_64 {cuda_version=none, cxxstring_abi=cxx11, gpu=none, julia_version=1.10.9, libc=glibc, libgfortran_version=5.0.0, libstdcxx_version=3.4.30, mode=opt}
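
For completeness, those two checks from the configuration guide were run as follows (in a fresh Colab cell); the `gpu=none, cuda_version=none` entries in the host platform are the tell-tale sign that a CPU-only build variant was selected:

```julia
using Reactant_jll

# Check that the JLL artifact loaded at all
@show Reactant_jll.is_available()   # true in my case

# Inspect which platform variant was selected at precompile time;
# gpu=none / cuda_version=none means a CPU-only build was picked
@show Reactant_jll.host_platform
```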

When I switch to verbose output by following the configuration guide and running this afterwards:

rm(joinpath(Base.DEPOT_PATH[1], "compiled", "v$(VERSION.major).$(VERSION.minor)", "Reactant_jll"); recursive=true, force=true)
ENV["JULIA_DEBUG"] = "Reactant_jll";
Pkg.add("Reactant_jll")

I get the following output:

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Precompiling packages...
   2137.5 ms  ✓ Reactant_jll
  1 dependency successfully precompiled in 4 seconds. 458 already precompiled.
  1 dependency precompiled but a different version is currently loaded. Restart julia to access the new version
  1 dependency had output during precompilation:
┌ Reactant_jll
│  ┌ Debug: Detected CUDA Driver version 12.4.0
│  └ @ Reactant_jll ~/.julia/packages/Reactant_jll/ygsaO/.pkg/platform_augmentation.jl:60
│  ┌ Debug: Adding include dependency on /usr/lib64-nvidia/libcuda.so.1
│  └ @ Reactant_jll ~/.julia/packages/Reactant_jll/ygsaO/.pkg/platform_augmentation.jl:108
└  

That explains why the GPU couldn't be found: the Reactant build you initially got was built without CUDA support.

Now I’m confused: libcuda was found correctly this time? Can you access the GPU now?

Despite that last message, Reactant.set_default_backend("gpu") still returns with:

AssertionError("Could not find registered platform with name: \"cuda\". Available platform names are: ")
No GPU client found

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] client(backend::String)
   @ Reactant.XLA ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:82
 [3] set_default_backend
   @ ~/.julia/packages/Reactant/cTiTU/src/xla/XLA.jl:104 [inlined]
 [4] set_default_backend(backend::String)
   @ Reactant ~/.julia/packages/Reactant/cTiTU/src/Reactant.jl:293
 [5] top-level scope
   @ In[7]:2

Since the Reactant build I get is the one without CUDA support, perhaps I should find a way to manually update the Julia that's installed on the Colab runtime when I start it?

Ok, I just tested in Colab myself. You need to restart the session after that, and then it should work.

My understanding is that Reactant comes preinstalled in the default environment, but it was built on a non-GPU runtime (either CPU-only, or CPU+TPU), so the default precompiled pkgimage ships without GPU support; you need to force Reactant_jll to be re-precompiled.
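
For anyone landing here with the same symptom, the full workaround discussed in this thread, condensed into one cell (this just repackages the steps above; the restart afterwards is mandatory):

```julia
using Pkg

# Delete the stale CPU-only pkgimage so Reactant_jll's platform
# augmentation runs again, now that a GPU driver is visible
rm(joinpath(Base.DEPOT_PATH[1], "compiled",
            "v$(VERSION.major).$(VERSION.minor)", "Reactant_jll");
   recursive=true, force=true)

# Optional: surface the platform-detection debug messages
ENV["JULIA_DEBUG"] = "Reactant_jll"

# Re-precompile Reactant_jll against the detected CUDA platform
Pkg.add("Reactant_jll")

# Now restart the Colab session (Runtime > Restart session) so the
# freshly precompiled, CUDA-enabled build is the one that gets loaded.
```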

Wanted to provide an update on my end: having Reactant_jll re-precompile itself worked, and now after restarting the session, Reactant.set_default_backend("gpu") seems to be indicating that it can indeed identify my A100, and returns with:

2025-06-01 21:52:44.355014: I external/xla/xla/service/service.cc:152] XLA service 0x13ac9ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-06-01 21:52:44.355035: I external/xla/xla/service/service.cc:160]   StreamExecutor device (0): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1748814764.355819    8203 se_gpu_pjrt_client.cc:1026] Using BFC allocator.
I0000 00:00:1748814764.355874    8203 gpu_helpers.cc:136] XLA backend allocating 31855853568 bytes on device 0 for BFCAllocator.
I0000 00:00:1748814764.355909    8203 gpu_helpers.cc:177] XLA backend will use up to 10618617856 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1748814764.361410    8203 cuda_dnn.cc:529] Loaded cuDNN version 90400
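
As an extra sanity check after the restart, a tiny computation can confirm the GPU client is actually usable. This sketch assumes a recent Reactant version that provides `Reactant.to_rarray` and the `@jit` macro; older releases may differ:

```julia
using Reactant

Reactant.set_default_backend("gpu")

# Move a small array to the XLA device and run a trivial kernel
x = Reactant.to_rarray(ones(Float32, 8))
f(x) = sum(abs2, x)
y = @jit f(x)   # compiled and executed through XLA on the GPU
```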

It looks like there is no need to post an issue to the Reactant.jl folks after all, and this was strictly a Colab problem :+1:
