Hi everyone!
I’m excited to announce GPUEnv.jl, a new utility package designed to make life easier for developers maintaining Julia packages that support multiple GPU backends.
The Problem
If your package supports CUDA.jl, AMDGPU.jl, Metal.jl, oneAPI.jl, etc., managing your test/ or benchmark/ environments can be a headache. Adding all of those backends as permanent dependencies makes those environments slow to resolve and unnecessarily large. Leaving them out, however, makes it hard to automatically test or benchmark your code on whatever GPU hardware happens to be available on the host machine.
The Solution: GPUEnv.jl
GPUEnv.jl solves this by building a temporary (or persisted) overlay environment on top of your active project. It detects which hardware is actually available on the host machine, asks Pkg to resolve only those relevant backend packages, and leaves your parent project entirely unchanged.
Currently supported backends: JLArrays, CUDA, AMDGPU, Metal, oneAPI, and OpenCL.
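The overlay idea itself can be reproduced with plain Pkg calls. The sketch below uses a hypothetical helper, `make_overlay`, which is not GPUEnv.jl's implementation. It copies the active project's Project.toml and Manifest.toml into a scratch directory, activates the copy, and adds the requested backend packages there, so the parent project's files are never written to. Pointing `dir` at a fixed folder instead of a fresh temporary directory keeps the overlay on disk after the session.

```julia
using Pkg

# Hypothetical helper, not GPUEnv.jl's implementation: build an overlay
# environment by copying the active project and adding `pkgs` to the copy.
function make_overlay(pkgs::Vector{String}; dir::AbstractString = mktempdir())
    parent = Base.active_project()
    parent === nothing && error("no active project to overlay")
    mkpath(dir)
    for file in ("Project.toml", "Manifest.toml")
        src = joinpath(dirname(parent), file)
        # Caveat: relative `path = ...` entries (e.g. a dev'd parent package)
        # would need rewriting after the copy; this sketch ignores that.
        isfile(src) && cp(src, joinpath(dir, file); force = true)
    end
    Pkg.activate(dir)               # from here on, only the copy is modified
    isempty(pkgs) || Pkg.add(pkgs)  # resolve only the requested backends
    return dir
end
```

For example, `make_overlay(["CUDA"])` would add CUDA.jl to a throwaway copy of the current environment while the original Project.toml and Manifest.toml stay as they were.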
Key Use Cases & Examples
1. Test Overlays
You want CPU-only CI coverage via JLArrays, but you also want to exercise real hardware (CUDA, Metal, etc.) if it’s available on the machine running the tests:
```julia
# test/runtests.jl
using GPUEnv
using Test

# Creates an overlay environment with JLArrays + any detected native GPUs
GPUEnv.activate(; include_jlarrays = true, persist = true)

for backend in gpu_backends(; include_jlarrays = true)
    @testset "$(backend.name)" begin
        x = gpu_ones(backend, Float32, 64, 64)
        y = gpu_ones(backend, Float32, 64, 64) .* 2
        # Copy the result back to the host before comparing
        @test Array(x + y) == fill(3f0, 64, 64)
    end
end
```
2. Benchmark Overlays
For benchmarking, you typically want to skip JLArrays, since CPU mocks aren’t representative of GPU performance. GPUEnv can restrict the overlay to native backends, and you can easily skip the benchmark if no native GPU is found:
```julia
# benchmark/gpu_benchmark.jl
using GPUEnv
using BenchmarkTools

# Native backends only: JLArrays runs on the CPU and would skew the numbers
GPUEnv.activate(; include_jlarrays = false, only_first = true)

backends = gpu_backends(; include_jlarrays = false)
if isempty(backends)
    println("No functional native GPU backend found; skipping benchmark run.")
else
    backend = first(backends)
    x = gpu_randn(backend, Float32, 1024)
    y = gpu_randn(backend, Float32, 1024)
    # Kernels launch asynchronously, so synchronize inside the timed block
    @btime begin
        $x .+ $y
        synchronize_backend($backend)
    end
end
```
3. Backend Prediction and Unified Allocation
Downstream code can query the installed backends and allocate arrays through a small common interface instead of branching on CUDA versus AMDGPU versus Metal everywhere:
```julia
using GPUEnv

# Backends predicted from host hints
predicted = predict_backends()
@show predicted

# The same allocation calls work for every available backend
for backend in gpu_backends(; include_jlarrays = true)
    x = gpu_zeros(backend, Float32, 64, 64)
    y = gpu_ones(backend, Float32, 64, 64)
    z = gpu_randn(backend, Float32, 64, 64)
    @show backend.name typeof(z)
end
```
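The benefit of a small common interface like `gpu_zeros`/`gpu_ones`/`gpu_randn` is that callers dispatch on a backend object instead of branching on array packages. Here is a minimal sketch of that pattern in plain Julia. All names (`AbstractBackendTag`, `CPUTag`, `array_type`, `my_zeros`, `my_ones`, `my_randn`) are hypothetical and are not GPUEnv.jl's API. Each backend implements one method that says which array type it allocates, and the constructors are written once on top of it.

```julia
using Random

# Hypothetical backend tags, standing in for real backend descriptors.
abstract type AbstractBackendTag end
struct CPUTag <: AbstractBackendTag end

# The single per-backend hook: which array type does this backend allocate?
array_type(::CPUTag) = Array

# Uninitialized allocation written once against that hook.
function alloc(b::AbstractBackendTag, ::Type{T}, dims::Integer...) where {T}
    A = array_type(b)            # e.g. Array here
    return A{T}(undef, dims...)
end

# Generic constructors shared by every backend.
my_zeros(b::AbstractBackendTag, ::Type{T}, dims::Integer...) where {T} =
    fill!(alloc(b, T, dims...), zero(T))
my_ones(b::AbstractBackendTag, ::Type{T}, dims::Integer...) where {T} =
    fill!(alloc(b, T, dims...), one(T))
my_randn(b::AbstractBackendTag, ::Type{T}, dims::Integer...) where {T} =
    randn!(alloc(b, T, dims...))
```

In this design, supporting another backend would, in principle, only need a new tag type and one `array_type` method, assuming that array type supports the `undef` constructor, `fill!`, and `randn!`.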
How it works
GPUEnv prefers direct host hints for backend prediction (Linux device nodes, Windows video controller names, macOS display info) and falls back to command-line utilities such as nvidia-smi or rocminfo if needed. Once it predicts the backends, it copies your project context, safely injects the necessary GPU packages via Pkg, and activates it. You can even pass persist = true to cache this overlay environment in a local gpu_env/ folder to save resolving time on subsequent runs!
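To make the prediction step concrete, here is a simplified, hypothetical `predict_backends_sketch`. It is not GPUEnv.jl's code, and its heuristics are assumptions. On Linux it looks for the NVIDIA (`/dev/nvidiactl`) and AMD ROCm (`/dev/kfd`) device nodes. It treats an Apple-silicon Mac as a Metal candidate. Only after that does it fall back to looking for `nvidia-smi` or `rocminfo` on the PATH. The Windows video-controller query and the oneAPI/OpenCL checks are left out.

```julia
# Hypothetical, simplified backend prediction; not GPUEnv.jl's actual rules.
function predict_backends_sketch()
    found = String[]
    if Sys.isapple()
        # Assumption: Apple-silicon (aarch64) Macs are Metal.jl candidates.
        Sys.ARCH === :aarch64 && push!(found, "Metal")
    elseif Sys.islinux()
        # Cheap host hints: device nodes created by the kernel drivers.
        ispath("/dev/nvidiactl") && push!(found, "CUDA")   # NVIDIA driver
        ispath("/dev/kfd") && push!(found, "AMDGPU")       # AMD ROCm driver
    end
    # Fallback: vendor command-line tools, if the hints found nothing.
    if !("CUDA" in found) && Sys.which("nvidia-smi") !== nothing
        push!(found, "CUDA")
    end
    if !("AMDGPU" in found) && Sys.which("rocminfo") !== nothing
        push!(found, "AMDGPU")
    end
    return found
end
```

The cheap filesystem checks run first, and looking up external tools is only the fallback. This mirrors the preference order described above.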
Links & Resources
- GitHub: https://github.com/hakkelt/GPUEnv.jl (Detect available GPU backends and create overlay environments with them)
- Documentation: https://hakkelt.github.io/GPUEnv.jl/
Feedback, feature requests (to a limited extent), and PRs are incredibly welcome. If you are developing a package that dispatches across multiple GPU backends, give it a try and let me know how it fits into your workflow!