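Is there a recommended CUDA driver version and CUDA.jl version I should be using with Reactant.jl? It looks like conflicting CUDA versions are being picked up somewhere: XLA reports a 13.1 runtime and toolkit, but nvlink fails because the cubin is newer than its toolkit (131 vs 130).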
I0000 00:00:1779236186.222218 3515013 service.cc:178] XLA service 0x2a8ed9d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1779236186.222254 3515013 service.cc:194] StreamExecutor [0]: NVIDIA RTX A6000, Compute Capability 8.6 (Driver: 13.2.0[580.126.9]; Runtime: 13.1.0; Toolkit: 13.1.0; DNN: 9.14.0)
I0000 00:00:1779236186.222263 3515013 service.cc:194] StreamExecutor [1]: NVIDIA RTX A6000, Compute Capability 8.6 (Driver: 13.2.0[580.126.9]; Runtime: 13.1.0; Toolkit: 13.1.0; DNN: 9.14.0)
I0000 00:00:1779236186.222270 3515013 service.cc:194] StreamExecutor [2]: NVIDIA RTX A6000, Compute Capability 8.6 (Driver: 13.2.0[580.126.9]; Runtime: 13.1.0; Toolkit: 13.1.0; DNN: 9.14.0)
I0000 00:00:1779236186.222278 3515013 service.cc:194] StreamExecutor [3]: NVIDIA RTX A6000, Compute Capability 8.6 (Driver: 13.2.0[580.126.9]; Runtime: 13.1.0; Toolkit: 13.1.0; DNN: 9.14.0)
...
I0000 00:00:1779237099.893281 3515013 subprocess_compilation.cc:500] Using nvlink for parallel linking
E0000 00:00:1779237099.904311 3515013 gpu_compiler.cc:2555] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_enable_llvm_module_compilation_parallelism=false to bypass
it, but expect to get longer compilation time due to the lack of multi-threading. Original error: INTERNAL: nvlink exited with non-zero error code 256, output: nvlink fatal : Input file
'/tmp/tempfile-autograd.medicalmetrics.local-64e16a42e7e0aa8f-3515013-65234eb4f7249.cubin' newer than toolkit (131 vs 130)
┌ Error: Compilation failed, MLIR module written to /tmp/reactant_K6L1Zd/module_000_reactant_compute..._post_xla_compile.mlir
└ @ Reactant.MLIR.IR /cache/cvance@medicalmetrics.local/julia/packages/Reactant/WQKPd/src/mlir/IR/Pass.jl:146
Status `~/Git/campfire/examples/reactant/Project.toml`
[ac637c84] AbbreviatedStackTraces v0.3.4
[052768ef] CUDA v6.1.0
[1e4fc85f] CampfireClient v0.2.0 `../../lib/CampfireClient.jl`
[63faff8b] CampfireCommon v0.1.0 `../../lib/CampfireCommon.jl`
[682c06a0] JSON v1.6.0
[b2108857] Lux v1.31.4
[21216c6a] Preferences v1.5.2
[3c362404] Reactant v0.2.261
[295af30f] Revise v3.14.3
[a01cb732] SpiNODE v0.1.0 `../../lib/SpiNODE.jl`
[d49dbf32] WeightInitializers v1.3.4 `../../lib/Lux.jl/lib/WeightInitializers`
[9a3f8284] Random v1.11.0
I’ve also tried CUDA.jl 5.x, because there is currently a bug involving CUDA.jl 6.x and the WeightInitializers.jl CUDAExt, which is why I have WeightInitializers.jl dev'ed in the status above.
EDIT:
I didn’t realize I had CUDA 13.0 installed on the system. Maybe it’s using the wrong binaries (the nvlink error reports a toolkit version of 130). I’m going to try removing it to see if that fixes the issue.
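In case it helps anyone reproduce this, here is a rough sketch of two checks, run in a fresh Julia session before loading Reactant. It looks up which nvlink is first on PATH, then sets the workaround flag quoted in the XLA error message. It rests on two assumptions I have not verified: that XLA can pick up nvlink through PATH, and that XLA_FLAGS is read when Reactant initializes its client. The variable names are just for illustration.

```julia
# Which nvlink (if any) is first on PATH? Sys.which returns `nothing` if none is found.
nvlink_path = Sys.which("nvlink")
if nvlink_path === nothing
    println("no nvlink found on PATH")
else
    println("nvlink on PATH: ", nvlink_path)
    # Print the linker's version banner (assumes this nvlink accepts --version).
    run(ignorestatus(`$nvlink_path --version`))
end

# Workaround flag copied from the XLA error message above.
# Append it to any flags that are already set instead of overwriting them.
flag = "--xla_gpu_enable_llvm_module_compilation_parallelism=false"
existing = get(ENV, "XLA_FLAGS", "")
if !occursin(flag, existing)
    ENV["XLA_FLAGS"] = String(strip(existing * " " * flag))
end
println("XLA_FLAGS = ", ENV["XLA_FLAGS"])
```

If the path printed is inside the system CUDA 13.0 install, that would be consistent with the "131 vs 130" mismatch. As the error message notes, the flag only bypasses parallel linking and is expected to make compilation slower.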