CUDA.jl finds CUDA installation but cannot execute any commands

I have installed CUDA.jl. Using JULIA_DEBUG=CUDA and CUDA.version(), I have confirmed that the local CUDA installation is indeed being used. But to my chagrin, not even the simplest code works:

julia> using CUDA
julia> CUDA.version()
v"11.0.0"

julia> CUDA.functional(true)
true

julia> a = CuArray{Int}(undef, 1024);
julia> b = copy(a);
julia> fill!(b, 0)
ERROR: cfunction: closures are not supported on this platform
Stacktrace:
  [1] compile_method_instance(job::GPUCompiler.CompilerJob, method_instance::Core.MethodInstance)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/jlgen.jl:325
  [2] macro expansion
    @ ~/.julia/packages/TimerOutputs/PZq45/src/TimerOutput.jl:226 [inlined]
  [3] irgen(job::GPUCompiler.CompilerJob, method_instance::Core.MethodInstance)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/irgen.jl:4
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/eJOtJ/src/driver.jl:142 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/TimerOutputs/PZq45/src/TimerOutput.jl:226 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/GPUCompiler/eJOtJ/src/driver.jl:141 [inlined]
  [7] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any, world::UInt64; libraries::Bool, deferred_codegen::Bool, optimize::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/utils.jl:62
  [8] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any, world::UInt64)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/utils.jl:60
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:300
 [10] check_cache
    @ ~/.julia/packages/GPUCompiler/eJOtJ/src/cache.jl:47 [inlined]
 [11] cached_compilation
    @ ~/.julia/packages/GPUArrays/Z5nPF/src/host/construction.jl:6 [inlined]
 [12] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#4#5", Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Int64}}}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/cache.jl:0
 [13] cufunction(f::GPUArrays.var"#4#5", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:289
 [14] cufunction
    @ ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:283 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:102 [inlined]
 [16] #launch_heuristic#286
    @ ~/.julia/packages/CUDA/3VnCC/src/gpuarrays.jl:17 [inlined]
 [17] launch_heuristic
    @ ~/.julia/packages/CUDA/3VnCC/src/gpuarrays.jl:17 [inlined]
 [18] gpu_call(::GPUArrays.var"#4#5", ::CuArray{Int64, 1}, ::Int64; target::CuArray{Int64, 1}, total_threads::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/Z5nPF/src/device/execution.jl:61
 [19] gpu_call
    @ ~/.julia/packages/GPUArrays/Z5nPF/src/device/execution.jl:46 [inlined]
 [20] fill!(A::CuArray{Int64, 1}, x::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/Z5nPF/src/host/construction.jl:5
 [21] top-level scope
    @ REPL[14]:1
 [22] top-level scope
    @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

Or

julia> cu(rand(5)) .* cu(rand(5))
[same error]

But,

julia> cu(rand(5,5))*cu(rand(5))
5-element CuArray{Float32, 1}:
 1.8394606
 1.5616181
 1.4858192
 1.0533601
 1.7902193

My main problem, though, is that I can’t use a Flux model with the GPU.
Any idea how I can correct this behavior?

I don’t have anything better to suggest, except that running ]build CUDA can sometimes fix issues like this.

My CUDA version is 11.3 and the code works for me. So yeah, the version difference might be it as well.

The ]build CUDA command doesn’t do anything anymore.

@crinders you’re running into a known issue caused by a ‘bug’ in Julia. We have a workaround in JuliaGPU/GPUCompiler.jl pull request #191 (“Workaround cfunction closures on non-x86” by jpsamaroo). I’ll finish that up and get it merged. If you then want to try CUDA.jl out, use GPUCompiler.jl/GPUArrays.jl/CUDA.jl from master (or wait for a new version, probably some time next week).
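For anyone who wants to check whether their machine is affected, here is a minimal sketch. It assumes the error comes from creating a runtime closure with @cfunction, which is what the PR title suggests. The helper name closure_cfunction_supported is made up for illustration and is not part of CUDA.jl or GPUCompiler.jl.

```julia
# Hypothetical helper (not part of CUDA.jl or GPUCompiler.jl): tries to turn a
# closure into a C-callable function pointer and call it once.
function closure_cfunction_supported()
    offset = 1
    add_offset = x -> x + offset      # closure capturing a local variable
    try
        # The `$` form asks Julia for a runtime closure cfunction.
        cf = @cfunction($add_offset, Int, (Int,))
        GC.@preserve cf begin
            ptr = Base.unsafe_convert(Ptr{Cvoid}, cf)
            return ccall(ptr, Int, (Int,), 41) == 42
        end
    catch err
        @warn "closure cfunction failed" exception = err
        return false
    end
end

println("architecture: ", Sys.ARCH)
println("closure cfunctions supported: ", closure_cfunction_supported())
```

If the second line prints false on a non-x86 machine, you are most likely hitting the same limitation. This needs no GPU, so it can be run before CUDA.jl is even loaded.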

good to know

Thanks! I’ll try that as soon as it gets merged.
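Once the fix is merged, tracking master for the three packages could look roughly like the sketch below. It uses the standard Pkg API; the helper name install_from_master and the throwaway environment are my own assumptions, and there is no guarantee that the three master branches are compatible with each other at any given moment.

```julia
using Pkg

# Packages named in the reply above; "master" is the branch it mentions.
const MASTER_SPECS = [Pkg.PackageSpec(name = n, rev = "master")
                      for n in ("GPUCompiler", "GPUArrays", "CUDA")]

# Hypothetical helper: installs all three into a temporary environment so the
# default environment stays untouched. Requires network access.
function install_from_master(specs = MASTER_SPECS)
    Pkg.activate(temp = true)
    Pkg.add(specs)
end

# install_from_master()   # uncomment to actually run the install
```

Drop the Pkg.activate line to install into the currently active environment instead. Running Pkg.free on each package later returns it to the registered release.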

Too bad it wasn’t something simple :-!