I am developing a library that uses CUDA. Because of the new capabilities of CUDA.jl 3.3, I decided to upgrade to Julia 1.6, and now all the tests I run fail in kernels that depend on atomic operations.
I tried to reduce it to the most basic example, and the problem seems to be in the pointer invocation.
Example:
using CUDA
function kernel(x)
    for i in 1:length(x)
        CUDA.atomic_add!(pointer(x, 1), 1)
    end
    return
end
x = CUDA.zeros(4)
@cuda kernel(x)
and I obtain an error like:
julia> @cuda kernel(x)
ERROR: InvalidIRError: compiling kernel kernel(CuDeviceVector{Float32, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_add!)
Stacktrace:
[1] atomic_arrayset
@ ~/.julia/packages/CUDA/mVgLI/src/device/intrinsics/atomics.jl:498
[2] atomic_arrayset
@ ~/.julia/packages/CUDA/mVgLI/src/device/intrinsics/atomics.jl:480
[3] macro expansion
@ ~/.julia/packages/CUDA/mVgLI/src/device/intrinsics/atomics.jl:475
[4] kernel
@ REPL[6]:3
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(kernel), Tuple{CuDeviceVector{Float32, 1}}}}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2WWTr/src/validation.jl:111
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/2WWTr/src/driver.jl:319 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/PZq45/src/TimerOutput.jl:226 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/2WWTr/src/driver.jl:317 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2WWTr/src/utils.jl:62
[6] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:313
[7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/2WWTr/src/cache.jl:89
[8] cufunction(f::typeof(kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:288
[9] cufunction(f::typeof(kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}}})
@ CUDA ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:282
[10] top-level scope
@ ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:102
[11] top-level scope
@ ~/.julia/packages/CUDA/mVgLI/src/initialization.jl:52
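One thing I wondered about (just a guess on my side, I have not confirmed it is the cause): since `CUDA.zeros(4)` gives a `Float32` array, maybe the integer literal `1` does not match the element type that `atomic_add!` expects. This is the same MWE with a `Float32` literal instead:

```julia
using CUDA

function kernel(x)
    for i in 1:length(x)
        # 1f0 is a Float32 literal, matching eltype(x) == Float32
        CUDA.atomic_add!(pointer(x, 1), 1f0)
    end
    return
end

x = CUDA.zeros(4)  # CuArray{Float32}
@cuda kernel(x)
```

I have not been able to verify whether this variant compiles on my setup, so I may be on the wrong track entirely.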
My device and packages are as follows (output of CUDA.versioninfo()):
┌ Info: System information:
│ CUDA toolkit 11.3.1, artifact installation
│ CUDA driver 11.3.0
│ NVIDIA driver 465.31.0
│
│ Libraries:
│ - CUBLAS: 11.5.1
│ - CURAND: 10.2.4
│ - CUFFT: 10.4.2
│ - CUSOLVER: 11.1.2
│ - CUSPARSE: 11.6.0
│ - CUPTI: 14.0.0
│ - NVML: 11.0.0+465.31
│ - CUDNN: 8.20.0 (for CUDA 11.3.0)
│ - CUTENSOR: 1.3.0 (for CUDA 11.2.0)
│
│ Toolchain:
│ - Julia: 1.6.1
│ - LLVM: 12.0.0
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
│ - Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
│
│ 1 device:
└ 0: NVIDIA GeForce RTX 2070 SUPER (sm_75, 7.176 GiB / 7.792 GiB available)
Could anyone give me a clue?
Thanks,
Gabriel