I am developing a library that uses CUDA. Because of the new capabilities of CUDA.jl 3.3, I decided to upgrade to Julia 1.6, and now all the tests I run fail in kernels that depend on atomic operations.
I tried to reduce it to the most basic example, and the problem seems to be in the pointer invocation.
Example:
using CUDA
function kernel(x)
    for i in 1:length(x)
        CUDA.atomic_add!(pointer(x, 1), 1)
    end
    return
end
x = CUDA.zeros(4)
@cuda kernel(x)
and I obtain an error like:
julia> @cuda kernel(x)
ERROR: InvalidIRError: compiling kernel kernel(CuDeviceVector{Float32, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_add!)
Stacktrace:
[1] atomic_arrayset
@ ~/.julia/packages/CUDA/mVgLI/src/device/intrinsics/atomics.jl:498
[2] atomic_arrayset
@ ~/.julia/packages/CUDA/mVgLI/src/device/intrinsics/atomics.jl:480
[3] macro expansion
@ ~/.julia/packages/CUDA/mVgLI/src/device/intrinsics/atomics.jl:475
[4] kernel
@ REPL[6]:3
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(kernel), Tuple{CuDeviceVector{Float32, 1}}}}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2WWTr/src/validation.jl:111
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/2WWTr/src/driver.jl:319 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/PZq45/src/TimerOutput.jl:226 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/2WWTr/src/driver.jl:317 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2WWTr/src/utils.jl:62
[6] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:313
[7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/2WWTr/src/cache.jl:89
[8] cufunction(f::typeof(kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:288
[9] cufunction(f::typeof(kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}}})
@ CUDA ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:282
[10] top-level scope
@ ~/.julia/packages/CUDA/mVgLI/src/compiler/execution.jl:102
[11] top-level scope
@ ~/.julia/packages/CUDA/mVgLI/src/initialization.jl:52
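One thing I wondered about (just a guess on my side, I have not confirmed it is the cause): since `CUDA.zeros(4)` gives a `Float32` array, maybe the integer literal `1` does not match the element type that `atomic_add!` expects. This is the same MWE with a `Float32` literal instead:

```julia
using CUDA

function kernel(x)
    for i in 1:length(x)
        # 1f0 is a Float32 literal, matching eltype(x) == Float32
        CUDA.atomic_add!(pointer(x, 1), 1f0)
    end
    return
end

x = CUDA.zeros(4)  # CuArray{Float32}
@cuda kernel(x)
```

I have not been able to verify whether this variant compiles on my setup, so I may be on the wrong track entirely.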
My device and packages are as follows (output of CUDA.versioninfo()):
┌ Info: System information:
│ CUDA toolkit 11.3.1, artifact installation
│ CUDA driver 11.3.0
│ NVIDIA driver 465.31.0
│
│ Libraries:
│ - CUBLAS: 11.5.1
│ - CURAND: 10.2.4
│ - CUFFT: 10.4.2
│ - CUSOLVER: 11.1.2
│ - CUSPARSE: 11.6.0
│ - CUPTI: 14.0.0
│ - NVML: 11.0.0+465.31
│ - CUDNN: 8.20.0 (for CUDA 11.3.0)
│ - CUTENSOR: 1.3.0 (for CUDA 11.2.0)
│
│ Toolchain:
│ - Julia: 1.6.1
│ - LLVM: 12.0.0
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
│ - Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
│
│ 1 device:
└ 0: NVIDIA GeForce RTX 2070 SUPER (sm_75, 7.176 GiB / 7.792 GiB available)
Could anyone give me a clue?
Thanks,
Gabriel