Hi everyone,
I’ve been trying to automatically differentiate some code that runs on the GPU using ForwardDiff. Some of my kernels perform an atomic addition via CUDA.@atomic, but that doesn’t seem to be supported for ForwardDiff.Dual types. Is there a way to define that behavior myself? Would it even be possible? The functionality I’m looking for is similar to what was offered by the old CUDAatomics package here: alandion/CUDAatomics.jl (github.com), except that the partials field contains a ForwardDiff.Partials and not just a Float32. If it’s possible but requires some work, I’m willing to look into how to do it and contribute if others are also interested, although it may take a while for me to learn the ropes as I’m still quite new to CUDA.jl.
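To make this a bit more concrete, what I have in mind is roughly the sketch below. It is completely untested, dual_atomic_add! is just a placeholder name (not an existing CUDA.jl function), and I’m not sure whether reinterpreting the LLVMPtr and doing byte-offset pointer arithmetic like this is even legitimate. The idea is that a Dual{T,V,N} is laid out as its value followed by its N partials, so an atomic add could in principle be lowered to N+1 component-wise atomic adds on V:

using CUDA, ForwardDiff

# Completely untested sketch: treat a Dual{T,V,N} as (value, partial_1, ..., partial_N)
# laid out contiguously and issue one hardware atomic per component of type V.
# dual_atomic_add! is a placeholder name, not an existing CUDA.jl API.
function dual_atomic_add!(ptr::Core.LLVMPtr{ForwardDiff.Dual{T,V,N},A},
                          x::ForwardDiff.Dual{T,V,N}) where {T,V,N,A}
    p = reinterpret(Core.LLVMPtr{V,A}, ptr)     # view the Dual as N+1 values of type V
    CUDA.atomic_add!(p, ForwardDiff.value(x))   # accumulate the value part
    for n in 1:N                                # accumulate each partial at its byte offset
        CUDA.atomic_add!(p + n*sizeof(V), ForwardDiff.partials(x, n))
    end
    return
end

Presumably something like this would then have to be hooked up behind CUDA.@atomic / CUDA.atomic_add! so that kernels don’t need to change, which is the part I don’t know how to do cleanly.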
Here’s a sample toy problem (which I am aware can be done without the atomic operation):
using ForwardDiff
using CUDA
function test_kernel!(A, x, pows, nmax)
    # global thread index
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= nmax
        # CUDA.@atomic A[1] += x[i]^pows[i]   # the @atomic form is also not supported for Dual
        CUDA.atomic_add!(pointer(A, 1), x[i]^pows[i])   # atomically accumulate into A[1]
    end
    return
end
function objective_func(x::Vector{T}) where {T<:Real}
    nmax = 5
    x_gpu = cu(x)                                  # move the input to the GPU
    res = CUDA.zeros(T, 1)                         # accumulator for the atomic sum
    pows = CuArray{T}([1.0; 2.0; 3.0; 4.0; 5.0])   # per-element exponents
    nthreads = 5
    nbx = ceil(Int, nmax / nthreads)
    CUDA.@sync begin
        @cuda threads=nthreads blocks=nbx test_kernel!(res, x_gpu, pows, nmax)
    end
    return Array(res)[1]                           # copy the scalar result back to the host
end
x_test = ones(Float32, 5)
println("Function Eval with Floats:")
display(objective_func(x_test))
println("Gradient Eval with Duals:")
display(ForwardDiff.gradient(objective_func, x_test))
Executing the code produces the following error:
ERROR: InvalidIRError: compiling kernel #test_kernel!(CuDeviceVector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(objective_func), Float32}, Float32, 5}, 1}, CuDeviceVector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(objective_func), Float32}, Float32, 5}, 1}, CuDeviceVector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(objective_func), Float32}, Float32, 5}, 1}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_add!)
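For reference, the only workaround I can think of in the meantime is to split the accumulator into a plain Float32 buffer of length N+1 (the value first, then the partials) and rebuild the Dual on the host afterwards. Roughly, and again untested (split_kernel! is just an illustrative name):

# Untested workaround sketch: acc is a CuArray{Float32} of length N+1,
# so only Float32 atomics are needed; the Dual result then has to be
# reassembled on the host.
function split_kernel!(acc, x, pows, nmax)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= nmax
        y = x[i]^pows[i]
        CUDA.@atomic acc[1] += ForwardDiff.value(y)
        for n in 1:length(ForwardDiff.partials(y))
            CUDA.@atomic acc[1 + n] += ForwardDiff.partials(y, n)
        end
    end
    return
end

That works in principle, but it pushes the Dual bookkeeping into every kernel and onto the host side, which is what I’d like to avoid.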
Thanks!