Why did you make your gradient a kernel? Kernels only support standard operations for now, and in your case the autodiff would generate a kernel inside a kernel, which isn't supported right now. Also, that's not how a kernel is launched. However, I don't know what you want to calculate:
julia> using KernelAbstractions,Enzyme,CUDA
julia> @kernel function cos_kernel_ka!(B, A)
           i = @index(Global, Linear)
           B[i] = sin(A[i])
       end
julia> function cos_run!(B,A)
           cos_kernel_ka!(get_backend(B))(B,A;ndrange=length(B))
           B
       end
cos_run! (generic function with 1 method)
julia> function grad_cos_kernel_ka!(dA, dB, A, B)
           Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Active,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end
grad_cos_kernel_ka! (generic function with 1 method)
julia> A = CUDA.rand(100);
julia> B = CUDA.rand(100);
julia> dA = Enzyme.make_zero(A);
julia> dB = Enzyme.make_zero(B);
julia> grad_cos_kernel_ka!(dA,dB,A,B)
ERROR: Return type of differentiated function was not a scalar as required, found Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
If calling Enzyme.autodiff(Reverse, f, Active, ...), try Enzyme.autodiff_thunk(Reverse, f, Duplicated, ....)
If calling Enzyme.gradient, try Enzyme.jacobian
This is because cos_run! returns the array B, while the Active return annotation requires a scalar. Returning a scalar instead works:
julia> using KernelAbstractions,Enzyme,CUDA
julia> A = CUDA.rand(100);
julia> B = CUDA.rand(100);
julia> dA = Enzyme.make_zero(A);
julia> dB = Enzyme.make_zero(B);
julia> @kernel function cos_kernel_ka!(B, A)
           i = @index(Global, Linear)
           B[i] = sin(A[i])
       end
cos_kernel_ka! (generic function with 4 methods)
julia> function cos_run!(B,A)
           cos_kernel_ka!(get_backend(B))(B,A;ndrange=length(B))
           sum(B)
       end
cos_run! (generic function with 1 method)
julia> function grad_cos_kernel_ka!(dA, dB, A, B)
           Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Active,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end
grad_cos_kernel_ka! (generic function with 1 method)
julia> grad_cos_kernel_ka!(dA,dB,A,B)
((nothing, nothing),)
julia> dA
100-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.573843
0.68915915
0.9999656
0.9991022
0.92152673
0.5781501
0.97622705
0.9257504
0.94532764
0.85892415
0.7805012
0.7845093
0.9911921
⋮
0.8982199
0.98546714
0.91434866
0.9942696
0.8421704
0.71289515
0.9669826
0.9899486
0.9999464
0.9949909
0.7747586
0.98301524
which is actually expected. What did you want to calculate? If you want the Jacobian or the v^T J product, refer to the Enzyme docs or to the implementation of jacobian, or just use jacobian itself; there is also a sketch of assembling the Jacobian by hand right below.
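If you do want the full Jacobian by hand, here is a minimal sketch (mine, not from the original question) that builds it row by row with one-hot reverse seeds, reusing the Const-return pattern shown in the last example further down; the names sin_kernel!, run!, vjp!, and jacobian_rows are just placeholders:
using KernelAbstractions, Enzyme, CUDA

@kernel function sin_kernel!(B, A)
    i = @index(Global, Linear)
    B[i] = sin(A[i])
end

function run!(B, A)
    sin_kernel!(get_backend(B))(B, A; ndrange=length(B))
    nothing
end

# One reverse pass: consumes the seed in dB and accumulates v^T J into dA.
function vjp!(dA, dB, A, B)
    Enzyme.make_zero!(dA)                 # reverse mode accumulates into dA
    Enzyme.autodiff_deferred(
        Reverse,
        Const(run!),
        Const,
        DuplicatedNoNeed(B, dB),
        Duplicated(A, dA)
    )
    dA
end

# J[i, j] = dB[i]/dA[j]; seeding dB with the i-th unit vector yields row i of J.
function jacobian_rows(A)
    n = length(A)
    B = similar(A)
    dA = Enzyme.make_zero(A)
    dB = Enzyme.make_zero(B)
    J = CUDA.zeros(eltype(A), n, n)
    for i in 1:n
        dB .= 0
        CUDA.@allowscalar dB[i] = 1       # one-hot reverse seed
        J[i, :] .= vjp!(dA, dB, A, B)
    end
    J
end
For this particular kernel the Jacobian is diagonal, so the full matrix is overkill; the sketch is only meant to show the seeding pattern.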
PS: depending on what you want, don't forget to zero dA and dB:
function grad_cos_kernel_ka!(dA, dB, A, B)
    Enzyme.make_zero!(dA)
    Enzyme.make_zero!(dB)
    Enzyme.autodiff_deferred(
        Reverse,
        Const(cos_run!),
        Active,
        DuplicatedNoNeed(B, dB),
        Duplicated(A, dA)
    )
end
PS 2: you can also do this:
julia> using KernelAbstractions,Enzyme,CUDA
julia> A = CUDA.rand(100);
julia> B = CUDA.rand(100);
julia> dA = Enzyme.make_zero(A);
julia> dB = Enzyme.make_zero(B);
julia> @kernel function cos_kernel_ka!(B, A)
           i = @index(Global, Linear)
           B[i] = sin(A[i])
       end
julia> function cos_run!(B,A)
           cos_kernel_ka!(get_backend(B))(B,A;ndrange=length(B))
           nothing
       end
cos_run! (generic function with 1 method)
julia> function grad_cos_kernel_ka!(dA, dB, A, B)
           Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Const,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end
grad_cos_kernel_ka! (generic function with 1 method)
julia> dB .= 1;
julia> grad_cos_kernel_ka!(dA,dB,A,B)
((nothing, nothing),)
julia> dA ≈ cos.(A)
true
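A last hedged note (mine, not part of the run above): since B[i] = sin(A[i]), the Jacobian is diagonal with entries cos(A[i]), which is why seeding dB with ones returns cos.(A). Seeding with any other vector v (a placeholder name here) gives the v^T J product v .* cos.(A), remembering to zero dA first because reverse mode accumulates into it:
v = CUDA.rand(100)
Enzyme.make_zero!(dA)                 # dA still holds the result of the previous call
dB .= v                               # arbitrary reverse seed instead of ones
grad_cos_kernel_ka!(dA, dB, A, B)
dA ≈ v .* cos.(A)                     # should hold up to floating-point error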