KernelAbstractions + Enzyme - how to do GPU-side autodiff?

Hello,

I recently tried (and failed) to auto-differentiate kernels written with KernelAbstractions.jl using Enzyme.jl.

I tried to search for examples in other packages, but could not find any concrete examples of Enzyme + KA being used together (the Enzyme test cases are written against CUDA/backend-specific code: Enzyme.jl/test at main · EnzymeAD/Enzyme.jl · GitHub), so I have no idea how it would look for KA. I assume I would need to launch a separate kernel with a deferred autodiff call inside?

My current (very simple) attempt using KA was

using KernelAbstractions
using Enzyme

@kernel function cos_kernel_ka!(B, A)
    i = @index(Global, Linear)
    B[i] = sin(A[i])
end
@kernel function grad_cos_kernel_ka!(dA, dB, A, B)
    Enzyme.autodiff_deferred(
        Reverse,
        Const(cos_kernel_ka!),
        Const,
        Duplicated(B, dB),
        Duplicated(A, dA)
    )
end

This results in ERROR: AssertionError: actualRetType != Union{} on the Metal/CUDA backends when I attempt to launch it, so I strongly suspect I just have a syntax error.

Appreciate any help in advance!

Why did you make your gradient a kernel? Kernels only support standard operations for now; in your case the autodiff would generate a kernel inside a kernel, which isn't supported yet. That's also not how a KA kernel is launched: you instantiate it with the backend and pass an ndrange. However, I'm not sure what you want to calculate:

julia> using KernelAbstractions,Enzyme,CUDA

julia> @kernel function cos_kernel_ka!(B, A)
           i = @index(Global, Linear)
           B[i] = sin(A[i])
       end

julia> function cos_run!(B,A)
           cos_kernel_ka!(get_backend(B))(B,A;ndrange=length(B))
           B
       end
cos_run! (generic function with 1 method)

julia> function grad_cos_kernel_ka!(dA, dB, A, B)
           Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Active,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end
grad_cos_kernel_ka! (generic function with 1 method)

julia> A = CUDA.rand(100);

julia> B = CUDA.rand(100);

julia> dA = Enzyme.make_zero(A);

julia> dB = Enzyme.make_zero(B);

julia> grad_cos_kernel_ka!(dA,dB,A,B)
ERROR: Return type of differentiated function was not a scalar as required, found Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
If calling Enzyme.autodiff(Reverse, f, Active, ...), try Enzyme.autodiff_thunk(Reverse, f, Duplicated, ....)
If calling Enzyme.gradient, try Enzyme.jacobian

This works, because returning sum(B) gives the differentiated function a scalar return, which is what the Active return activity requires:

julia> using KernelAbstractions,Enzyme,CUDA

julia> A = CUDA.rand(100);

julia> B = CUDA.rand(100);

julia> dA = Enzyme.make_zero(A);

julia> dB = Enzyme.make_zero(B);

julia> @kernel function cos_kernel_ka!(B, A)
           i = @index(Global, Linear)
           B[i] = sin(A[i])
       end
cos_kernel_ka! (generic function with 4 methods)

julia> function cos_run!(B,A)
           cos_kernel_ka!(get_backend(B))(B,A;ndrange=length(B))
           sum(B)
       end
cos_run! (generic function with 1 method)

julia> function grad_cos_kernel_ka!(dA, dB, A, B)
          Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Active,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end
grad_cos_kernel_ka! (generic function with 1 method)

julia> grad_cos_kernel_ka!(dA,dB,A,B)
((nothing, nothing),)

julia> dA
100-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.573843
 0.68915915
 0.9999656
 0.9991022
 0.92152673
 0.5781501
 0.97622705
 0.9257504
 0.94532764
 0.85892415
 0.7805012
 0.7845093
 0.9911921
 ⋮
 0.8982199
 0.98546714
 0.91434866
 0.9942696
 0.8421704
 0.71289515
 0.9669826
 0.9899486
 0.9999464
 0.9949909
 0.7747586
 0.98301524

which is actually the expected result. What did you want to calculate? If you want the Jacobian or the v^T J product, refer to the Enzyme docs or to the implementation of jacobian, or just use jacobian itself (see the short v^T J sketch at the end of this post).

PS: depending on what you want, don't forget to zero dA and dB between calls, since Enzyme accumulates gradients into the shadow arrays:

function grad_cos_kernel_ka!(dA, dB, A, B)
           Enzyme.make_zero!(dA)
           Enzyme.make_zero!(dB)
           Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Active,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end

PS 2: you can also return nothing from cos_run!, use a Const return activity, and seed dB yourself:

julia> using KernelAbstractions,Enzyme,CUDA

julia> A = CUDA.rand(100);

julia> B = CUDA.rand(100);

julia> dA = Enzyme.make_zero(A);

julia> dB = Enzyme.make_zero(B);

julia> @kernel function cos_kernel_ka!(B, A)
           i = @index(Global, Linear)
           B[i] = sin(A[i])
       end

julia> function cos_run!(B,A)
           cos_kernel_ka!(get_backend(B))(B,A;ndrange=length(B))
           nothing
       end
cos_run! (generic function with 1 method)

julia> function grad_cos_kernel_ka!(dA, dB, A, B)
           Enzyme.autodiff_deferred(
               Reverse,
               Const(cos_run!),
               Const,
               DuplicatedNoNeed(B, dB),
               Duplicated(A, dA)
           )
       end
grad_cos_kernel_ka! (generic function with 1 method)

julia> dB .= 1;

julia> grad_cos_kernel_ka!(dA,dB,A,B)
((nothing, nothing),)

julia> dA ≈ cos.(A)
true
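
To make the v^T J remark above concrete, here is a minimal sketch that reuses A, B, dA, dB, cos_run! and grad_cos_kernel_ka! exactly as defined in PS 2 (the seed vector v is only an illustration, not part of any API). Seeding dB with v before the reverse pass leaves the pullback J^T v in dA; since this kernel is element-wise, the Jacobian is diagonal with entries cos.(A):

# Reuses the PS 2 definitions above; v is an arbitrary seed vector.
v = CUDA.rand(100)

Enzyme.make_zero!(dA)   # clear whatever the previous call accumulated into dA
dB .= v                 # seed the output adjoint with v instead of ones

grad_cos_kernel_ka!(dA, dB, A, B)

# dA now holds J^T * v, which for this element-wise kernel is cos.(A) .* v
dA ≈ cos.(A) .* v

Looping this over standard basis seeds for dB would recover the Jacobian one row at a time.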