using Enzyme, StaticArrays, BenchmarkTools, LinearAlgebra
A = SVector(1., 2., 3.)
B = SVector(2., 6., 2.)
foo(A) = dot(A, B)
@btime gradient(Reverse, $foo, $A)
and get
107.113 ns (7 allocations: 528 bytes)
My little test is representative of my use case: computing the gradient of a scalar function of a “small” StaticVector. The functions I want to differentiate operate on StaticVectors in a functional style: the idea is to avoid heap allocations, because these functions live somewhere in the innermost for loop.
Yet the gradient function compiled by Enzyme allocates, which is bad news. Is there anything I can do to prevent these allocations?
julia> @btime autodiff(Reverse, dot, Active($A), Active($B))
10.026 ns (0 allocations: 0 bytes)
(([2.0, 6.0, 2.0], [1.0, 2.0, 3.0]),)
# or if you only need dA
julia> @btime autodiff(Reverse, dot, Active($A), Const($B))
4.763 ns (0 allocations: 0 bytes)
(([2.0, 6.0, 2.0], nothing),)
What probably makes this faster, too, is that I avoided the closure over B. Your function foo captures B from global scope, and since B is a non-const global whose value and type could change at any time, the compiler cannot infer it, which makes foo hard to optimize.
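The same idea carries over to your own functions: pass B as an explicit Const argument instead of capturing it, then unwrap the returned tuple to get dA. A minimal sketch (bar is a hypothetical stand-in for your real function):

using Enzyme, StaticArrays, LinearAlgebra

A = SVector(1., 2., 3.)
B = SVector(2., 6., 2.)

# Take B as an explicit argument instead of reading a non-const global:
bar(A, B) = dot(A, B)

# autodiff returns ((dA, dB),); with Const(B) the dB slot is `nothing`,
# so [1][1] extracts the gradient with respect to A:
dA = autodiff(Reverse, bar, Active(A), Const(B))[1][1]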
If you’re a bit confused by the terminology of a given autodiff package, you can always try to access it through DifferentiationInterface.jl.
Usual caveat: the native API of the autodiff package will sometimes be faster and/or work where DifferentiationInterface.jl fails, but if that’s the case, please open an issue.
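For instance, here’s a minimal sketch (assuming the AutoEnzyme backend selector that DifferentiationInterface.jl re-exports from ADTypes.jl, and DI’s gradient(f, backend, x) entry point):

using DifferentiationInterface, StaticArrays
import Enzyme  # loading Enzyme enables the AutoEnzyme backend

backend = AutoEnzyme()  # backend selector (assumption: re-exported by current DI versions)

f(x) = sum(abs2, x)  # any scalar function of a small static vector
x = SVector(1., 2., 3.)

DifferentiationInterface.gradient(f, backend, x)  # gradient of f at x (= 2x)

The same call then works with other backends (e.g. AutoForwardDiff()) just by swapping the backend object.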