Enzyme.jl non-allocating function has allocating gradient

Hi,

I am taking Enzyme.jl for a spin.

using Enzyme,StaticArrays,BenchmarkTools,LinearAlgebra

A      = SVector(1.,2.,3.)
B      = SVector(2.,6.,2.)
foo(A) = dot(A,B) 
@btime gradient(Reverse,$foo,$A)

and get

  107.113 ns (7 allocations: 528 bytes)

My little test is representative of my use case: finding the gradient of scalar functions of a “small” StaticVector. The functions I want to differentiate use StaticVectors and a functional style of programming: the idea is not to allocate on the heap, since these functions live somewhere in the innermost for loop.

Yet the gradient function compiled by Enzyme allocates, which is bad news. Is there anything I can do to prevent these allocations?

:slight_smile:

Yes. Try it the way explained in the Enzyme docs:

julia> @btime autodiff(Reverse, dot, Active($A), Active($B))
  10.026 ns (0 allocations: 0 bytes)
(([2.0, 6.0, 2.0], [1.0, 2.0, 3.0]),)

# or if you only need dA

julia> @btime autodiff(Reverse, dot, Active($A), Const($B))
  4.763 ns (0 allocations: 0 bytes)
(([2.0, 6.0, 2.0], nothing),)

What probably also makes this faster is that I avoided having foo capture B. As written, foo has to look up the non-const global B on every call - and B could change at any time, which makes it hard for the compiler to optimize.
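
For completeness, here is a minimal sketch of two ways around that non-const global (not benchmarked, adapt as needed):

using Enzyme, StaticArrays, LinearAlgebra

# Option 1: make the captured value a const global, so the compiler can specialize on it.
const B = SVector(2., 6., 2.)
foo(A) = dot(A, B)

# Option 2: pass B explicitly and mark it Const, differentiating only with respect to A.
A = SVector(1., 2., 3.)
autodiff(Reverse, dot, Active(A), Const(B))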

And if you only use one argument, be aware of this known issue:

It’s relatively easy to fix, but no one has had the time yet (me included).

Thank you, so simple… :smile:

I guess I got lost among mutating vs. non-mutating functions, vector vs. scalar, autodiff vs. gradient, and BatchDuplicated.
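
For anyone else who lands here, a rough sketch of the Active vs. Duplicated distinction as I now understand it (same values as above; a sketch, not benchmarked):

using Enzyme, StaticArrays, LinearAlgebra

A = SVector(1., 2., 3.)
B = SVector(2., 6., 2.)

# Immutable inputs (SVector, numbers): wrap them in Active/Const;
# the derivative comes back as part of the return value.
autodiff(Reverse, dot, Active(A), Const(B))

# Mutable inputs (Vector): wrap them in Duplicated; the derivative is
# accumulated in place into the shadow array you pass alongside.
a  = [1., 2., 3.]
da = zero(a)
autodiff(Reverse, dot, Duplicated(a, da), Const([2., 6., 2.]))
# da now holds the gradient of dot with respect to a, i.e. [2.0, 6.0, 2.0]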

Thank you for the kind help - and not least for Enzyme!

If you’re a bit confused by the terminology of a given autodiff package, you can always try to access it through DifferentiationInterface.jl.

Usual caveat: the native API of the autodiff package will sometimes be faster and/or work where DifferentiationInterface.jl fails. If that’s the case, please open an issue.
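
For this thread’s example, a DifferentiationInterface.jl version might look roughly like this (a sketch; I’m assuming the AutoEnzyme() backend selector from ADTypes.jl, which DifferentiationInterface builds on):

using DifferentiationInterface      # backend-agnostic gradient API
using ADTypes: AutoEnzyme           # backend selector for Enzyme
import Enzyme                       # the actual AD engine behind AutoEnzyme()
using StaticArrays, LinearAlgebra

const B = SVector(2., 6., 2.)
foo(A) = dot(A, B)

A = SVector(1., 2., 3.)
DifferentiationInterface.gradient(foo, AutoEnzyme(), A)   # same gradient as Enzyme.gradient(Reverse, foo, A)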