using Enzyme, StaticArrays, BenchmarkTools, LinearAlgebra
A = SVector(1., 2., 3.)
B = SVector(2., 6., 2.)
foo(A) = dot(A, B)
@btime gradient(Reverse, $foo, $A)
and get
107.113 ns (7 allocations: 528 bytes)
My little test is representative of my use case: computing the gradient of a scalar function of a “small” StaticVector. The functions I want to differentiate operate on StaticVectors in a functional style: the idea is to avoid heap allocations, because these functions live somewhere in the innermost for loop.
Yet the gradient function compiled by Enzyme allocates, which is bad news. Is there anything I can do to prevent these allocations?
julia> @btime autodiff(Reverse, dot, Active($A), Active($B))
10.026 ns (0 allocations: 0 bytes)
(([2.0, 6.0, 2.0], [1.0, 2.0, 3.0]),)
# or if you only need dA
julia> @btime autodiff(Reverse, dot, Active($A), Const($B))
4.763 ns (0 allocations: 0 bytes)
(([2.0, 6.0, 2.0], nothing),)
What probably makes this faster, too, is that I avoided the closure over B. Your function foo captures B from global scope, and since B is a non-const global whose value and type could change at any time, the compiler cannot infer it, which makes foo hard to optimize.
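The same idea carries over to your own functions: pass B as an explicit Const argument instead of capturing it, then unwrap the returned tuple to get dA. A minimal sketch (bar is a hypothetical stand-in for your real function):

using Enzyme, StaticArrays, LinearAlgebra

A = SVector(1., 2., 3.)
B = SVector(2., 6., 2.)

# Take B as an explicit argument instead of reading a non-const global:
bar(A, B) = dot(A, B)

# autodiff returns ((dA, dB),); with Const(B) the dB slot is `nothing`,
# so [1][1] extracts the gradient with respect to A:
dA = autodiff(Reverse, bar, Active(A), Const(B))[1][1]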
If you’re a bit confused by the terminology of a given autodiff package, you can always try to access it through DifferentiationInterface.jl.
Usual caveat: the native API of the autodiff package will sometimes be faster and/or work where DifferentiationInterface.jl fails, but if that’s the case, please open an issue.
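For instance, here’s a minimal sketch (assuming the AutoEnzyme backend selector that DifferentiationInterface.jl re-exports from ADTypes.jl, and DI’s gradient(f, backend, x) entry point):

using DifferentiationInterface, StaticArrays
import Enzyme  # loading Enzyme enables the AutoEnzyme backend

backend = AutoEnzyme()  # backend selector (assumption: re-exported by current DI versions)

f(x) = sum(abs2, x)  # any scalar function of a small static vector
x = SVector(1., 2., 3.)

DifferentiationInterface.gradient(f, backend, x)  # gradient of f at x (= 2x)

The same call then works with other backends (e.g. AutoForwardDiff()) just by swapping the backend object.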