How to differentiate using Enzyme.jl?

Hello,

I don’t really understand how Enzyme.jl actually works. I more or less understand that Const arguments are not considered in the differentiation, and that Active and Duplicated are used otherwise. Duplicated allows storing the result in a preallocated variable.

Here is a simple example:

using Enzyme

function dudt!(du, u, p, t)
    du[1] = -p[1] * u[2]
    du[2] = p[2] * u[1]
    # return nothing
end

u = [1.0, 1.0]
du = similar(u)
p = [1.0, 1.0]
t = 0.0

d_p = Enzyme.make_zero(p)
my_f = (p) -> (dudt!(du, u, p, t); return du[2])
Enzyme.autodiff(Enzyme.Reverse, my_f, Active, Duplicated(p, d_p))

But here d_p comes out as zero, whereas I think it should be [0.0, 1.0].

I also tried the same function but without the Active argument (actually I don’t know its purpose)

Enzyme.autodiff(Enzyme.Reverse, my_f, Duplicated(p, d_p))

but I get the error

ERROR: Duplicated Returns not yet handled
Stacktrace:
 [1] autodiff
   @ ~/.julia/packages/Enzyme/azJki/src/Enzyme.jl:397 [inlined]
 [2] autodiff
   @ ~/.julia/packages/Enzyme/azJki/src/Enzyme.jl:537 [inlined]
 [3] autodiff(mode::ReverseMode{false, false, FFIABI, false, false}, f::var"#15#16", args::Duplicated{Vector{Float64}})

Finally, what if I want to directly differentiate dudt! only with respect to p?

The direct Enzyme API likely isn’t the best tool for most people. I’d recommend using it through DifferentiationInterface.

For a longer discussion of what’s going on, see:

https://book.sciml.ai/notes/08-Forward-Mode_Automatic_Differentiation_(AD)_via_High_Dimensional_Algebras/

From this you can see that AD naturally works via Jacobian-vector products or, as in the next lecture, vector-transpose-Jacobian products. But the key point is that for array-based AD you have a direction for the derivative, i.e. J*v or J'*v, and you have to define v. In Enzyme, you define v via the starting values in the shadow memory. So then:

This means you’re doing J'*[0,0] = 0, so it’s zero as expected. What did you want it to mean?

If you want to calculate the full Jacobian, you need

J = [(J'*[1,0])'
     (J'*[0,1])']

i.e. you call Enzyme.autodiff once per elementary basis vector, and each call gives the derivative of one output with respect to every parameter, i.e. one row of the Jacobian (see the sketch below). J*[1,1] is then a directional derivative along the vector [1,1], etc. Jacobians are just the concatenation of derivatives along all basis directions. But if you’re going to do multiple calls, you can instead fuse them via BatchDuplicated to get more performance.
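
A minimal sketch of those two calls (reusing dudt!, u, p, and t from the original post; J, ddu, and dp are just illustrative names): each reverse-mode call seeds the shadow of du with one elementary vector and accumulates J'*v into the shadow of p.

using Enzyme

J = zeros(2, 2)
for i in 1:2
    du  = similar(u)
    ddu = Enzyme.make_zero(du)
    ddu[i] = 1.0                      # seed v = e_i in the shadow of the output du
    dp  = Enzyme.make_zero(p)         # the product J'*v accumulates here
    Enzyme.autodiff(Enzyme.Reverse, dudt!, Const,
                    Duplicated(du, ddu), Const(u), Duplicated(p, dp), Const(t))
    J[i, :] .= dp                     # row i: d(du[i])/dp
end
# with u = [1.0, 1.0] this should give J == [-1.0 0.0; 0.0 1.0]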

But again, the average person shouldn’t have to know any of this to actually use AD, so just use DifferentiationInterface.jl unless you need the extra little bits of performance.
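
For completeness, a hedged sketch of what that could look like through DifferentiationInterface (assuming a recent DI version; AutoEnzyme comes from ADTypes, and loss is just an illustrative wrapper that allocates its own buffer instead of mutating a captured one):

using DifferentiationInterface
using ADTypes: AutoEnzyme
import Enzyme                            # load Enzyme so the backend is available

# hypothetical non-mutating wrapper around dudt!
loss = p -> begin
    du_local = similar(p)
    dudt!(du_local, u, p, t)
    du_local[2]
end

grad = gradient(loss, AutoEnzyme(), p)   # expected [0.0, 1.0]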


Hi Alberto,

I believe the problem is that capturing globals can be an issue

F(du, u, p, t) = (dudt!(du, u, p, t); return du[2])

u = [1.0, 1.0]
du = similar(u)
p = [1.0, 1.0]
dp = Enzyme.make_zero(p)
t = 0.0
Enzyme.autodiff(Enzyme.Reverse, F, Active, Const(du), Const(u), Duplicated(p,dp), Const(t))
dp

I also tried the same function but without the Active argument (actually I don’t know its purpose)

Active (on the return) means that you want to differentiate the function with respect to its return value, with the seed dret = 1.0.

You can also use Const and your original function dudt!

u = [1.0, 1.0]
du = similar(u)
ddu = Enzyme.make_zero(du)
ddu[2] = 1
p = [1.0, 1.0]
dp = Enzyme.make_zero(p)
t = 0.0
Enzyme.autodiff(Enzyme.Reverse, dudt!, Const, Duplicated(du, ddu), Const(u), Duplicated(p,dp), Const(t))
dp
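# dp should now be [0.0, 1.0], i.e. d(du[2])/dp with u = [1.0, 1.0]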

By setting ddu[2] = 1 you are differentiating dudt! w.r.t du[2].

This is wrong. He wants to do dp = J'*du where du[2] == 1

I think this is coming from Error in `autodiff` involving `mul!` of `SparseMatrixCSC` · Issue #2099 · EnzymeAD/Enzyme.jl · GitHub, which originally had a mistake in the reverse rule; there is an open PR to fix it.

For the problem in the issue, du is marked as Const, which effectively zeroed the gradient, as @ChrisRackauckas implied. If you mark du as Duplicated then you will get the gradient you expect, as @vchuravy mentioned.

This is related to the activity of temporary variables and is explained in the Enzyme docs.

Are you sure? That contradicts what he said:

Which would seed the derivative into d_p and get the derivative out in d_du. I think what he’s actually asking for is

d_du .= 0
d_p = [0.0, 1.0]
Enzyme.autodiff(Enzyme.Forward, dudt!, Const, Duplicated(du, d_du), Const(u), Duplicated(p, d_p), Const(t))
# Derivative output is `d_du`, so `d_du[2] = d(du[2])/dp[2]`

with the derivatives seeded into d_p and calculated into d_du. I’m pretty sure he’s only doing the my_f thing because the documentation shows how to compute the gradient that way in its example. I think that doc example confuses quite a few people, and I’ve wanted to change it.

I actually disagree here: if you want to use Enzyme, I would explicitly avoid DifferentiationInterface for the moment. Presently it does not fully support Enzyme; it does not forward things like runtime activity, which can lead to various pieces of code not working where calling Enzyme directly would succeed.

However, I would recommend the use of higher-level convenience functions like Enzyme.gradient and Enzyme.jacobian (Home · Enzyme.jl).
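
A minimal sketch of those (again reusing dudt!, u, p, and t from the original post; loss, f_out, and du_local are hypothetical wrapper names, and depending on the Enzyme version the results may come back wrapped in a tuple with one entry per argument):

using Enzyme

# hypothetical out-of-place wrappers so nothing captured gets mutated
loss  = p -> (du_local = similar(p); dudt!(du_local, u, p, t); du_local[2])
f_out = p -> (du_local = similar(p); dudt!(du_local, u, p, t); du_local)

g = Enzyme.gradient(Enzyme.Reverse, loss, p)    # d(du[2])/dp, expected [0.0, 1.0]
J = Enzyme.jacobian(Enzyme.Forward, f_out, p)   # d(du)/dp, expected [-1.0 0.0; 0.0 1.0]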

For a more complete description of how autodiff in general, and Enzyme specifically, works, I would recommend this tutorial: Introduction to Automatic Differentiation in Enzyme

Here du is temporary storage. Why don’t we need to mark it as Duplicated too?

So the problem in the my_f function was that it was updating a global variable?

Actually my question should be simpler. I have a function that computes du[2] = p[2] * u[1], and I want the gradient of du[2] w.r.t. [p[1], p[2]] (i.e., (\partial du_2 / \partial p_1, \partial du_2 / \partial p_2)). It should return [0, u[1]], i.e. [0, 1].

Actually I was expecting a method that directly returns the gradient [0, 1], without having to work out how du[i] and p[j] are related.

Thanks, it seems very interesting and pedagogical.


So, @vchuravy’s examples work, but I don’t understand why we use Const(du) in the first example, if it is a temporary variable.

To summarize, this is what I think I have understood.

  • If the function returns a value, I should use Active for the return and mark as Duplicated all the arguments for which I want to compute the gradient (dp can start at zero, and it will be filled with the gradient values), as well as those which are updated internally.
  • If the function doesn’t return anything, or I’m interested in the gradient of other quantities, I should use Const for the return and mark as Duplicated all the arguments for which I want to compute the gradient (or which are computed internally). I should set to 1 the element for which I want to compute the derivative. In my case I guess I should use BatchDuplicated(p, ([1,0], [0,1])) to compute the derivatives w.r.t. p[1] and p[2], respectively (see the sketch at the end of this post).

Is everything correct?
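
As a hedged sketch of that last point (assuming a recent Enzyme version, and reusing dudt!, du, u, p, t from above): seeding the shadow of p with basis vectors is forward-mode seeding, so the batched call would look like this, with the tangents coming out in the shadows of du (ddu1 and ddu2 are illustrative names).

dp1, dp2 = [1.0, 0.0], [0.0, 1.0]        # one basis direction per batch lane
ddu1, ddu2 = zero(du), zero(du)          # the tangents of du come out here
Enzyme.autodiff(Enzyme.Forward, dudt!, Const,
                BatchDuplicated(du, (ddu1, ddu2)), Const(u),
                BatchDuplicated(p, (dp1, dp2)), Const(t))
# ddu1 should be d(du)/d(p[1]) == [-1.0, 0.0] and ddu2 should be d(du)/d(p[2]) == [0.0, 1.0]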