Do I understand Enzyme properly?

I think I finally have a grasp on how Enzyme works, but I would like an expert to validate this.

Consider a possibly mutating vector function f(x, y) which outputs a scalar value z. We denote by x_a and y_a the contents of x and y after execution. Enzyme works with a Jacobian that includes all of these variables (written in block form):

J = \begin{pmatrix} \partial x_a / \partial x & \partial x_a / \partial y \\ \partial y_a / \partial x & \partial y_a / \partial y \\ \partial z / \partial x & \partial z / \partial y \\ \end{pmatrix}
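
To make this concrete with a toy example of my own: take f!(x, y) which does x .+= y and returns sum(x). Then x_a = x + y, y_a = y, z = \sum_i (x_i + y_i), and the block Jacobian is

J = \begin{pmatrix} I & I \\ 0 & I \\ \mathbf{1}^\top & \mathbf{1}^\top \end{pmatrix}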

In forward mode, if we give Duplicated arguments:

  1. we provide a tuple of values (x, y) and their tangents v = (\dot{x}, \dot{y}) as shadows
  2. autodiff computes Jv = (\dot{x_a}, \dot{y_a}, \dot{z})
  3. it puts \dot{x_a} and \dot{y_a} in the storage we provided (in the shadows)
  4. it returns \dot{z}
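
Here is roughly what that looks like for the toy f! above (my own sketch; the exact packaging of the returned tangent may differ a little between Enzyme versions):

using Enzyme

function f!(x, y)
    x .+= y          # mutation: x_a = x + y
    return sum(x)    # scalar output z = sum(x_a)
end

x, y = [1.0, 2.0], [3.0, 4.0]
dx, dy = [1.0, 0.0], [0.0, 0.0]    # tangents (\dot{x}, \dot{y}) as shadows
autodiff(Forward, f!, Duplicated(x, dx), Duplicated(y, dy))
# returns \dot{z} = sum(\dot{x} + \dot{y}) = 1.0;
# dx now holds \dot{x_a} = \dot{x} + \dot{y} = [1.0, 0.0] and dy holds \dot{y_a} = [0.0, 0.0]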

In reverse mode, if we give Duplicated arguments:

  1. we provide a tuple of values (x, y) and their adjoints v = (\bar{x_a}, \bar{y_a}) as shadows
  2. … which is completed with a default output adjoint \bar{z} = 1 (I guess?)
  3. autodiff computes v^\top J = (\bar{x}, \bar{y})
  4. it puts \bar{x} and \bar{y} in the storage we provided (in the shadows)
  5. it returns nothing
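
And the reverse-mode counterpart for the same f! (shadows zero-initialized; again just a sketch):

using Enzyme

x, y = [1.0, 2.0], [3.0, 4.0]
dx, dy = zeros(2), zeros(2)        # adjoint shadows, i.e. (\bar{x_a}, \bar{y_a}) = 0
autodiff(Reverse, f!, Active, Duplicated(x, dx), Duplicated(y, dy))
# nothing useful is returned; dx ends up as \bar{x} = [1.0, 1.0] and dy as \bar{y} = [1.0, 1.0],
# consistent with v = (\bar{x_a}, \bar{y_a}, \bar{z}) = (0, 0, 1), i.e. a default seed \bar{z} = 1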

Do I have that right? I think such a description with the full Jacobian would have helped me in the documentation (coming from ChainRules).

5 Likes

I think you have the Jacobian products in step (2/3) backwards. Forwards-mode AD is equivalent to Jacobian–vector products Jv (it evaluates the chain rule from right-to-left, or equivalently from inputs-to-outputs). Backwards-mode AD is equivalent to vector–Jacobian products v^TJ (it evaluates the chain rule from left-to-right, or equivalently from outputs-to-inputs).

This video on principles of AD by @mohamed82008 may be helpful.

3 Likes

Thanks for spotting the switch; I corrected it!

Do you have any clue as to the default seeding of the output adjoint in reverse-mode Enzyme? It’s the part I’m really unsure about, because in ChainRules we give it explicitly to the VJP.

That’s also why I asked specifically about Enzyme: in ChainRules we don’t care about storage being updated unless it influences the output (so the x_a and y_a don’t really exist there).

Just write down the chain rule for computing f(x) = L(F(G(x))):

f'(x) = L'(F(G(x))) F'(G(x)) G'(x)

where each of the terms L', F', G' is a Jacobian matrix and x, F, G are vectors. Imagine that L is a loss function with a scalar output. Then the Jacobian L' is a row vector v^T where v = \nabla L. Reverse-mode multiplies (\nabla f)^T = L' F' G' = (v^T F') G' left-to-right, so that you are always doing vector–Jacobian products (never matrix–matrix products). (This is precisely the case where reverse-mode is best: many inputs x and a single scalar output L.)
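
A toy illustration in plain linear algebra (nothing Enzyme-specific, and the Jacobians are just random stand-ins) of that left-to-right accumulation:

JG = randn(4, 3)     # Jacobian G'(x): 4 outputs, 3 inputs
JF = randn(5, 4)     # Jacobian F'(G(x))
JL = randn(1, 5)     # Jacobian L'(F(G(x))): a row vector v^T, since L is scalar

v = JL * JF          # 1×4 vector–Jacobian product
v = v * JG           # 1×3 vector–Jacobian product
# v is (\nabla f)^T: the same as JL * JF * JG, but no matrix–matrix product is ever formed
v ≈ JL * JF * JG     # true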

We have a short course Matrix Calculus at MIT that covers this and more: GitHub - mitmath/matrixcalc: MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

4 Likes

I’m sorry, my question wasn’t clear. I’m used to computing VJPs and JVPs where you declare explicitly the whole vector v. My impression is that in Enzyme, that’s not the case:

  • in forward mode, the user does specify \dot{x} and \dot{y}
  • but in reverse mode they only specify \bar{x_a} and \bar{y_a}, which means the remaining component \bar{z} must come from somewhere: presumably it’s 1 (that’s what makes mathematical sense and also what my tests seem to suggest; see the quick check below) but I’d love a confirmation because I didn’t find anything about it in the docs
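
For example, a trivial test where the accumulated adjoint comes out unscaled:

using Enzyme

h(x) = 3.0 * x[1]                  # \partial z / \partial x_1 = 3
x, dx = [2.0], [0.0]
autodiff(Reverse, h, Active, Duplicated(x, dx))
dx[1]                              # 3.0 = \bar{z} * \partial z / \partial x_1, which suggests \bar{z} = 1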
1 Like

By the way, I don’t know if Alan told you, but I wrote a new homework last year on autodiff with ChainRules for his Julia class; it may be useful for this one too!
https://gdalle.github.io/JuliaComputationSolutions/hw2_solutions.html

1 Like

@gdalle your understanding looks mostly correct.

My minor clarification is that for reverse mode it doesn’t simply put \bar{x} into the storage you provide; it +='s it into that storage (which we refer to as shadow memory and/or the dval in the Duplicated).
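
For instance, a toy sketch of that accumulation:

using Enzyme

h(x) = sum(abs2, x)                # gradient is 2x
x  = [1.0, 2.0]
dx = [10.0, 10.0]                  # deliberately nonzero shadow
autodiff(Reverse, h, Active, Duplicated(x, dx))
# dx == [12.0, 14.0]: the adjoint 2x was +='d into the existing shadow contents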

In combined reverse mode (the default), the default output adjoint is indeed one. This is related to why reverse mode does not currently permit Duplicated returns. The reason is that the augmented forward pass in reverse mode will return a shadow data structure for the return, which we would need to += into with the output adjoint seed. For arbitrary data structures the question of how to do so is fuzzier (what if the user-provided adjoint structure doesn’t match the output of the forward pass, which is generated at runtime?). For ease we presently just support Const or Active returns as a result.

However, if you want to seed with arbitrary values (including Duplicated returns like arrays, trees, etc) you have options, a couple of which I’ll put here.

  1. Return by reference. That is, you make the output a Duplicated argument and initialize its shadow with the adjoint you want to propagate. Enzyme will propagate that value (and also zero it when it takes it out; this is necessary for loops to work properly, since if you store to the same memory location twice, only the last stored value is the one you want to propagate to). See the sketch after this list.
  2. Use split reverse mode. First run the augmented forward pass, which returns a shadow data structure. += your adjoint into the shadow (which can be an array, matrix, linked list, etc).
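
To sketch option 1 concretely (a toy function of my own; the exact call may vary slightly between Enzyme versions):

using Enzyme

function g!(out, x)                # returns its result by reference in out
    out[1] = x[1]^2 + 2.0 * x[2]
    return nothing
end

x,   dx   = [3.0, 4.0], zeros(2)
out, dout = zeros(1),   [5.0]      # initialize the output shadow with the adjoint seed \bar{out} = 5
autodiff(Reverse, g!, Const, Duplicated(out, dout), Duplicated(x, dx))
# dx ≈ [30.0, 10.0] = 5.0 .* [2 * x[1], 2.0], so the seed was propagated;
# dout has been zeroed by Enzyme as it took the adjoint out of the store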
4 Likes

Thanks @wsmoses, I’ll take time to digest your answer (which is the one I was looking for) and come back if I have more questions :smiling_imp:

Special shoutout to @stevengj, who seems to answer every single question on this forum without ever losing patience :pray:

4 Likes