I think I finally have a grasp on how Enzyme works, but I would like an expert to validate this.
Consider a possibly mutating vector function f(x, y) which outputs a scalar value z. We denote by x_a and y_a the contents of x and y after execution. Enzyme works with a Jacobian that includes all of these variables (written in blocks):
J = \begin{pmatrix}
\partial x_a / \partial x & \partial x_a / \partial y \\
\partial y_a / \partial x & \partial y_a / \partial y \\
\partial z / \partial x & \partial z / \partial y \\
\end{pmatrix}
In forward mode, if we give Duplicated arguments:
we provide a tuple of values (x, y) and their tangents v = (\dot{x}, \dot{y}) as shadows
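Concretely, I imagine the call looking something like this sketch (the toy mutating function and values are made up by me, and the exact return convention of forward-mode autodiff may differ between Enzyme.jl versions):

```julia
using Enzyme

# Toy mutating function: y is overwritten, a scalar z is returned.
function f!(x, y)
    y .= y .+ x .^ 2   # y_a depends on both x and y
    return sum(y)      # z
end

x = [1.0, 2.0];  dx = [1.0, 0.0]   # primal values and tangent shadows
y = [3.0, 4.0];  dy = [0.0, 0.0]

# Forward mode with Duplicated arguments: Enzyme pushes the tangents
# through the mutation, so dy ends up holding the tangent of y_a and
# the call returns the tangent of z.
autodiff(Forward, f!, Duplicated(x, dx), Duplicated(y, dy))
```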
I think you have the Jacobian products in step (2/3) backwards. Forwards-mode AD is equivalent to Jacobian–vector products Jv (it evaluates the chain rule from right-to-left, or equivalently from inputs-to-outputs). Backwards-mode AD is equivalent to vector–Jacobian products v^TJ (it evaluates the chain rule from left-to-right, or equivalently from outputs-to-inputs).
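In terms of the block Jacobian you wrote down, forward mode computes

J \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} \dot{x_a} \\ \dot{y_a} \\ \dot{z} \end{pmatrix}

while reverse mode computes

\begin{pmatrix} \bar{x_a}^T & \bar{y_a}^T & \bar{z} \end{pmatrix} J = \begin{pmatrix} \bar{x}^T & \bar{y}^T \end{pmatrix}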
Do you have any clue as to the default seeding of the output adjoint in reverse-mode Enzyme? It’s the part I’m really unsure about, because in ChainRules we give it explicitly to the VJP.
That’s also why I asked specifically about Enzyme, because in ChainRules we don’t care about storage being updated unless it influences the output (so the x_a and y_a don’t really exist there)
Just write down the chain rule for computing f(x) = L(F(G(x))):
f'(x) = L'(F(G(x))) F'(G(x)) G'(x)
where each of the terms L', F', G' is a Jacobian matrix and x, F, G are vectors. Imagine that L is a loss function with a scalar output. Then the Jacobian L' is a row vector v^T where v = \nabla L. Reverse mode multiplies (\nabla f)^T = L' F' G' = (v^T F') G' left-to-right, so that you are always doing vector–Jacobian products (never matrix–matrix products). (This is precisely the case where reverse mode is best: many inputs x and a single scalar output L.)
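Counting dimensions makes this explicit: if x \in \mathbb{R}^n, G: \mathbb{R}^n \to \mathbb{R}^m, F: \mathbb{R}^m \to \mathbb{R}^k, and L: \mathbb{R}^k \to \mathbb{R}, then

\underbrace{L'}_{1 \times k} \underbrace{F'}_{k \times m} \underbrace{G'}_{m \times n}

so v^T F' is a 1 × m row vector, (v^T F') G' is 1 × n, and every step is a vector–Jacobian product; multiplying right-to-left would instead start with the k × n matrix–matrix product F' G'.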
I’m sorry, my question wasn’t clear. I’m used to computing VJPs and JVPs where you explicitly supply the whole vector v. My impression is that in Enzyme, that’s not the case:
in forward mode, the user does specify \dot{x} and \dot{y}
but in reverse mode they only specify \bar{x_a} and \bar{y_a}, which means the remaining component \bar{z} must come from somewhere: presumably it’s 1 (that’s what makes mathematical sense and also what my tests seem to suggest), but I’d love a confirmation because I didn’t find anything about it in the docs
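The kind of test I mean (toy non-mutating function for simplicity):

```julia
using Enzyme

g(x) = x^2

# With no explicit seed, reverse mode returns the plain gradient
# dz/dx = 2x, which is what a seed of \bar{z} = 1 would produce.
autodiff(Reverse, g, Active, Active(3.0))  # gives ((6.0,),) in my tests
```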
My minor clarification is that for reverse mode it doesn’t put \bar{x} into the storage you provide, it +=’s it into the storage you provide (which we refer to as shadow memory and/or the dval in the Duplicated).
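A quick way to see the += (toy example):

```julia
using Enzyme

h(x) = sum(abs2, x)   # gradient is 2x

x  = [1.0, 2.0]
dx = [100.0, 100.0]   # deliberately non-zero shadow (the dval)

autodiff(Reverse, h, Active, Duplicated(x, dx))

# dx is now [102.0, 104.0]: the gradient [2.0, 4.0] was +='d into
# the shadow rather than overwriting it.
```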
In combined reverse mode (the default) the default output adjoint is indeed one. This is related to why reverse mode does not currently permit Duplicated returns. The reason is that the augmented forward pass in reverse mode will return a shadow data structure for the return, which we would need to += the output adjoint seed into. For arbitrary data structures the question of how to do so is fuzzier (what happens if the user-provided adjoint structure doesn’t match the output of the forward pass, which is generated at runtime?). For ease we presently just support Const or Active returns.
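So for a mutating scalar-returning function like yours, the combined-mode call looks like this sketch (values made up), with \bar{z} = 1 supplied implicitly:

```julia
using Enzyme

function f!(x, y)
    y .= y .+ x .^ 2
    return sum(y)   # Active scalar return: seeded with 1 by default
end

x = [1.0, 2.0]; dx = zeros(2)
y = [3.0, 4.0]; dy = zeros(2)

# Combined reverse mode: \bar{x} and \bar{y} are +='d into dx and dy.
autodiff(Reverse, f!, Active, Duplicated(x, dx), Duplicated(y, dy))

dx  # 2 .* x == [2.0, 4.0]
dy  # [1.0, 1.0]
```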
However, if you want to seed with arbitrary values (including Duplicated returns like arrays, trees, etc.) you have options, a couple of which I’ll put here.
Return by reference. Then you have a duplicated where you would initialize the shadow with the adjoint you want to propagate. Enzyme will propagate that value (and also zero it when it takes it out – this is necessary to ensure the proper functioning of loops, since if you store to the same memory location twice, only the last stored value is the one you want to propagate).
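A sketch of this option (function and numbers made up):

```julia
using Enzyme

# Return by reference: the result lives in a preallocated buffer `out`.
function g!(out, x)
    out .= x .^ 2
    return nothing
end

x   = [1.0, 2.0, 3.0];  dx   = zeros(3)
out = zeros(3);         dout = [1.0, 10.0, 100.0]  # arbitrary adjoint seed

autodiff(Reverse, g!, Const, Duplicated(out, dout), Duplicated(x, dx))

dx    # J^T * seed == [2.0, 40.0, 600.0]
dout  # zeroed: Enzyme took the seed out and cleared the shadow
```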
Use split reverse mode. First run the augmented forward pass, which returns a shadow data structure for the output. += your adjoint into the shadow (which can be an array, matrix, linked list, etc.), then run the reverse pass.
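Roughly, with the thunk API (a sketch; the exact thunk signatures vary across Enzyme.jl versions, so treat this as illustrative):

```julia
using Enzyme

square(x) = x .^ 2   # non-scalar (Duplicated) return we want to seed by hand

x  = [1.0, 2.0, 3.0]
dx = zeros(3)

# Build separate augmented-forward and reverse thunks.
fwd, rev = autodiff_thunk(ReverseSplitWithPrimal,
                          Const{typeof(square)}, Duplicated,
                          Duplicated{typeof(x)})

# 1. Run the augmented forward pass; it hands back a shadow for the return.
tape, primal, shadow = fwd(Const(square), Duplicated(x, dx))

# 2. += your adjoint into that shadow structure.
shadow .+= [1.0, 10.0, 100.0]

# 3. Run the reverse pass; \bar{x} is accumulated into dx.
rev(Const(square), Duplicated(x, dx), tape)

dx  # expect [2.0, 40.0, 600.0]
```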