Zygote documentation on custom adjoints states dx/dy
(in the notation from the docs) is the Jacobian when x
, y
are vectors. I am struggling in relating this to an implementation.
For a function f : R^N -> R^M
modeled with f = Chain(Dense(N, H, tanh), Dense(H, M))
with hidden dimension H
, I intuitively thought dy/dx
would be an MxN
matrix, but as we all know intuition often fails so I experimented a bit. Running
using Flux, Zygote
N, M, H = 2, 3, 16
g = Chain(Dense(N, H, tanh), Dense(H, M))
x = randn(N, 1)
y, back = Zygote.pullback(g, x)
back([])
gives DimensionMismatch("matrix A has dimensions (16,3), matrix B has dimensions (0,1)")
so it looks like dy/dx
is an MxH
matrix. Okay, cool, I guess this is the first matrix multiplication in the first back
call. But do I even get an MxN
matrix if I compute the back pass? No, running back(ones(size(y)...))
returns an Nx1
matrix. So how would I compute the “MxN” Jacobian?
I am clearly not thinking about this the right way. Could someone give me some pointers to this problem please? Reading the documentation has unfortunately not made it “click” for me.