I would like to use Zygote, because of the amazing possibility to tag the parameters you want to optimize and let Zygote do the rest.
However, in my case I am creating a matrix (`K`) from a (potentially highly nested) list of parameters (`theta`) and passing it to a function (`f`) to get a scalar.
I have derived the gradient `df/dtheta = g(dK/dtheta)` analytically; it is non-linear (and contains a share of optimization tricks), and Zygote works perfectly for differentiating `K`. My initial solution was to compute `dK/dtheta` for each parameter in `theta` and pass it to `df/dtheta`, but this is very inefficient and impractical.
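To make the per-parameter approach concrete, here is a minimal sketch of it. Everything named here is a placeholder chosen for illustration and is not the actual model: `build_K` is an assumed squared-exponential kernel on fixed inputs, `f(K) = logdet(K + I)` is an assumed scalar function, and `g` is its hand-derived gradient rule.

```julia
using Zygote, LinearAlgebra

# Hypothetical setup (illustration only): theta = [variance, lengthscale].
const x = [0.0, 0.5, 1.0]
build_K(theta) = theta[1] .* exp.(-abs2.(x .- x') ./ (2 * theta[2]^2))

# Hypothetical hand-derived rule for f(K) = logdet(K + I):
# df/dtheta_i = tr((K + I)^-1 * dK/dtheta_i)
g(K, dK) = tr((K + I) \ dK)

theta = [1.0, 0.7]
K = build_K(theta)

# Zygote gives the full Jacobian of vec(K) with respect to theta,
# of size (length(K), length(theta)).
J = Zygote.jacobian(build_K, theta)[1]

# One reshape and one call to g per parameter: this is the costly part.
grad = [g(K, reshape(J[:, i], size(K))) for i in eachindex(theta)]
```

The cost shows up in `J`: it stores one full copy of `dK` per parameter, which grows quickly when `theta` is large or nested.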
Now I want to write an `@adjoint` that would contain `g(K)`, but I have no idea how to go about it since it’s not a Jacobian-vector product anymore…
Here is a simple example with the derivations
How can I write an appropriate adjoint for this?