How to use the gradient of a neural network as the loss function?

You don’t need to modify much. There are examples in the DiffEqFlux repo where the RHS of the ODE uses Zygote internally, with ReverseDiff for the VJP. It has to be that order, though: the reverse (ReverseDiff internally, Zygote for the VJP) fails, since Zygote does not handle being nested inside itself.
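
For concreteness, here is a minimal sketch of that pattern. It is not the exact repo example: the network, the loss, and the specific Flux/SciMLSensitivity calls are my assumptions, but the structure is the point. The ODE right-hand side takes a Zygote gradient of the network with respect to its *input*, and the adjoint is told to use ReverseDiff for the vector-Jacobian products:

```julia
using Flux, Zygote, ReverseDiff, OrdinaryDiffEq, SciMLSensitivity

# Illustrative setup: a small scalar-valued network V(u) = nn(u)[1].
nn = Chain(Dense(2 => 16, tanh), Dense(16 => 1))
p, re = Flux.destructure(nn)  # flatten parameters, keep a reconstructor

# RHS uses Zygote *internally*: the dynamics follow -∇ᵤV(u).
function rhs(u, p, t)
    model = re(p)
    ∇V = Zygote.gradient(x -> sum(model(x)), u)[1]
    return -∇V
end

u0 = rand(2)
prob = ODEProblem(rhs, u0, (0.0, 1.0), p)

# Outer differentiation goes through solve via the adjoint, with
# ReverseDiffVJP handling the VJPs of `rhs` (the order the answer describes).
function loss(p)
    sol = solve(prob, Tsit5(); p = p, saveat = 0.1,
                sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP()))
    sum(abs2, Array(sol))
end

g = Zygote.gradient(loss, p)[1]
```

The key line is the `sensealg` choice: `InterpolatingAdjoint(autojacvec = ReverseDiffVJP())` makes ReverseDiff compute the VJPs of `rhs` during the backwards pass, so Zygote is only ever used for the inner input-gradient and never has to differentiate itself. Check the DiffEqFlux repo's examples and tests for the exact supported version combinations.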