Hi, I am trying to reproduce an algorithm based on natural gradient computations ("Natural Gradients in Practice", Salimbeni et al., 2018).

The key computation is that you can get rid of the inverse Fisher information matrix by replacing it with transformation derivatives; here is the relevant extract:

The paper's whole discussion of reverse mode is less relevant here, as forward mode is available in Julia. However, I was wondering whether there is a way to compute this quantity in one pass, instead of first computing \frac{\partial \xi}{\partial \theta} and then \frac{\partial \mathcal{L}}{\partial \eta}.

For precision: here \xi is an arbitrary representation of the variational parameters, \theta are the natural parameters, and \eta are the expectation parameters (first and second moments).
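To make the identity concrete, here is a minimal numerical sketch (in Python/NumPy rather than Julia, and with finite-difference Jacobians standing in for AD) for a hypothetical 1D Gaussian: \xi = (\mu, \sigma), \theta the natural parameters, \eta the moments. It checks that the Salimbeni et al. identity \tilde{\nabla}_\xi \mathcal{L} = \frac{\partial \xi}{\partial \theta} \frac{\partial \mathcal{L}}{\partial \eta} agrees with the explicit Fisher-inverse route. The loss is an arbitrary toy choice, not from the paper.

```python
import numpy as np

# Hypothetical 1D Gaussian example: xi = (mu, sigma) is an arbitrary
# parametrisation, theta the natural parameters, eta the expectation
# parameters (first and second moments).
def theta_of_xi(xi):
    mu, sigma = xi
    return np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])

def eta_of_xi(xi):
    mu, sigma = xi
    return np.array([mu, mu**2 + sigma**2])

# Toy loss written directly in terms of eta (any smooth loss works).
def loss_eta(eta):
    return (eta[0] - 0.5)**2 + (eta[1] - 2.0)**2

def loss_xi(xi):
    return loss_eta(eta_of_xi(xi))

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x (stands in for AD here)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        cols.append((f(x + e) - f(x - e)) / (2.0 * h))
    return np.stack(cols, axis=-1)

xi = np.array([0.3, 1.2])

# Route 1: explicit Fisher matrix. For an exponential family,
# F_theta = deta/dtheta, so in xi-coordinates
# F_xi = (dtheta/dxi)^T (deta/dtheta) (dtheta/dxi).
J_theta_xi = jacobian(theta_of_xi, xi)                # dtheta/dxi
J_eta_xi = jacobian(eta_of_xi, xi)                    # deta/dxi
J_eta_theta = J_eta_xi @ np.linalg.inv(J_theta_xi)    # deta/dtheta
F_xi = J_theta_xi.T @ J_eta_theta @ J_theta_xi
grad_xi = jacobian(loss_xi, xi)                       # dL/dxi
natgrad_fisher = np.linalg.solve(F_xi, grad_xi)

# Route 2: the identity from the paper, with no Fisher inverse:
# natgrad_xi = (dxi/dtheta) (dL/deta), where dxi/dtheta = (dtheta/dxi)^{-1}.
grad_eta = jacobian(loss_eta, eta_of_xi(xi))          # dL/deta
natgrad_trick = np.linalg.solve(J_theta_xi, grad_eta)

# The two routes agree up to finite-difference error.
```

Note that route 2 only ever applies \frac{\partial \xi}{\partial \theta} to the single vector \frac{\partial \mathcal{L}}{\partial \eta}, which is exactly a Jacobian-vector product, so with forward-mode AD the Jacobian never needs to be materialised; but as written it is still two differentiation passes, which is precisely the question above.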