Hey!
I am trying to train a last-layer Bayesian approximation for a neural network. The issue I am running into is that ADVI tries to take the gradient over the entire dataset, which obviously isn't going to work at this scale.
Can the ADVI algorithm be tweaked to compute the gradient on a minibatch instead, presumably by scaling the minibatch likelihood up by the ratio of the full dataset size to the batch size?
This feature is built into PyMC, but I'm really not a fan of the syntax.
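For context, here's a minimal NumPy sketch of the scaling trick I mean (illustrative only, not PyMC code): rescaling a minibatch log-likelihood sum by N / batch_size gives an unbiased estimate of the full-data sum, so its gradient is an unbiased estimate of the full-data gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
N = len(data)
B = 100  # minibatch size

def log_lik(x, mu=0.0):
    # per-point Gaussian log-likelihood (unit variance), just as a stand-in
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

# full-data log-likelihood sum
full = log_lik(data).sum()

# scaled minibatch estimates over a disjoint partition of the data;
# each one is (N / B) * sum over the batch
batches = data.reshape(-1, B)
scaled = np.array([(N / B) * log_lik(b).sum() for b in batches])

# averaging the scaled estimates over the partition recovers the
# full-data sum exactly, so each single estimate is unbiased
assert np.isclose(scaled.mean(), full)
```

The same rescaling applied inside the ELBO's likelihood term is what I'd hope a minibatch ADVI variant would do under the hood.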