I have recently become interested in variational inference. I like the idea of linking Bayesian modeling with optimization, and I like its unimodal approximation to the parameters of a model (by assuming the variational posterior is a normal distribution). I have read the document on variational inference on the Turing website, https://turing.ml/dev/docs/for-developers/variational_inference. I have tried to understand the code in https://github.com/TuringLang/AdvancedVI.jl/tree/master/src, but I am having a difficult time with it.
Here is a simple scenario. Suppose I have a data set with 100 observations and two variables, x and y, where x is a continuous predictor and y is a binary outcome. I want to fit a logistic regression with the single predictor x to predict y, and use variational inference to estimate the distribution of the coefficient z on x. The prior on z is assumed to be a standard normal distribution.
```julia
using Turing
using StatsFuns: logistic

@model logistic_regression(x, y, n) = begin
    intercept ~ Normal(0, 1)
    z ~ Normal(0, 1)
    for i = 1:n
        # probability that y[i] = 1 given x[i]
        v = logistic(intercept + z * x[i])
        y[i] ~ Bernoulli(v)
    end
end;
```
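For reference, my understanding from the Turing docs is that one would then fit the variational approximation with something like the following (10 Monte Carlo samples per step, 1000 optimization steps; the data here are made up just so the snippet runs):

```julia
using Turing

x = randn(100)            # made-up continuous predictor
y = rand(0:1, 100)        # made-up binary outcome

model = logistic_regression(x, y, 100)
advi = ADVI(10, 1000)     # samples per step, max iterations
q = vi(model, advi)       # mean-field normal approximation
```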
According to the document above, in order to estimate the parameters of a model we need to maximize the (Monte Carlo estimate of the) ELBO,

$$\widehat{\mathrm{ELBO}}(q) = \frac{1}{m} \sum_{k=1}^{m} \sum_{i=1}^{n} \log p(x_i, z_k) + \mathbb{H}(q(z)),$$

where the $z_k$ are samples from $q$. I want to understand how the ELBO is calculated with the above model. I have not tried to understand the optimization part yet.
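To check my understanding, here is a naive sketch of that estimator in plain Julia. This is my own paraphrase, not Turing's implementation: it assumes a single scalar parameter $z$ (ignoring the intercept for simplicity), a mean-field normal $q_{\mu,\sigma}$, and a function `logπ(z)` returning $\log p(x_{1:n}, z)$ for the full data set.

```julia
using Distributions, Statistics

# Naive Monte Carlo ELBO estimate: average the log joint over m draws
# z_k ~ q, then add the closed-form entropy H(q) of the normal q.
function naive_elbo(logπ, μ, σ, m)
    q = Normal(μ, σ)
    return mean(logπ(rand(q)) for k in 1:m) + entropy(q)
end
```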
Please let me know if the following is right.
$$\log p(x_i, z_k) = \log\big(p(x_i \mid z_k)\, p(z_k)\big) = \log \mathrm{InvLogit}(\mathrm{intercept} + z_k x_i) + \log\!\left(\frac{e^{-z_k^2/2}}{\sqrt{2\pi}}\right),$$

where $x_i$ is sampled from the data set, and $z_k$ is sampled from $q_{\mu,\sigma} = \mathcal{N}(\mu, \sigma^2)$.
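In code, my understanding of that per-observation term is the following. This is a hypothetical helper of my own, not anything in Turing; note that the Bernoulli term also depends on $y_i$, which I left implicit in the formula above.

```julia
using Distributions
using StatsFuns: logistic

# One term log p(xᵢ, zₖ) of the sum: Bernoulli log-likelihood of yᵢ
# plus the standard normal log-prior of zₖ.
function logjoint_one(intercept, z, xi, yi)
    v = logistic(intercept + z * xi)   # InvLogit(intercept + zₖ·xᵢ)
    return logpdf(Bernoulli(v), yi) + logpdf(Normal(0, 1), z)
end
```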
In Turing, is $\log p(x_i, z_k)$ calculated using the two functions in https://github.com/TuringLang/Turing.jl/blob/master/src/variational/VariationalInference.jl?
```julia
function make_logjoint(model::Model; weight = 1.0)
    # setup
    ctx = DynamicPPL.MiniBatchContext(
        DynamicPPL.DefaultContext(),
        weight
    )
    varinfo_init = Turing.VarInfo(model, ctx)

    function logπ(z)
        varinfo = VarInfo(varinfo_init, SampleFromUniform(), z)
        model(varinfo)
        return getlogp(varinfo)
    end

    return logπ
end

function logjoint(model::Model, varinfo, z)
    varinfo = VarInfo(varinfo, SampleFromUniform(), z)
    model(varinfo)
    return getlogp(varinfo)
end
```
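If I read `make_logjoint` right, it closes over the model and returns a function `logπ` mapping a vector of parameter values to the log joint, so one could (hypothetically, possibly with a `Turing.Variational` qualifier) call it like this:

```julia
# Hypothetical usage of make_logjoint (my reading of the snippet above):
# logπ maps a parameter vector z to log p(x, z).
logπ = make_logjoint(logistic_regression(x, y, 100))
logπ([0.1, -0.3])   # log joint at intercept = 0.1, z = -0.3
```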
In https://github.com/TuringLang/Turing.jl/blob/master/src/variational/objectives.jl, the objective seems to be calculated by calling an `ELBO` object as a function:
```julia
function (elbo::ELBO)(
    rng::AbstractRNG,
    alg::VariationalInference,
    q,
    model::Model,
    num_samples;
    weight = 1.0,
    kwargs...
)
    return elbo(rng, alg, q, make_logjoint(model; weight = weight), num_samples; kwargs...)
end
```
I do not understand how the ELBO is calculated from these few lines of code; this method seems only to forward to another `elbo` method that takes `make_logjoint(model)` in place of `model`.
The entropy part, $\mathbb{H}(q(z))$, seems to be addressed in https://github.com/TuringLang/Turing.jl/blob/master/src/variational/advi.jl, right?
```julia
if q isa TransformedDistribution
    res += entropy(q.dist)
else
    res += entropy(q)
end
```
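For a mean-field normal $q$, my understanding is that this entropy has the familiar closed form $\mathbb{H}(q) = \tfrac{1}{2}\log(2\pi e \sigma^2)$ per factor, e.g. (using Distributions.jl just to illustrate):

```julia
using Distributions

σ = 0.5
# Closed-form entropy of a normal variational factor: H(q) = ½ log(2πeσ²).
entropy(Normal(0.0, σ)) ≈ 0.5 * log(2π * ℯ * σ^2)   # true
```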
In sum, could someone give me some guidance on how to read the Turing code for variational inference?