I’m just cutting my teeth on Turing. (BTW, thanks for this amazing software.) Does Turing support VI of mixture models? I attempted ADVI on a Gaussian mixture model, but without success, as shown below. The error occurs at the line Z[ii] ~ Categorical(wtrue). Is it possible? Thanks!

using Distributions
using Turing, MCMCChains

# ground-truth (data-generating) mixture distribution
ncomptrue = 3
mutrue = [1, 2, 3]      # ground-truth component means
muprior = Normal(0, 3)
sigma = 0.1             # a priori known
nsamples = 100
# components have equal probability
wtrue = [1, 1, 1] / ncomptrue
mixtrue = MixtureModel(Normal, [(mm, sigma) for mm in mutrue], wtrue)
Y = rand(mixtrue, nsamples)

# fit a mixture with a known number of components
@model Mixmodel(x) = begin
    N = length(x)
    mu1 ~ muprior
    mu2 ~ muprior
    mu3 ~ muprior
    mu = [mu1, mu2, mu3]
    Z = Vector{Int}(undef, N)
    for ii in 1:N
        Z[ii] ~ Categorical(wtrue)   # discrete latent assignment; this is where the error occurs
        x[ii] ~ Normal(mu[Z[ii]], sigma)
    end
end

model = Mixmodel(Y)
advi = ADVI(10, 1000)
q = vi(model, advi)

Hi, you can use variational inference only for continuous parameters. In variational inference, the goal is to minimise the KL divergence from the variational distribution to the true posterior. In Turing this is done with gradient-based optimisation, which requires gradients with respect to all random variables; the discrete assignments Z have no gradient, hence the error.

However, you can reformulate a mixture model such that it contains only continuous parameters.

The following model runs using ADVI.

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0, 1), K)
    w ~ Dirichlet(K, 1)
    for i in 1:N
        x[i] ~ MixtureModel(Normal, μ, w)
    end
end

Alternatively, you can use the arraydist function, i.e.

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0, 1), K)
    w ~ Dirichlet(K, 1)
    x ~ arraydist(map(i -> MixtureModel(Normal, μ, w), 1:N))
end
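With either formulation, the ADVI call from the original snippet should then go through. As a sketch (using the same data Y and hyperparameters as above), fitting and sampling from the variational approximation might look like:

```julia
using Turing

# Fit the continuous-only mixture model with ADVI
model = gmm(Y, 3)       # Y is the data generated earlier, K = 3 components
advi  = ADVI(10, 1000)  # 10 MC samples per gradient step, 1000 steps
q = vi(model, advi)     # mean-field variational approximation

# Draw posterior samples from q; each column is one draw of (μ..., w...)
post = rand(q, 1_000)
```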

Note that you might want to change the AD backend if your model has many parameters. In this case it is probably not necessary, but for more involved models you should keep this in mind and possibly switch to a reverse-mode AD backend. You can find details in the Turing documentation.
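As a rough sketch of what switching the backend can look like (the exact API depends on your Turing version; older releases expose a global setadbackend, while newer ones take an adtype argument instead):

```julia
# Assumes an older Turing release with the global setadbackend switch.
using Turing
using ReverseDiff   # the reverse-mode AD package must be loaded first

# Switch from the default forward-mode AD to reverse-mode AD,
# which scales better when the model has many parameters.
Turing.setadbackend(:reversediff)
```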

One follow-on question: could you please explain why K is required in the function signature? Consider the two nearly identical models below (the only difference being whether K is supplied as a default argument or assigned inside the model body).

@model gmm(x, K=3) = begin
    N = length(x)
    μ ~ filldist(Normal(0, 1), K)
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end

@model gmm2(x) = begin
    K = 3
    N = length(x)
    μ ~ filldist(Normal(0, 1), K)
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end

In the post, I show that mixture models are very tricky to get right due to the label-switching (identifiability) problem. To fix this, the post uses an ordering constraint on the component means, like the ordered prior available in Stan.
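A similar ordering constraint can be imposed in Turing via the ordered transform from Bijectors.jl. A minimal sketch, assuming a Bijectors version that exports ordered for multivariate distributions (the model name gmm_ordered is mine):

```julia
using Turing, Bijectors, LinearAlgebra

# Ordered component means break the label-switching symmetry:
# every draw of μ satisfies μ[1] < μ[2] < ... < μ[K].
@model function gmm_ordered(x, K)
    μ ~ ordered(MvNormal(zeros(K), 9.0 * I))  # prior sd 3, as in the original example
    w ~ Dirichlet(K, 1)
    for i in 1:length(x)
        x[i] ~ MixtureModel(Normal, μ, w)
    end
end
```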