Variational Inference of Mixture models?

I’m just cutting my teeth on Turing. (BTW, thanks for this amazing software.) Does Turing support VI of mixture models? I tried ADVI on a Gaussian mixture model, but without success, as shown below. The error occurs at Z[ii] ~ Categorical(wtrue). Is this possible? Thanks!

using Distributions
using Turing, MCMCChains

# ground-truth generative distribution
ncomptrue = 3
mutrue = [1,2,3] # ground-truth component means
muprior = Normal(0, 3)
sigma = 0.1 # a-priori known 
nsamples = 100
# components have equal probability
wtrue = [1,1,1]/ncomptrue
mixtrue = MixtureModel(Normal, [(mm, sigma) for mm in mutrue], wtrue)
Y = rand(mixtrue, nsamples)

# fit distribution with known # components
@model Mixmodel(x) = begin
    N = length(x)
    mu1 ~ muprior
    mu2 ~ muprior
    mu3 ~ muprior
    mu = [mu1, mu2, mu3]
    Z = Vector{Int}(undef, N)
    for ii in 1:N
        Z[ii] ~ Categorical(wtrue)
        x[ii] ~ Normal(mu[Z[ii]], sigma)
    end
end

model = Mixmodel(Y)

advi = ADVI(10, 1000)
q = vi(model, advi)

Hi, you can use variational inference only for continuous parameters. In variational inference, the goal is to minimise the KL divergence from the variational distribution to the true posterior. In Turing this is done with gradient-based optimisation, which requires computing gradients, and gradients are not available for discrete parameters such as the assignments Z[ii].

However, you can reformulate a mixture model so that it contains only continuous parameters, by marginalising out the discrete assignments Z.

The following model runs using ADVI.

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0,1), K)   # component means
    w ~ Dirichlet(K, 1)            # mixture weights
    for i in 1:N
        # MixtureModel marginalises out the discrete assignment of x[i]
        x[i] ~ MixtureModel(Normal, μ, w)
    end
end
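
For reference, fitting this model mirrors the calls from your original post; the rand call at the end is one way to inspect the result, assuming the returned q can be sampled like an ordinary multivariate distribution, which is the case for the mean-field approximation ADVI produces:

model = gmm(Y, 3)
advi = ADVI(10, 1000)
q = vi(model, advi)

# each column is one joint draw of the parameters (means and weights)
samples = rand(q, 1_000)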

Alternatively, you could use the arraydist function, i.e.

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    x ~ arraydist(map(i -> MixtureModel(Normal, μ, w), 1:N))
end

Note that you might want to change the AD backend if your model has many parameters. In this case it is probably not necessary, but for more involved models keep this in mind and consider switching to a reverse-mode AD backend. You can find details in the Turing documentation.
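
For example, something along these lines (a sketch only; it assumes the ReverseDiff package is installed and a Turing version that still exposes Turing.setadbackend, while newer releases configure the AD type differently):

using Turing, ReverseDiff

# switch the global AD backend from the default forward mode to reverse mode
Turing.setadbackend(:reversediff)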

filldist(MixtureModel(Normal, μ, w), N) is better in this case since the distribution is the same for all elements.
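
That version would look like this (same model as above, with the loop replaced by a single N-dimensional observation):

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    # one fill-distribution over all N observations instead of N separate statements
    x ~ filldist(MixtureModel(Normal, μ, w), N)
end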


Yes of course! Thank you very much!

One follow-on question: could you please explain why K is required in the function signature? Consider the two nearly identical models below (the only difference being whether K is given as an argument with a default value or hard-coded inside the model).

@model gmm(x, K=3) = begin
    N = length(x)
    μ ~ filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end

@model gmm2(x) = begin
    K = 3
    N = length(x)
    μ = filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end

The following code executes successfully

model = gmm(Y)
advi = ADVI(10, 1000)
q = vi(model, advi)

Changing the first line above to model = gmm2(Y) gives an error. How can this be, when gmm(Y) and gmm2(Y) appear to be functionally equivalent?

This should not make any difference. I cannot try it at the moment, but have you observed the same behaviour in a fresh Julia session?

The problem is in gmm2: μ = filldist(Normal(0,1), K) only assigns the distribution object to μ instead of sampling from it. This should be μ ~ filldist(Normal(0,1), K).
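
With that one change, gmm2 should run the same way as gmm:

@model gmm2(x) = begin
    K = 3
    N = length(x)
    μ ~ filldist(Normal(0,1), K)   # ~ instead of =, so μ is sampled
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end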


:flushed: Yep; that was the problem…

For people stumbling onto this post later: I wrote an extensive blog post about exactly this model, partly based on this thread: Bayesian Latent Profile Analysis (mixture modeling) - Huijzer.xyz

In the post, I show that mixture models are very tricky to get right due to the label switching (identification) problem. To fix this, the post uses an ordered constraint, similar to the ordered prior that is available in Stan.
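
For the Turing side, the rough idea looks something like the following sketch (it assumes the ordered transform from Bijectors and an MvNormal prior; see the blog post for the full, tested model):

using Turing, Bijectors, LinearAlgebra

@model function ordered_gmm(x, K)
    N = length(x)
    # ordered(...) constrains μ[1] ≤ μ[2] ≤ … ≤ μ[K],
    # which removes the label-switching symmetry between components
    μ ~ ordered(MvNormal(zeros(K), 4 * I))
    w ~ Dirichlet(K, 1)
    x ~ filldist(MixtureModel(Normal, μ, w), N)
end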
