Variational Inference of Mixture models?

I’m just cutting my teeth on Turing. (BTW, thanks for this amazing software.) Does Turing support VI of mixture models? I tried ADVI on a Gaussian mixture model, but without success, as shown below. The error occurs at Z[ii] ~ Categorical(wtrue). Is this possible? Thanks!

using Distributions
using Turing, MCMCChains

# ground-truth generative distribution
ncomptrue = 3
mutrue = [1,2,3] # ground-truth component means
muprior = Normal(0, 3)
sigma = 0.1 # a-priori known 
nsamples = 100
# components have equal probability
wtrue = [1,1,1]/ncomptrue
mixtrue = MixtureModel(Normal, [(mm, sigma) for mm in mutrue], wtrue)
Y = rand(mixtrue, nsamples)

# fit distribution with known # components
@model Mixmodel(x) = begin
    N = length(x)
    mu1 ~ muprior
    mu2 ~ muprior
    mu3 ~ muprior
    mu = [mu1, mu2, mu3]
    Z = Vector{Int}(undef, N)
    for ii in 1:N
        Z[ii] ~ Categorical(wtrue)
        x[ii] ~ Normal(mu[Z[ii]], sigma)
    end
end

model = Mixmodel(Y)

advi = ADVI(10, 1000)
q = vi(model, advi)

Hi, you can use variational inference only for continuous parameters. In variational inference, the goal is to minimise the KL divergence from the variational distribution to the true posterior. In Turing this is done with gradient-based optimisation, which requires computing gradients, and gradients are not available for discrete parameters such as the assignments Z[ii].

However, you can reformulate a mixture model so that it contains only continuous parameters, by marginalising out the discrete assignments Z.

The following model runs using ADVI.

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0,1), K)   # component means
    w ~ Dirichlet(K, 1)            # mixture weights
    for i in 1:N
        # MixtureModel marginalises out the discrete assignment of x[i]
        x[i] ~ MixtureModel(Normal, μ, w)
    end
end
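
For reference, fitting this model mirrors the calls from your original post; the rand call at the end is one way to inspect the result, assuming the returned q can be sampled like an ordinary multivariate distribution, which is the case for the mean-field approximation ADVI produces:

model = gmm(Y, 3)
advi = ADVI(10, 1000)
q = vi(model, advi)

# each column is one joint draw of the parameters (means and weights)
samples = rand(q, 1_000)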

Alternatively, you could use the arraydist function, i.e.

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    x ~ arraydist(map(i -> MixtureModel(Normal, μ, w), 1:N))
end

Note that you might want to change the AD backend if your model has many parameters. In this case it is probably not necessary, but for more involved models keep this in mind and consider switching to a reverse-mode AD backend. You can find details in the Turing documentation.
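
For example, something along these lines (a sketch only; it assumes the ReverseDiff package is installed and a Turing version that still exposes Turing.setadbackend, while newer releases configure the AD type differently):

using Turing, ReverseDiff

# switch the global AD backend from the default forward mode to reverse mode
Turing.setadbackend(:reversediff)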

filldist(MixtureModel(Normal, μ, w), N) is better in this case since the distribution is the same for all elements.
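
That version would look like this (same model as above, with the loop replaced by a single N-dimensional observation):

@model function gmm(x, K)
    N = length(x)
    μ ~ filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    # one fill-distribution over all N observations instead of N separate statements
    x ~ filldist(MixtureModel(Normal, μ, w), N)
end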


Yes of course! Thank you very much!

One follow-on question: could you please explain why K is required in the function signature? Consider the two nearly identical models below (the only difference being whether K is given as an argument with a default value or hard-coded inside the model).

@model gmm(x, K=3) = begin
    N = length(x)
    μ ~ filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end

@model gmm2(x) = begin
    K = 3
    N = length(x)
    μ = filldist(Normal(0,1), K)
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end

The following code executes successfully

model = gmm(Y)
advi = ADVI(10, 1000)
q = vi(model, advi)

Changing the first line above to model = gmm2(Y) gives an error. How can this be, when gmm(Y) and gmm2(Y) appear to be functionally equivalent?

This should not make any difference. I cannot try it at the moment, but have you observed the same behaviour in a fresh Julia session?

The problem is in gmm2: μ = filldist(Normal(0,1), K) only assigns the distribution object to μ instead of sampling from it. This should be μ ~ filldist(Normal(0,1), K).
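
With that one change, gmm2 should run the same way as gmm:

@model gmm2(x) = begin
    K = 3
    N = length(x)
    μ ~ filldist(Normal(0,1), K)   # ~ instead of =, so μ is sampled
    w ~ Dirichlet(K, 1)
    for ii in 1:N
        x[ii] ~ MixtureModel(Normal, μ, w)
    end
end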


:flushed: Yep; that was the problem…

For people stumbling onto this post later: I wrote an extensive blog post about exactly this model, partly based on this thread: Bayesian Latent Profile Analysis (mixture modeling) - Huijzer.xyz

In the post, I show that mixture models are very tricky to get right due to the label switching (identification) problem. To fix this, the post uses an ordered constraint, similar to the ordered prior that is available in Stan.
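
For the Turing side, the rough idea looks something like the following sketch (it assumes the ordered transform from Bijectors and an MvNormal prior; see the blog post for the full, tested model):

using Turing, Bijectors, LinearAlgebra

@model function ordered_gmm(x, K)
    N = length(x)
    # ordered(...) constrains μ[1] ≤ μ[2] ≤ … ≤ μ[K],
    # which removes the label-switching symmetry between components
    μ ~ ordered(MvNormal(zeros(K), 4 * I))
    w ~ Dirichlet(K, 1)
    x ~ filldist(MixtureModel(Normal, μ, w), N)
end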
