Complicated GMM in Turing.jl - StackOverFlow error

Michaela_Maskova · May 10, 2021, 10:23am

Hey everyone,
I’m trying to create a MGMM model from paper http://proceedings.mlr.press/v15/xiong11a/xiong11a.pdf. The idea is that I have a dataset D = [G_1, \dots, G_M], where each group consists of vectors G_M = [x_1, \dots, x_n] and I want to use Turing.jl to try and sample from the model. The algorithm for sampling looks like this (from the paper):

I took inspiration from the tutorial on Gaussian Mixture model and I managed to get the model going for one G_m separately. This model works and returns some estimates. They are not very good, but that is expected since I need to do this for all G_m in D.

@model MGMM(Gm) = begin
    
    D, N = size(Gm)
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)

    # choose topic
    Y ~ Categorical(α)
    θ = χ[:,Y]

    # sample mean values and variance
    μ1 ~ MvNormal(zeros(D),1)
    μ2 ~ MvNormal(zeros(D),1)
    μ3 ~ MvNormal(zeros(D),1)
    μ = [μ1, μ2, μ3]
    σ ~ truncated(Normal(0, 100), 0, Inf)
    
    # draw assignments for each point in x and generate it from a multivariate normal
    for i in 1:N
        k ~ Categorical(θ[:])
        Gm[:,i] ~ MvNormal(μ[k], sqrt(σ))
    end
end

model = MGMM(Gm);
gmm_sampler = Gibbs(PG(100, :k, :Y), HMC(0.05, 10, :μ1, :μ2, :μ3, :σ, :α, :χ1, :χ2))
chain = sample(model, gmm_sampler, 10)

The problem arrises when trying to use this for my data, which are basically data = [G1, G2, ..., GM] where every G_m is a vector of size 2 \times n, n \sim Poisson(100). I rewrote the model like this:


@model MGMM2(data) = begin
    
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)
    
    for Gm in data
        D, N = size(Gm)

        # choose topic
        Y ~ Categorical(α)
        θ = χ[:,Y]

        # sample mean values and variance
        μ1 ~ MvNormal(zeros(D),1)
        μ2 ~ MvNormal(zeros(D),1)
        μ3 ~ MvNormal(zeros(D),1)
        μ = [μ1, μ2, μ3]
        σ ~ truncated(Normal(0, 10), 0, Inf)
        
        # draw assignments for each point in x and generate it from a multivariate normal
        for i in 1:N
            k ~ Categorical(θ[:])
            Gm[:,i] ~ MvNormal(μ[k], sqrt(σ))
        end
    end
end

model2 = MGMM2(data);
gmm_sampler = Gibbs(PG(100, :k, :Y), HMC(0.05, 10, :μ1, :μ2, :μ3, :σ, :α, :χ1, :χ2))
chain2 = sample(model2, gmm_sampler, 2)

Unfortunatelly, I get StackOverFlow error (showing only the first 2 lines):

ERROR: StackOverflowError:
Stacktrace:
 [1] MvNormal(::Array{Real,1}, ::PDMats.ScalMat{Float64}) at C:\Users\masen\.julia\packages\Distributions\HjzA0\src\multivariate\mvnormal.jl:202 (repeats 52089 times)
 [2] MvNormal(::Array{Real,1}, ::Float64) at C:\Users\masen\.julia\packages\Distributions\HjzA0\src\multivariate\mvnormal.jl:220
etc...

I’m not sure where exactly the error comes from and how to possibly try and solve it. Do you have any ideas? I’m a Turing.jl beginner and I’m not even entirely sure that the code is correct.

torfjelde · May 10, 2021, 11:13am

Unfortunately Turing.jl can’t handle cases where the same variable is sampled multiple times properly, e.g.

# BIG NO-NO!
for i = 1:n
    x ~ Normal()
end

# BIG YES-YES!
for i = 1:n
    x[i] ~ Normal()
end

This also means that you need to pre-allocate a container for x.

Your particular example therefore needs to be changed to:

@model function MGMM2(data, ::Type{TV} = Vector{Float64}) where {TV}
    
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)

    # Pre-allocate.
    μ1 = [TV(undef, size(Gm, 1)) for Gm in data]
    μ2 = [TV(undef, size(Gm, 1)) for Gm in data]
    μ3 = [TV(undef, size(Gm, 1)) for Gm in data]
    σ = TV(undef, length(data))

    for (data_idx, Gm) in enumerate(data)
        D, N = size(Gm)

        # choose topic
        Y ~ Categorical(α)
        θ = χ[:,Y]

        # sample mean values and variance
        μ1[data_idx] ~ MvNormal(zeros(D),1)
        μ2[data_idx] ~ MvNormal(zeros(D),1)
        μ3[data_idx] ~ MvNormal(zeros(D),1)
        μ = [μ1, μ2, μ3]
        σ[data_idx] ~ truncated(Normal(0, 10), 0, Inf)
        σ = σ[data_idx]
        
        # draw assignments for each point in x and generate it from a multivariate normal
        for i in 1:N
            k ~ Categorical(θ[:])
            Gm[:,i] ~ MvNormal(μ[k], sqrt(σ))
        end
    end
end

A couple of a notes here:

The weird TV argument I’m passing to the model is to tell Turing.jl that “hey, I’m want to use a Vector{Float64} by default, but if you want to use something else, go ahead!” This ensures that the model is type-stable, which significantly improves performance. There’s some more information about it here: Performance Tips The resulting TV(undef, ...) call is then just replaced with Vector{Float64}(undef, ...) or w/e.
- When using particle samplers in Turing.jl, you need to use the TArray type for the variables you want to sample. I’m preeetty sure that when we use this approach of providing the type of the container (the TV argument), Turing.jl will automatically replace this with the correct TArray. But I might be wrong, so if you get weird results still, it might just be fixed replacing ::Type{TV} = Vector{Float64} with ::Type{TV} = TArray{Float64, 1}.
I replaced the begin with function just because it reads a bit nicer when using the where {TV} statement.

Hope this helps!

EDIT: I completely missed that you were doing PG and HMC within-Gibbs! Then you need one type-argument for those used with HMC (requires arrays to be able to take ForwardDiff.Dual since we’re taking the gradient through it) and one for those used with PG (requires TArray as mentioned above). That is:

@model function MGMM2(data, ::Type{TVC} = Array{Float64}, ::Type{TVD} = TArray{Float64}) where {TVC, TVD}
    
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)

    # Pre-allocate.
    # Continuous variables.
    μ1 = [TVC(undef, size(Gm, 1)) for Gm in data]
    μ2 = [TVC(undef, size(Gm, 1)) for Gm in data]
    μ3 = [TVC(undef, size(Gm, 1)) for Gm in data]
    σ = TVC(undef, length(data))

    # Discrete variables.
    Y = TVD(undef, length(data))
    k = [TVD(undef, size(Gm, 2)) for Gm in data]


    for (data_idx, Gm) in enumerate(data)
        D, N = size(Gm)

        # choose topic
        Y[data_idx] ~ Categorical(α)
        θ = χ[:,Y]

        # sample mean values and variance
        μ1[data_idx] ~ MvNormal(zeros(D),1)
        μ2[data_idx] ~ MvNormal(zeros(D),1)
        μ3[data_idx] ~ MvNormal(zeros(D),1)
        μ = [μ1, μ2, μ3]
        σ[data_idx] ~ truncated(Normal(0, 10), 0, Inf)
        
        # draw assignments for each point in x and generate it from a multivariate normal
        for i in 1:N
            k[data_idx][i] ~ Categorical(θ[:])
            Gm[:,i] ~ MvNormal(μ[k[data_idx][i]], sqrt(σ[data_idx]))
        end
    end
end

Oh and just in case it’s not clear: the choice of names TV, TVC, TVD, etc. doesn’t matter. As long as something is a Type{...} Turing.jl will replace the type used when it deems necessary.

Michaela_Maskova · May 11, 2021, 12:08pm

Thanks for that thorough reply! I tried the new model function and with a few tweaks here and there is does work and returns the chain with parameters.

When trying to understand the results I got, I realized I’ve made a mistake with the parameters \mu, \sigma. Since they are both global the same way as \alpha, \chi parameters are, the sampling needs to take place before the for loop and there’s no need to preallocate the vectors (I hope).

One follow up question though, is there an easy way to collect the parameters of interest, for example all \chi parameters? Right now, the chain returns a single variable for each point in the \chi vector. Therefore, if I had for example 5 vectors \chi_i, \; i \in 1,\dots, 5, of dimension d = 9, I’d get 5 \times 9 \times number of iterations parameters in total. I managed to extract them in a readable form with

ids = findall(map(name -> occursin("χ", String(name)), names(chain)));
df = chain[:, ids, :] |> DataFrame

but it seems a bit strange to extract the parameters that way…

cpfiffer · May 11, 2021, 3:26pm

Yeah, you should be able to do

df = group(chain, "χ") |> DataFrame

torfjelde · May 11, 2021, 3:45pm

Correct! I realized this after replying but figured I’d leave it as it for demonstration purposes, so good spot!

Topic		Replies	Views
Strange issues fitting GMMs in Turing.jl by extending a fairly simple example Probabilistic Programming turing , bayesian-inference	10	1900	February 15, 2021
Turing: help with slow model Probabilistic Programming	46	2638	February 13, 2021
Turing.jl programmatically set number of clusters in a mixture model Probabilistic Programming question , package , turing	6	185	May 10, 2025
Sampling from Turing model combining discrete and continuous variables fails on second loop through the model General Usage question , type , error , turing	0	293	May 29, 2022
Slow Turing.jl sampling compared to python pymc Probabilistic Programming turing	7	128	June 26, 2025

Complicated GMM in Turing.jl - StackOverFlow error

Related topics