Complicated GMM in Turing.jl - StackOverFlow error

Hey everyone,
I’m trying to create a MGMM model from paper http://proceedings.mlr.press/v15/xiong11a/xiong11a.pdf. The idea is that I have a dataset D = [G_1, \dots, G_M], where each group consists of vectors G_M = [x_1, \dots, x_n] and I want to use Turing.jl to try and sample from the model. The algorithm for sampling looks like this (from the paper):

image

I took inspiration from the tutorial on Gaussian Mixture model and I managed to get the model going for one G_m separately. This model works and returns some estimates. They are not very good, but that is expected since I need to do this for all G_m in D.

@model MGMM(Gm) = begin
    
    D, N = size(Gm)
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)

    # choose topic
    Y ~ Categorical(α)
    θ = χ[:,Y]

    # sample mean values and variance
    μ1 ~ MvNormal(zeros(D),1)
    μ2 ~ MvNormal(zeros(D),1)
    μ3 ~ MvNormal(zeros(D),1)
    μ = [μ1, μ2, μ3]
    σ ~ truncated(Normal(0, 100), 0, Inf)
    
    # draw assignments for each point in x and generate it from a multivariate normal
    for i in 1:N
        k ~ Categorical(θ[:])
        Gm[:,i] ~ MvNormal(μ[k], sqrt(σ))
    end
end

model = MGMM(Gm);
gmm_sampler = Gibbs(PG(100, :k, :Y), HMC(0.05, 10, :μ1, :μ2, :μ3, :σ, :α, :χ1, :χ2))
chain = sample(model, gmm_sampler, 10)

The problem arrises when trying to use this for my data, which are basically data = [G1, G2, ..., GM] where every G_m is a vector of size 2 \times n, n \sim Poisson(100). I rewrote the model like this:


@model MGMM2(data) = begin
    
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)
    
    for Gm in data
        D, N = size(Gm)

        # choose topic
        Y ~ Categorical(α)
        θ = χ[:,Y]

        # sample mean values and variance
        μ1 ~ MvNormal(zeros(D),1)
        μ2 ~ MvNormal(zeros(D),1)
        μ3 ~ MvNormal(zeros(D),1)
        μ = [μ1, μ2, μ3]
        σ ~ truncated(Normal(0, 10), 0, Inf)
        
        # draw assignments for each point in x and generate it from a multivariate normal
        for i in 1:N
            k ~ Categorical(θ[:])
            Gm[:,i] ~ MvNormal(μ[k], sqrt(σ))
        end
    end
end

model2 = MGMM2(data);
gmm_sampler = Gibbs(PG(100, :k, :Y), HMC(0.05, 10, :μ1, :μ2, :μ3, :σ, :α, :χ1, :χ2))
chain2 = sample(model2, gmm_sampler, 2)

Unfortunatelly, I get StackOverFlow error (showing only the first 2 lines):

ERROR: StackOverflowError:
Stacktrace:
 [1] MvNormal(::Array{Real,1}, ::PDMats.ScalMat{Float64}) at C:\Users\masen\.julia\packages\Distributions\HjzA0\src\multivariate\mvnormal.jl:202 (repeats 52089 times)
 [2] MvNormal(::Array{Real,1}, ::Float64) at C:\Users\masen\.julia\packages\Distributions\HjzA0\src\multivariate\mvnormal.jl:220
etc...

I’m not sure where exactly the error comes from and how to possibly try and solve it. Do you have any ideas? I’m a Turing.jl beginner and I’m not even entirely sure that the code is correct.

1 Like

Unfortunately Turing.jl can’t handle cases where the same variable is sampled multiple times properly, e.g.

# BIG NO-NO!
for i = 1:n
    x ~ Normal()
end

# BIG YES-YES!
for i = 1:n
    x[i] ~ Normal()
end

This also means that you need to pre-allocate a container for x.

Your particular example therefore needs to be changed to:

@model function MGMM2(data, ::Type{TV} = Vector{Float64}) where {TV}
    
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)

    # Pre-allocate.
    μ1 = [TV(undef, size(Gm, 1)) for Gm in data]
    μ2 = [TV(undef, size(Gm, 1)) for Gm in data]
    μ3 = [TV(undef, size(Gm, 1)) for Gm in data]
    σ = TV(undef, length(data))

    for (data_idx, Gm) in enumerate(data)
        D, N = size(Gm)

        # choose topic
        Y ~ Categorical(α)
        θ = χ[:,Y]

        # sample mean values and variance
        μ1[data_idx] ~ MvNormal(zeros(D),1)
        μ2[data_idx] ~ MvNormal(zeros(D),1)
        μ3[data_idx] ~ MvNormal(zeros(D),1)
        μ = [μ1, μ2, μ3]
        σ[data_idx] ~ truncated(Normal(0, 10), 0, Inf)
        σ = σ[data_idx]
        
        # draw assignments for each point in x and generate it from a multivariate normal
        for i in 1:N
            k ~ Categorical(θ[:])
            Gm[:,i] ~ MvNormal(μ[k], sqrt(σ))
        end
    end
end

A couple of a notes here:

  1. The weird TV argument I’m passing to the model is to tell Turing.jl that “hey, I’m want to use a Vector{Float64} by default, but if you want to use something else, go ahead!” This ensures that the model is type-stable, which significantly improves performance. There’s some more information about it here: Performance Tips The resulting TV(undef, ...) call is then just replaced with Vector{Float64}(undef, ...) or w/e.
    • When using particle samplers in Turing.jl, you need to use the TArray type for the variables you want to sample. I’m preeetty sure that when we use this approach of providing the type of the container (the TV argument), Turing.jl will automatically replace this with the correct TArray. But I might be wrong, so if you get weird results still, it might just be fixed replacing ::Type{TV} = Vector{Float64} with ::Type{TV} = TArray{Float64, 1}.
  2. I replaced the begin with function just because it reads a bit nicer when using the where {TV} statement.

Hope this helps!:slight_smile:

EDIT: I completely missed that you were doing PG and HMC within-Gibbs! Then you need one type-argument for those used with HMC (requires arrays to be able to take ForwardDiff.Dual since we’re taking the gradient through it) and one for those used with PG (requires TArray as mentioned above). That is:

@model function MGMM2(data, ::Type{TVC} = Array{Float64}, ::Type{TVD} = TArray{Float64}) where {TVC, TVD}
    
    K, T = 3, 2

    # draw α and χ
    α ~ Dirichlet(T,2)
    χ1 ~ Dirichlet(K,2)
    χ2 ~ Dirichlet(K,2)
    χ = hcat(χ1, χ2)

    # Pre-allocate.
    # Continuous variables.
    μ1 = [TVC(undef, size(Gm, 1)) for Gm in data]
    μ2 = [TVC(undef, size(Gm, 1)) for Gm in data]
    μ3 = [TVC(undef, size(Gm, 1)) for Gm in data]
    σ = TVC(undef, length(data))

    # Discrete variables.
    Y = TVD(undef, length(data))
    k = [TVD(undef, size(Gm, 2)) for Gm in data]


    for (data_idx, Gm) in enumerate(data)
        D, N = size(Gm)

        # choose topic
        Y[data_idx] ~ Categorical(α)
        θ = χ[:,Y]

        # sample mean values and variance
        μ1[data_idx] ~ MvNormal(zeros(D),1)
        μ2[data_idx] ~ MvNormal(zeros(D),1)
        μ3[data_idx] ~ MvNormal(zeros(D),1)
        μ = [μ1, μ2, μ3]
        σ[data_idx] ~ truncated(Normal(0, 10), 0, Inf)
        
        # draw assignments for each point in x and generate it from a multivariate normal
        for i in 1:N
            k[data_idx][i] ~ Categorical(θ[:])
            Gm[:,i] ~ MvNormal(μ[k[data_idx][i]], sqrt(σ[data_idx]))
        end
    end
end

Oh and just in case it’s not clear: the choice of names TV, TVC, TVD, etc. doesn’t matter. As long as something is a Type{...} Turing.jl will replace the type used when it deems necessary.

1 Like

Thanks for that thorough reply! :heart: I tried the new model function and with a few tweaks here and there is does work and returns the chain with parameters.

When trying to understand the results I got, I realized I’ve made a mistake with the parameters \mu, \sigma. Since they are both global the same way as \alpha, \chi parameters are, the sampling needs to take place before the for loop and there’s no need to preallocate the vectors (I hope).

One follow up question though, is there an easy way to collect the parameters of interest, for example all \chi parameters? Right now, the chain returns a single variable for each point in the \chi vector. Therefore, if I had for example 5 vectors \chi_i, \; i \in 1,\dots, 5, of dimension d = 9, I’d get 5 \times 9 \times number of iterations parameters in total. I managed to extract them in a readable form with

ids = findall(map(name -> occursin("χ", String(name)), names(chain)));
df = chain[:, ids, :] |> DataFrame

but it seems a bit strange to extract the parameters that way…

Yeah, you should be able to do

df = group(chain, "χ") |> DataFrame
2 Likes

Correct!:slight_smile: I realized this after replying but figured I’d leave it as it for demonstration purposes, so good spot!