Multinomial(::Int, ::Vector{Any}) error in Turing doing LDA

Hello,

I’m trying to get to grips with Turing by implementing a Latent Dirichlet Allocation (LDA) model from a YouTube course I’m following. I implemented the following model:

using Turing, Distributions

@model function LDA(wordcounts::Matrix{Int})
    D, V = size(wordcounts) #100,25
    K = 10
    W = sum(wordcounts[1,:]) #words per doc (1000 with real data)

    π = Matrix{Float64}(undef, D, K)
    α = 0.3
    for d in 1:D
        π[d,:] ~ Dirichlet(K, α)
    end
    
    θ = Matrix{Float64}(undef, K, V)
    β = 0.3
    for k in 1:K
        θ[k,:] ~ Dirichlet(V, β)
    end

    for d in 1:D
        p_w = θ'*π[d,:]
        wordcounts[d,:] ~ Multinomial(W, p_w)
    end
    return θ
end

The observed variable is a matrix with one row per document; each entry counts the number of occurrences of a word in that document. The problem arises with the multinomial variable. When I sample using a random input matrix I get this:

julia> chain = sample(LDA(rand(1:4, 100, 25)), HMC(0.05,10), 200)
ERROR: MethodError: no method matching Multinomial(::Int64, ::Vector{Any})
Closest candidates are:
  Multinomial(::Integer, ::Integer) at C:\Users\Henri\.julia\packages\Distributions\1313k\src\multivariate\multinomial.jl:40
  Multinomial(::Integer, ::TV; check_args) where {T<:Real, TV<:AbstractVector{T}} at C:\Users\Henri\.julia\packages\Distributions\1313k\src\multivariate\multinomial.jl:28
Stacktrace:
  [1] LDA(__model__::DynamicPPL.Model{typeof(LDA), (:wordcounts,), (), (), Tuple{Matrix{Int64}}, Tuple{}, DynamicPPL.DefaultContext}, __varinfo__::DynamicPPL.ThreadSafeVarInfo{DynamicPPL.UntypedVarInfo{DynamicPPL.Metadata{Dict{AbstractPPL.VarName, Int64}, Vector{Distribution}, Vector{AbstractPPL.VarName}, Vector{Real}, Vector{Set{DynamicPPL.Selector}}}, Float64}, Vector{Base.RefValue{Float64}}}, __context__::DynamicPPL.SamplingContext{DynamicPPL.SampleFromUniform, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}, wordcounts::Matrix{Int64})
    @ Main c:\Users\Henri\OneDrive - UCL\Doctorat\Cours\Tubinggen Probabilistic ML\Latent Dirichlet\LDAProbML\src\LDAProbML.jl:22
  [2] macro expansion
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\model.jl:489 [inlined]
  [3] _evaluate!!
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\model.jl:472 [inlined]
  [4] evaluate_threadsafe!!(model::DynamicPPL.Model{typeof(LDA), (:wordcounts,), (), (), Tuple{Matrix{Int64}}, Tuple{}, DynamicPPL.DefaultContext}, varinfo::DynamicPPL.UntypedVarInfo{DynamicPPL.Metadata{Dict{AbstractPPL.VarName, Int64}, Vector{Distribution}, Vector{AbstractPPL.VarName}, Vector{Real}, Vector{Set{DynamicPPL.Selector}}}, Float64}, context::DynamicPPL.SamplingContext{DynamicPPL.SampleFromUniform, DynamicPPL.DefaultContext, Random._GLOBAL_RNG})
    @ DynamicPPL C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\model.jl:463
  [5] evaluate!!
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\model.jl:402 [inlined]
  [6] evaluate!!
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\model.jl:415 [inlined]
  [7] Model
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\model.jl:377 [inlined]
  [8] VarInfo
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\varinfo.jl:127 [inlined]
  [9] VarInfo
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\varinfo.jl:126 [inlined]
 [10] step(rng::Random._GLOBAL_RNG, model::DynamicPPL.Model{typeof(LDA), (:wordcounts,), (), (), Tuple{Matrix{Int64}}, Tuple{}, DynamicPPL.DefaultContext}, spl::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.UnitEuclideanMetric}}; resume_from::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ DynamicPPL C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\sampler.jl:81
 [11] step
    @ C:\Users\Henri\.julia\packages\DynamicPPL\c8MjC\src\sampler.jl:74 [inlined]
 [12] macro expansion
    @ C:\Users\Henri\.julia\packages\AbstractMCMC\BPJCW\src\sample.jl:123 [inlined]
 [13] macro expansion
    @ C:\Users\Henri\.julia\packages\ProgressLogging\6KXlp\src\ProgressLogging.jl:328 [inlined]
 [14] macro expansion
    @ C:\Users\Henri\.julia\packages\AbstractMCMC\BPJCW\src\logging.jl:8 [inlined]
 [15] mcmcsample(rng::Random._GLOBAL_RNG, model::DynamicPPL.Model{typeof(LDA), (:wordcounts,), (), (), Tuple{Matrix{Int64}}, Tuple{}, DynamicPPL.DefaultContext}, sampler::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.UnitEuclideanMetric}}, N::Int64; progress::Bool, progressname::String, callback::Nothing, discard_initial::Int64, thinning::Int64, chain_type::Type, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ AbstractMCMC C:\Users\Henri\.julia\packages\AbstractMCMC\BPJCW\src\sample.jl:114
 [16] sample(rng::Random._GLOBAL_RNG, model::DynamicPPL.Model{typeof(LDA), (:wordcounts,), (), (), Tuple{Matrix{Int64}}, Tuple{}, DynamicPPL.DefaultContext}, sampler::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.UnitEuclideanMetric}}, N::Int64; chain_type::Type, resume_from::Nothing, progress::Bool, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Turing.Inference C:\Users\Henri\.julia\packages\Turing\Ir2iS\src\inference\Inference.jl:159
 [17] sample
    @ C:\Users\Henri\.julia\packages\Turing\Ir2iS\src\inference\Inference.jl:158 [inlined]
 [18] #sample#2
    @ C:\Users\Henri\.julia\packages\Turing\Ir2iS\src\inference\Inference.jl:145 [inlined]
 [19] sample
    @ C:\Users\Henri\.julia\packages\Turing\Ir2iS\src\inference\Inference.jl:145 [inlined]
 [20] #sample#1
    @ C:\Users\Henri\.julia\packages\Turing\Ir2iS\src\inference\Inference.jl:135 [inlined]
 [21] sample(model::DynamicPPL.Model{typeof(LDA), (:wordcounts,), (), (), Tuple{Matrix{Int64}}, Tuple{}, DynamicPPL.DefaultContext}, alg::HMC{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.UnitEuclideanMetric}, N::Int64)
    @ Turing.Inference C:\Users\Henri\.julia\packages\Turing\Ir2iS\src\inference\Inference.jl:135
 [22] top-level scope
    @ REPL[48]:1

The error suggests that my p_w vector’s eltype is Any, but it shouldn’t be: the π and θ matrices contain Float64 values, and so does p_w when I compute it in my REPL. I don’t get it, why this error?

I don’t know why you’re getting eltypes of Any, but I’m surprised that’s the first error you saw. Turing computes gradients of your log-density using forward-mode automatic differentiation with ForwardDiff. ForwardDiff uses a custom numeric type (a dual number), so you can’t constrain the element types of your containers to Float64. Instead of allocating the containers yourself, let Turing do it for you. Try

π ~ filldist(Dirichlet(K, α), D)  # size (K, D)
θ ~ filldist(Dirichlet(V, β), K)  # size (V, K)
P_w = θ * π                       # size (V, D)
for d in 1:D
    wordcounts[d, :] ~ Multinomial(W, P_w[:, d])
end
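
To illustrate why the hard-coded Float64 containers are a problem (a minimal sketch, not from the model above; the function f is just a stand-in): ForwardDiff evaluates your code with its Dual number type, which cannot be stored in a Float64 array.

using ForwardDiff

function f(x)
    buf = Vector{Float64}(undef, length(x))  # hard-coded Float64 container, like the undef matrices above
    for i in eachindex(x)
        buf[i] = x[i]^2  # errors under ForwardDiff: a Dual number cannot be stored in a Float64 slot
    end
    return sum(buf)
end

f([1.0, 2.0])                        # works fine with plain Float64 input
ForwardDiff.gradient(f, [1.0, 2.0])  # throws a MethodError, analogous to what happens inside HMC

With filldist, Turing allocates the values itself, so their element type can be whatever the AD backend requires.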

Alright, thank you. I did not know about filldist; I just copied from the tutorials, and they use undef matrices too (e.g. Unsupervised Learning using Bayesian Mixture Models).

The LDA still does not work, this time because of a DomainError (the log of a negative value), but I think that might be because I’m not modelling the LDA correctly in the first place.

If I manage to make it work it could be a nice new tutorial to document.

EDIT: with the PG sampler it works, so it might just have been a poor choice of sampler.
EDIT2: Well, no, it does not work either, but there’s no error.

Ints are discrete, so that model is not differentiable anyway, and you couldn’t sample it with HMC.

This is often how it goes. HMC (or even better, NUTS) is just a better sampler, and its errors are often useful diagnostics of problems with the model, where other samplers might silently fail.

So perhaps the issue is that, as you say, the LDA is not modeled correctly here?

It’s most likely a modeling problem on my end, yes. I just learned about LDA and Turing yesterday, so it’s not surprising that I’ll need a few attempts.
LDA is fundamentally about categorical latent variables though, so I guess HMC and NUTS won’t ever be possible.
In the course, the professor says he used Gibbs Sampling. In Turing, GS is a way to use multiple samplers for different variables. I think in the course, GS refers to using MH for one variable given all the other variables.

In many cases, you can marginalize out the discrete parameters to get a continuous model amenable to sampling with NUTS (see 9.5 Latent Dirichlet allocation | Stan User’s Guide for the LDA case) and then recover exact discrete draws in post-processing (see [Turing] Using a random variable as an index - #2 by sethaxen for a Turing example). This often yields better inferences than sampling the discrete parameters directly, but it takes more work up-front, which is why Turing’s support for Gibbs sampling is useful. It’s still good to keep in mind that easy-to-use samplers are often easy because they don’t error when they run into problems, not because they don’t have problems.
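
For example (a hedged sketch, not from the linked posts; sample_topic, π_draw, and θ_draw are hypothetical names), with the marginalized model above you can recover a discrete topic assignment for word v in document d from a single posterior draw of π (K×D) and θ (V×K), since p(z = k | w = v, doc d) is proportional to π[k, d] * θ[v, k]:

using Distributions

# Hypothetical helper: draw a topic assignment z for word index v in document d,
# given one posterior draw of π and θ from the marginalized model.
function sample_topic(π_draw::AbstractMatrix, θ_draw::AbstractMatrix, d::Int, v::Int)
    p = π_draw[:, d] .* θ_draw[v, :]   # unnormalized p(z = k | w = v, doc d)
    return rand(Categorical(p ./ sum(p)))
end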

These two are equivalent. In Turing you specify which parameters will use which sampler. When that sampler is used, all other parameters are held fixed, i.e. the joint model is conditioned on them.
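
For example (a hedged sketch; model stands for a Turing model instance, and :z assumes a model variant with explicit per-word topic assignments, which the marginalized version above does not have), HMC could update the continuous parameters while particle Gibbs updates the discrete ones:

# Within each Gibbs sweep, HMC updates π and θ with z held fixed,
# and PG updates z with π and θ held fixed.
chain = sample(model, Gibbs(HMC(0.05, 10, :π, :θ), PG(20, :z)), 200)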
