Advice for simple (mildly large) Turing model

I have a simple Turing model I’ve been trying to implement, but I’ve been running into problems actually running it. I say simple, but my actual problem is not “small”, parameter-wise. I have about 18 groups, with about 60 items, and each item has 30 response options (treated as categorical). So, a lot of parameters. NUTS won’t run (not surprising); MH will, but crashes with a StackOverflowError after sampling finishes.

For testing, I cut the problem down to 18 groups, 1 item, and 30 response options. NUTS will run, but it’s incredibly slow. MH runs very quickly, but then takes ages to actually process things after running (I guess saving and processing the chains?).

Here’s a MWE – if anyone has pointers on what I’m doing wrong, please let me know!

using Turing, MCMCChains

# Generate some fake data; it's an attempt at a hierarchical multinomial model with 18 groups and 30 response categories
fake_dat = [5*ones(Int64, 30) for j=1:18] #Frequency of responses in each of 30 categories for 18 groups
fake_N = fill(5*30, 18) #Total N

# Model an overall distribution of responses (beta); alpha governs how closely the individual group distributions come to beta
@model multinomial_RE(data, N, ::Type{T}=Vector{Float64}) where {T} = begin
    B = length(data)
    R = length(data[1]) # Number of response categories (data is a vector of count vectors)
    theta = [T(undef, R) for j=1:B]
    beta ~ Dirichlet(R, 1)
    alpha ~ truncated(Normal(1, 100), 0, Inf)

    for b in 1:B
        theta[b] ~ Dirichlet(beta * alpha)
        data[b] ~ Multinomial(N[b], theta[b])
    end    
end

chain1 = sample(multinomial_RE(fake_dat, fake_N), NUTS(0.65), 1000)
chain2 = sample(multinomial_RE(fake_dat, fake_N), MH(), 50000)

So, the NUTS sampler estimates hours to finish. The Metropolis sampler takes seconds for the 50k runs, but then just hangs at 100% on the progress bar. I also sometimes see apparent numerical instability (random, but infrequent, failures with parameters listed as NaN in the output). Checking with @code_warntype doesn’t show any issues that I can see.

Thoughts? I get that my original problem is huge, so I expected problems, but the MWE above should be small enough to work in a decent time, right?

EDIT: Forgot to mention, I am on Julia 1.3.0, Windows.

The slow performance of NUTS is due to the use of forward-mode AD by default. You can read the details, and how to switch the AD backend in Turing, at turing.ml.
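For reference, switching the backend is a one-line setting; the exact symbol accepted depends on the Turing version you have installed (a sketch, assuming a version that exports `setadbackend`):

```julia
using Turing

# Select a reverse-mode AD backend instead of the default ForwardDiff.
# NOTE: the accepted symbol varies by Turing version -- older releases
# use :reverse_diff, newer ones :tracker or :reversediff.
Turing.setadbackend(:reversediff)
```

After this call, NUTS computes gradients with the reverse-mode backend, which usually scales much better with the number of parameters.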

Regarding the time after sampling: this is likely related to the construction of the chain object or the computation of summary statistics. If you use a notebook, try adding a semicolon after the sample call. This prevents the show function from being called, which would otherwise compute statistics for each parameter.
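Concretely, for the MH call from the MWE:

```julia
# The trailing semicolon suppresses display of the returned Chains object,
# so summary statistics are not computed just to print it.
chain = sample(multinomial_RE(fake_dat, fake_N), MH(), 50_000);
```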

The chain construction is a real bottleneck here, even with the semicolon. I am working on fixing it.


Thanks for the hint about the post-sampling call!

Regarding the autodiff: I can’t have loops in the current reverse mode, can I? In Turing’s language, any notes on how to write this part of the inner loop

data[b] ~ Multinomial(N[b], theta[b])

in a vectorized way to get around that?

Thanks! While I’m posting on the issue of sampler output, do you know of a way to write the model output directly to disk? I didn’t see anything about re-directing sampler output anywhere in the docs, but I figured there might be some hidden functionality or change I could make somewhere. In the past, particularly for very large models, I’ve found the ability to have samplers write their output directly to disk (in an on-line fashion) rather useful.

Loops can be used in backward mode, but they are less efficient than vectorised code, so vectorisation is recommended.

You can write a custom distribution that handles the vectorisation. This line might help to figure out how to do it: https://github.com/TuringLang/Turing.jl/blob/5db142b3e225b896f6bf149eb6ef4248cbabfb5e/src/stdlib/distributions.jl#L69
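As a rough illustration (the type name, fields, and the exact `logpdf` hook here are my own invention, not Turing API; see the linked source for the pattern Turing actually uses), a custom distribution can bundle all of the per-group Multinomial observations into one `logpdf` call, removing the observation loop from the model body:

```julia
using Distributions
import Distributions: logpdf

# Illustrative custom distribution (name and fields are hypothetical):
# treats B independent Multinomial observations as one joint observation.
struct GroupedMultinomial{T<:Real} <: DiscreteMultivariateDistribution
    N::Vector{Int}             # total count per group
    theta::Vector{Vector{T}}   # probability vector per group
end

# Joint log-density: the sum of the groupwise Multinomial log-densities.
function logpdf(d::GroupedMultinomial, x::Vector{Vector{Int}})
    sum(logpdf(Multinomial(d.N[b], d.theta[b]), x[b]) for b in eachindex(x))
end
```

In the model, `data ~ GroupedMultinomial(N, theta)` would then replace the loop over `data[b]`.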

We will also have vectorisation syntax in the next release.

I’ll add better documentation about custom distributions some time soon; the current documentation doesn’t cover this well.

This is taking longer than expected. There is currently no way to save the samples directly to disk. Please open an issue. I will try to fix the chain-construction slowness first.

Will do, thanks.

Hi,

You can use the branch mt/params_to_array for now until this PR gets merged. The 50000 case now terminates in under 3 minutes on my machine. It’s still slow and I will keep trying to make it faster, but it’s much faster than before. Also, see how numerical problems in the model are handled in my PR, using @logpdf() = -Inf and return.

Also, you may want to try using fewer iterations, saving to disk, and then resuming the sampling with resume(chain, n_iter). You can also try Julia’s multi-threading to sample multiple chains simultaneously.
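Until direct disk output exists, that segmented approach can be sketched as follows (the `save_state` keyword and `resume` behavior vary across Turing versions, and the file names are placeholders):

```julia
using Turing, Serialization

# Sample a first segment; save_state = true keeps the sampler state so the
# run can be resumed later (keyword support depends on the Turing version).
chain = sample(multinomial_RE(fake_dat, fake_N), MH(), 5_000; save_state = true)
serialize("chain_part1.jls", chain)   # write the chain to disk

# Later, possibly in a fresh session:
chain = deserialize("chain_part1.jls")
chain2 = resume(chain, 5_000)         # continue from the saved state
serialize("chain_part2.jls", chain2)
```

Each segment’s memory is freed once it is serialized, so peak memory stays bounded by one segment rather than the whole run.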