Advice on speeding up a Turing model

I have a model in Turing that is exceptionally slow, and I'm hoping for some help speeding it up. I'm new to Turing, so any pointers are welcome.

My model is

using Turing, LinearAlgebra  # LinearAlgebra provides the I used in MvNormal

@model function many_linreg(X, y; predictors=size(X, 2))
    #priors
    α ~ Normal(0, sqrt(250))
    β ~ filldist(Uniform(0, 1e8), predictors)
    σ ~ Uniform(0, 1e4)
    N = length(y)
    #likelihood
    for i in 1:N
        y[i] ~ MvNormal(α .+ X * β, σ^2 * I)
    end
end

I sample it with

chain2 = sample(model2, NUTS(), MCMCThreads(), 1_000, 20, adtype=AutoReverseDiff(true))

I’d like to run this eventually on ~30k observation vectors, but I can’t make it run in a reasonable time on even 5k. Sampling just gets stuck at 0% after the initial step size is found.

I have tried the common tricks I know to speed things up: for instance, I am using a QR-decomposed matrix of predictors and reverse-mode differentiation with a compiled tape. I tried to vectorize the for loop but get errors when calling filldist on it. I think maybe I should be using arraydist over single Normals instead, but I’m not sure.

Do you really want each observation y[i] to be multivariate normal?
I.e., a linear regression model would have

y .~ Normal.(α .+ X * β, σ)
# or
y ~ MvNormal(α .+ X * β, σ^2 * I)

Each y[i] is a 64-element vector, each entry of which is subject to iid homoskedastic normal noise. I have a list of 30k such y[i] bundled into the vector y. So each y[i] should be multivariate normal then, shouldn’t it?

Ah, OK, that was not clear to me. So this means that each of the samples y[i] comes from the same distribution?
If that’s the case, you could aggregate the samples and compute the likelihood from the sufficient statistics. Further, those very wide priors can hinder sampling: try something more informed, like a reasonably constrained normal, or a wider-tailed distribution if less information is available. Here is an interesting post on the role/influence of priors.
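For example, something along these lines (an untested sketch, assuming every y[i] really does share the mean α .+ X * β; the model name and signature are just illustrative):

using Turing, LinearAlgebra

# outside the model: precompute the sufficient statistics once
Y  = reduce(hcat, y)      # M × N matrix of observations
N  = size(Y, 2)
S  = vec(sum(Y; dims=2))  # per-component sums Σᵢ y[i]
SS = sum(abs2, Y)         # total sum of squares Σᵢⱼ y[i][j]²

@model function many_linreg_ss(X, S, SS, N; M=size(X, 1), predictors=size(X, 2))
    α ~ Normal(0, sqrt(250))
    β ~ filldist(Uniform(0, 1e8), predictors)
    σ ~ Uniform(0, 1e4)
    μ = α .+ X * β
    # total squared error Σᵢⱼ (y[i][j] - μ[j])², expanded in terms of S and SS
    sqerr = SS - 2 * dot(μ, S) + N * sum(abs2, μ)
    # Gaussian log-likelihood of all N observation vectors, added as one term
    # instead of looping over the data
    Turing.@addlogprob! -0.5 * (N * M * log(2π * σ^2) + sqerr / σ^2)
end

The loop over 30k MvNormals then collapses into a single log-density term, which should also make reverse-mode AD much cheaper.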

Unfortunately I have very, very weak priors, which I don’t think can be improved beyond what I gave. The y values may come from one distribution or from a mixture of several; again, it’s hard to know. I’d like to accelerate the model as written, using whatever optimizations are available in Julia/Turing.

Maybe you could group all the y[i]s into a single matrix-variate distribution with a block-diagonal covariance?


Specifically, you could implement @gdalle’s suggestion like this:

using PDMats

...
# outside the model
N = length(y)
M = length(first(y))
Y = reduce(hcat, y)
# inside the model
Y ~ MatrixNormal(repeat(α .+ X * β, 1, N), ScalMat(M, σ^2), ScalMat(N, 1))

I’d be surprised if this massively improved sampling, though. Uniform priors and correlated predictors often cause problems here, and you probably have more prior information than you think. For example, this prior says it’s equally likely for σ to be 0, 1, or 10^4, but that 10^4 + 1e-12 is absolutely impossible. Is that what you actually believe? Often there is some range outside of which you would be surprised. At the very least, your surprise at seeing a high value probably increases smoothly as you approach 10^4 rather than suddenly kicking in at 10^4. So a heavy-tailed truncated prior like truncated(TDist(2)*1e3; lower=0) could make more sense and improve sampling.
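Concretely, the prior block could look something like this (a sketch only; the positivity constraint mirrors your Uniform(0, 1e8) prior on β, and the 1e7/1e3 scales are placeholders you would set to whatever range would genuinely surprise you):

α ~ Normal(0, sqrt(250))
β ~ filldist(truncated(TDist(3) * 1e7; lower=0), predictors)  # heavy-tailed, positive, still very wide
σ ~ truncated(TDist(2) * 1e3; lower=0)                        # instead of Uniform(0, 1e4)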

When you have genuinely uninformative priors and some of your predictors might be correlated, the QR reparameterization (see e.g. the Stan docs) often helps a lot. Here it would look like:

using LinearAlgebra

# thin QR decomposition, scaled as in the Stan docs, so that X == Q * R
function _qr(x)
    M = size(x, 1)
    F = qr(x)
    return sqrt(M - 1) * Matrix(F.Q), UpperTriangular(F.R) / sqrt(M - 1)
end

@model function many_linreg_qr(X, y; predictors=size(X, 2), QR=_qr(X), Q=QR[1], R=QR[2])
    #priors
    α ~ Normal(0, sqrt(250))
    θ ~ filldist(Uniform(0, 1e8), predictors)
    β := R \ θ
    σ ~ Uniform(0, 1e4)
    N = length(y)
    #likelihood
    ydist = MvNormal(α .+ (Q * θ), σ^2 * I)
    for i in 1:N
        y[i] ~ ydist
    end
end

Note that placing the prior on θ instead of β results in a completely different implicit prior on β (the resulting prior assumes a correlation structure similar to the one the predictors themselves have, which is what sampling will generally produce anyway), but that is fine here since you don’t know what your priors should be anyway.
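If it helps, with a likelihood like this I’d pass a compiled ReverseDiff tape to the sampler itself; in recent Turing versions that looks roughly like this (a sketch, adjust to your Turing version):

using Turing, ReverseDiff  # ReverseDiff must be loaded for the AutoReverseDiff backend

model = many_linreg_qr(X, y)
chain = sample(model, NUTS(; adtype=AutoReverseDiff(compile=true)), MCMCThreads(), 1_000, 20)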


Essentially there are two things to consider: how fast your model is, and how many times you have to run it. Classic Julia performance tricks only improve the first, and there’s a limit to how fast you can go. The second largely depends on how well your model fits your data (which is why unreasonable priors can genuinely hurt you).