Vectorising Turing models

I’m testing out vectorization in Turing models. I’ve managed to get a model that too 195 seconds, down to 60 seconds. But I wonder if there is anything obvious I’m missing in order to get even further benefits?

using Turing, StatsPlots
n_chains = 8

# Control
num_errors = [15,10, 61,11, 60, 44, 63, 70, 57,11, 67, 21, 89,12, 63, 11, 96,10, 37,19, 44,18, 78, 27, 60,14]
nᶜ = [89,74,128,87,128,121,128,128,128,78,128,106,128,83,128,100,128,73,128,86,128,86,128,100,128,79]
kᶜ = nᶜ - num_errors
# ADHD
num_errors = [88, 50, 58,17, 40, 18,21, 50, 21, 69,19, 29,11, 76, 46, 36, 37, 72,27, 92,13, 39, 53, 31, 49, 57,17,10,12,21, 39, 43, 49,17, 39,13, 68, 24, 21,27, 48, 54, 41, 75, 38, 76,21, 41, 61,24, 28,21]
nᵃ = [128,128,128,86,128,117,89,128,110,128,93,107,87,128,128,113,128,128,98,128,93,116,128,116,128,128,93,86,86,96,128,128,128,86,128,78,128,111,100,95,128,128,128,128,128,128,98,127,128,93,110,96]
kᵃ = nᵃ - num_errors

Φ(x) = cdf(Normal(0, 1), x)

Original model with loops

@model function model1(kᶜ, kᵃ, nᶜ, nᵃ)
    δ ~ Normal(0, 1)
    μ ~ Normal(0, 1)
    σ ~ Uniform(0, 10)
    α = δ * σ

    ϕᶜ = Vector(undef, length(kᶜ))
    θᶜ = Vector(undef, length(kᶜ))

    ϕᵃ = Vector(undef, length(kᵃ))
    θᵃ = Vector(undef, length(kᵃ))

    for i in eachindex(kᶜ)
        ϕᶜ[i] ~ Normal(μ + (α / 2), σ)
        θᶜ[i] = Φ(ϕᶜ[i])
        kᶜ[i] ~ Binomial(nᶜ[i], θᶜ[i])
    end

    for j in eachindex(kᵃ)
        ϕᵃ[j] ~ Normal(μ - (α / 2), σ)
        θᵃ[j] = Φ(ϕᵃ[j])
        kᵃ[j] ~ Binomial(nᵃ[j], θᵃ[j])
    end
end

Vectorised model

@model function model2(kᶜ, kᵃ, nᶜ, nᵃ)
    δ ~ Normal(0, 1)
    μ ~ Normal(0, 1)
    σ ~ Uniform(0, 10)
    α = δ * σ

    # loop over elements of kᶜ
    ϕᶜ ~ filldist(Normal(μ + (α / 2), σ), length(kᶜ))
    θᶜ = Φ.(ϕᶜ)
    @. kᶜ ~ Binomial(nᶜ, θᶜ)

    # loop over elements of kᵃ
    ϕᵃ ~ filldist(Normal(μ - (α / 2), σ), length(kᵃ))
    θᵃ = Φ.(ϕᵃ)
    @. kᵃ ~ Binomial(nᵃ, θᵃ)
end

Timing

Timing on an 8 core iMac, with 5000 samples per chain

Original model

@time posterior_chain = sample(model1(kᶜ, kᵃ, nᶜ, nᵃ), HMC(0.01, 10), MCMCThreads(), 5_000, n_chains);

Takes about 195 seconds

Vectorized model

@time posterior_chain = sample(model2(kᶜ, kᵃ, nᶜ, nᵃ), HMC(0.01, 10), MCMCThreads(), 5_000, n_chains);

Takes about 60 seconds

Questions

  • Does that vectorized model look about right, or are there obvious rooms for improvement?

Is this the correct set of heuristics?

  • use filldist for priors
  • use f.(x) notation for deterministic variables
  • use @. notation for likelihoods

And just by way of commentary… after initial reluctance with vectorisation (I felt loops were more explicit) I’m convinced that the model specification is more concise and nicer.

It appears as if your arrays are of eltype Any in the first example. I believe the turing docs has a section on how to initialize your arrays with the correct type parameter for better performance.

1 Like