Vectorising Turing models

I’m testing out vectorization in Turing models. I’ve managed to get a model that too 195 seconds, down to 60 seconds. But I wonder if there is anything obvious I’m missing in order to get even further benefits?

``````using Turing, StatsPlots
n_chains = 8

# Control
num_errors = [15,10, 61,11, 60, 44, 63, 70, 57,11, 67, 21, 89,12, 63, 11, 96,10, 37,19, 44,18, 78, 27, 60,14]
nᶜ = [89,74,128,87,128,121,128,128,128,78,128,106,128,83,128,100,128,73,128,86,128,86,128,100,128,79]
kᶜ = nᶜ - num_errors
num_errors = [88, 50, 58,17, 40, 18,21, 50, 21, 69,19, 29,11, 76, 46, 36, 37, 72,27, 92,13, 39, 53, 31, 49, 57,17,10,12,21, 39, 43, 49,17, 39,13, 68, 24, 21,27, 48, 54, 41, 75, 38, 76,21, 41, 61,24, 28,21]
nᵃ = [128,128,128,86,128,117,89,128,110,128,93,107,87,128,128,113,128,128,98,128,93,116,128,116,128,128,93,86,86,96,128,128,128,86,128,78,128,111,100,95,128,128,128,128,128,128,98,127,128,93,110,96]
kᵃ = nᵃ - num_errors

Φ(x) = cdf(Normal(0, 1), x)
``````

Original model with loops

``````@model function model1(kᶜ, kᵃ, nᶜ, nᵃ)
δ ~ Normal(0, 1)
μ ~ Normal(0, 1)
σ ~ Uniform(0, 10)
α = δ * σ

ϕᶜ = Vector(undef, length(kᶜ))
θᶜ = Vector(undef, length(kᶜ))

ϕᵃ = Vector(undef, length(kᵃ))
θᵃ = Vector(undef, length(kᵃ))

for i in eachindex(kᶜ)
ϕᶜ[i] ~ Normal(μ + (α / 2), σ)
θᶜ[i] = Φ(ϕᶜ[i])
kᶜ[i] ~ Binomial(nᶜ[i], θᶜ[i])
end

for j in eachindex(kᵃ)
ϕᵃ[j] ~ Normal(μ - (α / 2), σ)
θᵃ[j] = Φ(ϕᵃ[j])
kᵃ[j] ~ Binomial(nᵃ[j], θᵃ[j])
end
end
``````

Vectorised model

``````@model function model2(kᶜ, kᵃ, nᶜ, nᵃ)
δ ~ Normal(0, 1)
μ ~ Normal(0, 1)
σ ~ Uniform(0, 10)
α = δ * σ

# loop over elements of kᶜ
ϕᶜ ~ filldist(Normal(μ + (α / 2), σ), length(kᶜ))
θᶜ = Φ.(ϕᶜ)
@. kᶜ ~ Binomial(nᶜ, θᶜ)

# loop over elements of kᵃ
ϕᵃ ~ filldist(Normal(μ - (α / 2), σ), length(kᵃ))
θᵃ = Φ.(ϕᵃ)
@. kᵃ ~ Binomial(nᵃ, θᵃ)
end
``````

Timing

Timing on an 8 core iMac, with 5000 samples per chain

Original model

``````@time posterior_chain = sample(model1(kᶜ, kᵃ, nᶜ, nᵃ), HMC(0.01, 10), MCMCThreads(), 5_000, n_chains);
``````

Vectorized model

``````@time posterior_chain = sample(model2(kᶜ, kᵃ, nᶜ, nᵃ), HMC(0.01, 10), MCMCThreads(), 5_000, n_chains);
``````

Questions

• Does that vectorized model look about right, or are there obvious rooms for improvement?

Is this the correct set of heuristics?

• use `filldist` for priors
• use `f.(x)` notation for deterministic variables
• use `@.` notation for likelihoods

And just by way of commentary… after initial reluctance with vectorisation (I felt loops were more explicit) I’m convinced that the model specification is more concise and nicer.

It appears as if your arrays are of eltype Any in the first example. I believe the turing docs has a section on how to initialize your arrays with the correct type parameter for better performance.

1 Like