A few things. First, your Stan and Turing models aren't the same: it looks like you're using different priors for the sigmas and the rhos than in the Stan model.
Then, as pointed out by @michielver, the loops can be eliminated by using `filldist`. Additionally, the computation of `logit_p` allocates 4 vectors, but it can be done with a single allocation.
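To see what `filldist` buys you outside of any model: a loop of scalar `~` statements evaluates one univariate distribution per element, while `filldist(d, n)` builds a single n-dimensional distribution with iid marginals `d`. A minimal sketch (using only `Turing`'s exported `filldist` and `Distributions`):

```julia
using Turing, Distributions

# One 4-dimensional distribution with iid Exponential(1) marginals,
# replacing four separate scalar `~` statements inside a model.
d = filldist(Exponential(1), 4)
x = rand(d)

# Its log density is the sum of the marginal log densities:
logpdf(d, x) ≈ sum(logpdf.(Exponential(1), x))
```

Fewer `~` statements also means fewer round trips through the AD machinery, which is where most of the savings come from.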
Lastly, you're passing a higher `adapt_delta` of 0.95 to the Turing model; higher values of `adapt_delta` force smaller step sizes and therefore slower exploration. A value that high is really conservative and usually only necessary when diagnosing sampling problems. Turing defaults to 0.65, which explores faster but at the risk of more divergences, while Stan defaults to 0.8. Probably anything between 0.7 and 0.8 is fine.
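For reference, in Turing the target acceptance rate (Stan's `adapt_delta`) is the second positional argument of `NUTS`, with the number of adaptation steps first:

```julia
using Turing

# Second positional argument of NUTS = target acceptance rate (adapt_delta).
NUTS(1_000, 0.95)  # very conservative: small steps, slow exploration
NUTS(1_000, 0.8)   # Stan's default target; a reasonable middle ground
NUTS(1_000, 0.65)  # Turing's default: faster, but more divergence risk
```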
I tested a few variants, and the example below, which incorporates the above suggestions, runs on my old-ish machine in ~15 minutes. That's still slower than Stan for this example, but within the same order of magnitude.
```julia
using CSV, TuringModels, DataFrames, Turing, LinearAlgebra, Random, ReverseDiff, Memoization

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

data_path = joinpath(TuringModels.project_root, "data", "chimpanzees.csv");
df = CSV.read(data_path, DataFrame; delim=';');
df.block_id = df.block;
df.treatment = 1 .+ df.prosoc_left .+ 2 * df.condition;

quad_form_diag(M, v) = Symmetric((v .* v') .* (M .+ M') ./ 2)

@model function gdemo(actor, block, treatment, pulled_left)
    σ_block ~ filldist(Exponential(1), 4)
    σ_actor ~ filldist(Exponential(1), 4)
    ρ_block ~ LKJ(4, 4)
    ρ_actor ~ LKJ(4, 4)
    g ~ MvNormal(4, 1)
    Σ_block = quad_form_diag(ρ_block, σ_block)
    Σ_actor = quad_form_diag(ρ_actor, σ_actor)
    β ~ filldist(MvNormal(Σ_block), 6)
    α ~ filldist(MvNormal(Σ_actor), 7)
    logit_p = broadcast(treatment, actor, block) do t, a, b
        return @inbounds g[t] + α[t, a] + β[t, b]
    end
    pulled_left .~ BinomialLogit.(1, logit_p)
end

rng = MersenneTwister(3702)
@time chns = sample(
    rng,
    gdemo(df.actor, df.block_id, df.treatment, df.pulled_left),
    NUTS(1_000, 0.9),
    1_000,
);
```
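Whatever target acceptance you settle on, it's worth confirming the faster settings didn't hurt sampling quality. A sketch using a hypothetical toy model (standing in for the chimpanzee model above) and the sampler statistics Turing stores in the chain:

```julia
using Turing, Random

# Hypothetical toy model, just to have a chain to inspect.
@model function coinflip(y)
    p ~ Beta(1, 1)
    y .~ Bernoulli(p)
end

chn = sample(MersenneTwister(1), coinflip(rand(Bool, 50)), NUTS(200, 0.8), 200; progress=false)

sum(chn[:numerical_error])  # count of divergent transitions; ideally ~0
describe(chn)               # effective sample size and rhat per parameter
```

If divergences climb or ess drops after lowering the target acceptance, that's the signal to nudge it back up.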