Hi everyone,
I am trying to fit an item response model in Turing.jl and have some questions regarding performance. This question is related to the thread "Making Turing Fast with large numbers of parameters?" and is in fact referenced in post #116.
For simplicity I constructed a minimal model that features just two parameter vectors.

theta: person-specific parameter
beta: item-specific parameter

There is one parameter per person and one per item. In terms of hierarchical models, this can be framed as random intercepts for each person and each item.
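Written out, the model I have in mind is roughly the following. The notation is introduced here for illustration only: N is the number of responses, and the minus sign follows the usual Rasch convention (person parameter minus item parameter).

```latex
\begin{aligned}
\theta_j &\sim \mathcal{N}(0, 1), \quad j = 1, \dots, P \\
\beta_k  &\sim \mathcal{N}(0, 1), \quad k = 1, \dots, I \\
y_n &\sim \mathrm{Bernoulli}\!\left(\operatorname{logistic}\!\left(\theta_{p[n]} - \beta_{i[n]}\right)\right), \quad n = 1, \dots, N
\end{aligned}
```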
A naive implementation of this model in Turing looks like
using Turing

@model function irt_naive(y, i, p; I=maximum(i), P=maximum(p))
    theta ~ filldist(Normal(), P)  # one parameter per person
    beta ~ filldist(Normal(), I)   # one parameter per item
    for n in eachindex(y)
        y[n] ~ Bernoulli(logistic(theta[p[n]] - beta[i[n]]))
    end
end
where y is the response vector, i is a vector of item indices, and p is a vector of person indices.
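To make the input format concrete, here is a minimal sketch of how data in this long format could be simulated. The helper names `inv_logit` and `simulate_irt` are hypothetical and not taken from the benchmark code, and the sketch assumes every person answers every item.

```julia
using Random

# Hypothetical helper: inverse logit, written out so the sketch needs no packages.
inv_logit(x) = 1 / (1 + exp(-x))

# Hypothetical helper: simulate long-format responses for P persons and I items,
# assuming a complete design (each person answers each item once).
function simulate_irt(rng, P, I)
    theta = randn(rng, P)                 # person parameters
    beta = randn(rng, I)                  # item parameters
    p = repeat(1:P, inner=I)              # person index for each response
    i = repeat(1:I, outer=P)              # item index for each response
    prob = inv_logit.(theta[p] .- beta[i])
    y = Int.(rand(rng, length(prob)) .< prob)  # Bernoulli draws coded as 0/1
    return (; y, i, p)
end

data = simulate_irt(MersenneTwister(1), 10, 20)
```

With P = 10 and I = 20 this yields 200 responses, and `data.y`, `data.i`, `data.p` can be passed to the model in that order.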
I've tried a lot of things to improve the performance of this model (including vectorizing, LazyArrays, …), and the most performant version I have come up with so far is one where the likelihood is added manually using the @addlogprob! macro.
@model function irt(y, i, p; I=maximum(i), P=maximum(p))
    theta ~ filldist(Normal(), P)
    beta ~ filldist(Normal(), I)
    # add the log-likelihood of all responses in one vectorized step
    @addlogprob! sum(logpdf.(BernoulliLogit.(theta[p] - beta[i]), y))
end
This change improved performance by a factor of ~5.
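For reference, here is a package-free sketch of the quantity that the manual likelihood term adds, checked against an observation-by-observation loop. All function names are hypothetical stand-ins written for this illustration, not the Distributions.jl or Turing internals, and the difference theta minus beta is assumed as the logit.

```julia
# Numerically stable log(1 + exp(x)); a hand-written stand-in for library versions.
log1pexp_stable(x) = x > 0 ? x + log1p(exp(-x)) : log1p(exp(x))

# Log-probability of y in {0, 1} under a Bernoulli with logit eta:
# y * eta - log(1 + exp(eta)).
bernoulli_logit_logpdf(eta, y) = y * eta - log1pexp_stable(eta)

# Vectorized form: a single broadcast over all responses.
loglik_vectorized(theta, beta, y, i, p) =
    sum(bernoulli_logit_logpdf.(theta[p] .- beta[i], y))

# Loop form, mirroring the observation-by-observation structure of the naive model.
function loglik_loop(theta, beta, y, i, p)
    total = 0.0
    for n in eachindex(y)
        total += bernoulli_logit_logpdf(theta[p[n]] - beta[i[n]], y[n])
    end
    return total
end

# Tiny made-up example: 2 persons, 3 items, complete design.
theta = [0.5, -1.0]
beta = [0.2, 1.3, -0.4]
p = [1, 1, 1, 2, 2, 2]
i = [1, 2, 3, 1, 2, 3]
y = [1, 0, 1, 0, 0, 1]

same = loglik_vectorized(theta, beta, y, i, p) ≈ loglik_loop(theta, beta, y, i, p)
```

Both forms compute the same log-likelihood, so any speed difference between the two models comes from how the computation is expressed to the sampler and the AD backend, not from the model itself.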
Now, comparing the optimized irt model to Stan reveals that Turing is about 3 to 10 times slower.
Here are the timings for an increasing number of persons P and a fixed set of items I = 20 (running on a MacBook Pro M1 with Julia 1.8). Note that I tried to match the algorithm in Turing to the Stan defaults, NUTS(1_000, 0.8; max_depth=10).
P = 10
Turing: 575.398 ms
Stan: 169.081 ms
ratio: 3.40
P = 100
Turing: 14.295 s
Stan: 1.462 s
ratio: 9.78
P = 1000
Turing: 150.454 s
Stan: 20.029 s
ratio: 7.51
P = 10000
Turing: 2293.964 s
Stan: 405.192 s
ratio: 5.66
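The ratios are Turing time divided by Stan time. As a quick sanity check, they can be recomputed from the timings listed above (converted to seconds); the variable names here are just for illustration.

```julia
# Timings copied from the list above, all converted to seconds.
timings = [
    (P = 10,     turing = 0.575398, stan = 0.169081),
    (P = 100,    turing = 14.295,   stan = 1.462),
    (P = 1_000,  turing = 150.454,  stan = 20.029),
    (P = 10_000, turing = 2293.964, stan = 405.192),
]

# Slowdown factor of Turing relative to Stan, rounded to two decimals.
ratios = [round(t.turing / t.stan; digits=2) for t in timings]

for (t, r) in zip(timings, ratios)
    println("P = ", t.P, ": ratio = ", r)
end
```

The slowdown peaks at P = 100 and shrinks again for larger P, so the gap does not simply grow with the number of parameters.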
To conclude, my specific questions are:
1. Is it possible to further improve the code for the irt model?
2. For these types of models with a large number of parameters, is this performance relative to Stan expected, or should Turing be as fast as Stan here?
You can download the full benchmark code from this gist: benchmarking turing vs stan on a simple IRT model · GitHub.