What is the idiomatic way to write a model in Turing? For example, here is the binary coinflip model from the Turing documentation:
using Turing
using Distributions
using Random
p_true = 0.5
n = 1000
Random.seed!(12)
data = rand(Bernoulli(p_true),n)
# From the coinflip example in the Turing documentation
@model function coinflip1(y)
    p ~ Beta(1., 1.)
    for i in 1:length(y)
        y[i] ~ Bernoulli(p)
    end
end
# I thought the following coinflip2 code would not work because it is not documented, but it works
@model function coinflip2(y)
    p ~ Beta(1., 1.)
    y ~ Bernoulli(p)
end
# This model is from the paper "DynamicPPL: Stan-like Speed for Dynamic Probabilistic Models"
@model function coinflip3(y)
    p ~ Beta(1., 1.)
    y .~ Bernoulli(p)
end
coinflip2 and coinflip3 are both faster than coinflip1, even though neither appears in the documentation. Using 2000 iterations and NUTS(0.65), here is a simple benchmark with the @time macro:
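The comparison can be run along these lines (a sketch of the harness only; it assumes coinflip1, coinflip2, coinflip3, and data are defined as above, and the warm-up run is my addition to keep compilation time out of the measurement):

using Turing

config = NUTS(0.65)
for model in (coinflip1, coinflip2, coinflip3)
    sample(model(data), config, 10)          # warm-up run to exclude compilation overhead
    @time sample(model(data), config, 2000)  # timed run
end

Absolute timings will differ between machines, so compare the three models relative to each other.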
I suspect that the for loop in coinflip1 performs a separate evaluation, with its own overhead, on each iteration. Perhaps there is some way to improve this on the Turing side. As a workaround, you can create a new distribution type and place the for loop inside its logpdf function. I use this approach when (1) I am defining a custom distribution, (2) I am using a non-standard data type, or (3) a for loop is the more natural way to define the model. Here is an example for comparison:
using Turing
using Distributions
using Random
import Distributions: logpdf, loglikelihood
struct MyDist{T} <: ContinuousUnivariateDistribution
    θ::T
end

# Sum the Bernoulli log densities over the whole data vector in one call
function logpdf(d::MyDist, data::Array{Bool,1})
    LL = 0.0
    for i in 1:length(data)
        LL += logpdf(Bernoulli(d.θ), data[i])
    end
    return LL
end

# Turing calls loglikelihood for vector-valued observations
loglikelihood(d::MyDist, data::Array{Bool,1}) = logpdf(d, data)
p_true = 0.5
n = 1000
Random.seed!(12)
data = rand(Bernoulli(p_true), n)
@model function coinflip4(y)
    p ~ Beta(1., 1.)
    y ~ MyDist(p)
end
n_samples = 2000
delta = 0.65
n_adapt = 1000
config = NUTS(n_adapt, delta)
@time chain = sample(coinflip4(data), config, n_samples)
The run time for coinflip4 is nearly identical to coinflip3:
0.205202 seconds (1.19 M allocations: 74.529 MiB, 5.45% gc time)
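For reference, Turing also exports a filldist helper (from DistributionsAD) that wraps a scalar distribution into a product distribution, which gives the same one-shot likelihood evaluation without defining a custom type. This coinflip5 variant is my addition, not part of the benchmark above:

using Turing

# filldist(Bernoulli(p), n) builds an n-dimensional product distribution,
# so the whole observation vector is scored in a single likelihood call,
# similar to coinflip3's broadcasted `.~`.
@model function coinflip5(y)
    p ~ Beta(1., 1.)
    y ~ filldist(Bernoulli(p), length(y))
end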
Actually, when I first started using Turing, my complaint was that it was slow on this binary coinflip problem compared to PyMC3. I was wondering whether a vectorized version was allowed or documented. I could not find one in the docs or examples, but I did find it in the arXiv paper I mentioned above. Thank you for the workaround tips, I learned something new.