Hi:
In an effort to learn Turing, I set out to replicate the UK coal mining disasters example. I think I've managed to replicate the results successfully; here is what I did.
```julia
using Turing
using Distributions
using Distributed
using RollingFunctions
using MCMCChains, Plots, StatsPlots

use_interp_data = false

if use_interp_data
    disasters = [
        4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
        3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
        2, 2, 3, 4, 2, 1, 3, missing, 2, 1, 1, 1, 1, 3, 0, 0,
        1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
        0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
        3, 3, 1, missing, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1
    ]
else
    disasters = [
        4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
        3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
        2, 2, 3, 4, 2, 1, 3, 0, 2, 1, 1, 1, 1, 3, 0, 0,
        1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
        0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
        3, 3, 1, 0, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1
    ]
end
```
```julia
years = collect(1851:1961)

# Note: `markershape` is the Plots.jl keyword (`markertype` is not recognized),
# and `legend` takes `false`, not the symbol `:false`.
scatter(years, disasters, markershape=:circle, markersize=8,
        alpha=0.4, xlabel="Year", legend=false, ylabel="Disaster count")
```
```julia
@model function disasters_model(x, y)
    s ~ Uniform(1851, 1961)
    λ₀ ~ Exponential(1.0)
    λ₁ ~ Exponential(1.0)
    λ = map(z -> (z <= s) ? λ₀ : λ₁, x)
    y .~ Poisson.(λ)
end
```
```julia
n_samples = 2000
n_chains = 4
delta = 0.80
n_adapt = 1000
config = NUTS(n_adapt, delta)

# Note: addprocs adds worker *processes*, which MCMCThreads() does not use;
# threaded sampling needs Julia started with multiple threads
# (e.g. `julia --threads 4` or the JULIA_NUM_THREADS environment variable).
addprocs(4)

@time chain = sample(disasters_model(years, disasters),
                     config, MCMCThreads(), n_samples, n_chains)
```
The results are below:
```
┌ Info: Found initial step size
└   ϵ = 0.4
┌ Info: Found initial step size
└   ϵ = 0.2
┌ Info: Found initial step size
└   ϵ = 0.2
┌ Info: Found initial step size
└   ϵ = 0.2
129.208340 seconds (816.63 M allocations: 143.230 GiB, 33.76% gc time)

Chains MCMC chain (2000×15×4 Array{Float64,3}):

Iterations        = 1:2000
Thinning interval = 1
Chains            = 1, 2, 3, 4
Samples per chain = 2000
parameters        = s, λ₀, λ₁
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters       mean      std  naive_se    mcse      ess     rhat
      Symbol    Float64  Float64   Float64 Float64  Float64  Float64

           s  1903.7427  24.8248    0.2775  2.7897  16.0908  15.9049
          λ₀     2.7723   0.5560    0.0062  0.0590  19.1846   2.2552
          λ₁     0.7362   0.2725    0.0030  0.0287  18.6696   2.6390

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

           s  1886.1786  1889.0095  1889.9111  1908.5791  1947.3571
          λ₀     1.7485     2.3498     2.8690     3.1567     3.6972
          λ₁     0.1467     0.6906     0.8224     0.9219     1.0905
```
I have a few questions for this forum:
- Is the code structure efficient? The PyMC3 example runs in roughly 17 seconds versus the 130 seconds here.
- How does one deal with the missing values in the data? I couldn't figure out how to do it, so I ended up replacing them with 0 (I wouldn't do that in real life, but I just wanted to get something going).
- Any other comments?
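For context on the missing-values question, the direction I was attempting, based on my (unverified) reading of Turing's missing-data handling: `missing` entries of a data argument are treated as parameters and sampled, provided the likelihood is written element-wise and the vector's element type allows `missing`. Roughly:

```julia
# Sketch (untested): let Turing impute the missing counts by sampling them.
@model function disasters_model_missing(x, y)
    s ~ Uniform(1851, 1961)
    λ₀ ~ Exponential(1.0)
    λ₁ ~ Exponential(1.0)
    for i in eachindex(y)
        y[i] ~ Poisson(x[i] <= s ? λ₀ : λ₁)
    end
end

# The data vector must be able to hold `missing`:
y = convert(Vector{Union{Missing,Int}}, disasters)
chain = sample(disasters_model_missing(years, y), config, n_samples)
```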
Thanks for your help.