Hi:
In an effort to learn Turing, I set out to replicate the UK coal mining disasters example. I think I've managed to reproduce the results successfully; this is what I did.
using Turing
using Distributions
using RollingFunctions
using MCMCChains, Plots, StatsPlots
use_interp_data = false  # true: keep the two `missing` entries; false: use the version with 0 in their place
if use_interp_data 
    disasters = [
        4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
        3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
        2, 2, 3, 4, 2, 1, 3, missing, 2, 1, 1, 1, 1, 3, 0, 0,
        1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
        0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
        3, 3, 1, missing, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1
    ]
else 
    disasters = [
        4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
        3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
        2, 2, 3, 4, 2, 1, 3, 0, 2, 1, 1, 1, 1, 3, 0, 0,
        1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
        0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
        3, 3, 1, 0, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1
    ]
end
years = collect(1851:1961)
scatter(years, disasters, markershape=:circle, markersize=8,
    alpha=0.4, xlabel="Year", legend=false, ylabel="Disaster count")
@model function disasters_model(x, y)
    # change point: the year at which the disaster rate switches
    s ~ Uniform(1851, 1961)
    # disaster rates before and after the change point
    λ₀ ~ Exponential(1.0)
    λ₁ ~ Exponential(1.0)
    λ = map(z -> z <= s ? λ₀ : λ₁, x)
    # observed counts are Poisson with the regime-dependent rate
    y .~ Poisson.(λ)
end
n_samples = 2000
n_chains = 4
delta = 0.80    # NUTS target acceptance rate
n_adapt = 1000  # number of adaptation steps
config = NUTS(n_adapt, delta)
# MCMCThreads() parallelises chains over Julia threads, so Julia must be
# started with enough threads (e.g. `julia --threads 4`); addprocs and
# the Distributed stdlib are only relevant for MCMCDistributed().
@time chain = sample(disasters_model(years, disasters),
    config, MCMCThreads(), n_samples, n_chains)
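A quick way to confirm that the threads are actually available (using the Threads module from Base; if this prints 1, the chains run serially):

# Number of threads available to MCMCThreads();
# restart Julia with `julia --threads 4` if this prints 1.
println(Threads.nthreads())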
The results are below:
┌ Info: Found initial step size
└   ϵ = 0.4
┌ Info: Found initial step size
└   ϵ = 0.2
┌ Info: Found initial step size
└   ϵ = 0.2
┌ Info: Found initial step size
└   ϵ = 0.2
129.208340 seconds (816.63 M allocations: 143.230 GiB, 33.76% gc time)
Chains MCMC chain (2000×15×4 Array{Float64,3}):
Iterations        = 1:2000
Thinning interval = 1
Chains            = 1, 2, 3, 4
Samples per chain = 2000
parameters        = s, λ₀, λ₁
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
Summary Statistics
  parameters        mean       std   naive_se      mcse       ess      rhat 
      Symbol     Float64   Float64    Float64   Float64   Float64   Float64 
           s   1903.7427   24.8248     0.2775    2.7897   16.0908   15.9049
          λ₀      2.7723    0.5560     0.0062    0.0590   19.1846    2.2552
          λ₁      0.7362    0.2725     0.0030    0.0287   18.6696    2.6390
Quantiles
  parameters        2.5%       25.0%       50.0%       75.0%       97.5% 
      Symbol     Float64     Float64     Float64     Float64     Float64 
           s   1886.1786   1889.0095   1889.9111   1908.5791   1947.3571
          λ₀      1.7485      2.3498      2.8690      3.1567      3.6972
          λ₁      0.1467      0.6906      0.8224      0.9219      1.0905
I have a few questions for this forum:
- Is the code structure efficient? The PyMC3 example runs in roughly 17 seconds versus roughly 130 seconds here.
- How does one deal with the missing values in the data? I couldn't figure it out, so I ended up replacing them with 0 (see the sketch after this list; I wouldn't do that in real life, but I just wanted to get something going).
- Any other comments?
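For reference, the zero-replacement mentioned above amounts to something like this (a minimal sketch; `coalesce` is from Julia Base):

# Replace each `missing` entry with 0, which reproduces the
# hard-coded array in the else-branch above.
disasters_filled = coalesce.(disasters, 0)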
Thanks for your help.