# Poisson changepoint model: UK Coal Mining Disasters

Hi:

In an effort to learn Turing, I set out to replicate the UK coal mining disasters example. I think I've managed to replicate the results successfully; this is what I did.

```julia
using Turing
using Distributions
using Distributed
using RollingFunctions

using MCMCChains, Plots, StatsPlots
```

```julia
use_interp_data = false
if use_interp_data
    disasters = [
        4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
        3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
        2, 2, 3, 4, 2, 1, 3, missing, 2, 1, 1, 1, 1, 3, 0, 0,
        1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
        0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
        3, 3, 1, missing, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1
    ]
else
    disasters = [
        4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
        3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
        2, 2, 3, 4, 2, 1, 3, 0, 2, 1, 1, 1, 1, 3, 0, 0,
        1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
        0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
        3, 3, 1, 0, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1
    ]
end
```

```julia
years = collect(1851:1961)

scatter(years, disasters, markershape=:circle, markersize=8,
        alpha=0.4, xlabel="Year", ylabel="Disaster count", legend=false)
```

```julia
@model function disasters_model(x, y)
    s ~ Uniform(1851, 1961)
    λ₀ ~ Exponential(1.0)
    λ₁ ~ Exponential(1.0)

    λ = map(z -> (z <= s) ? λ₀ : λ₁, x)
    y .~ Poisson.(λ)
end
```

```julia
n_samples = 2000
n_chains = 4
delta = 0.80

# Sampler call assumed: NUTS with target acceptance `delta` and parallel
# chains, which matches the 2000×4 output shown below.
@time chain = sample(disasters_model(years, disasters),
                     NUTS(delta), MCMCThreads(), n_samples, n_chains)
```
The results are below:

```
┌ Info: Found initial step size
└   ϵ = 0.4
┌ Info: Found initial step size
└   ϵ = 0.2
┌ Info: Found initial step size
└   ϵ = 0.2
┌ Info: Found initial step size
└   ϵ = 0.2
129.208340 seconds (816.63 M allocations: 143.230 GiB, 33.76% gc time)
Chains MCMC chain (2000×15×4 Array{Float64,3}):

Iterations        = 1:2000
Thinning interval = 1
Chains            = 1, 2, 3, 4
Samples per chain = 2000
parameters        = s, λ₀, λ₁
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters        mean       std   naive_se      mcse       ess      rhat
      Symbol     Float64   Float64    Float64   Float64   Float64   Float64

           s   1903.7427   24.8248     0.2775    2.7897   16.0908   15.9049
          λ₀      2.7723    0.5560     0.0062    0.0590   19.1846    2.2552
          λ₁      0.7362    0.2725     0.0030    0.0287   18.6696    2.6390

Quantiles
  parameters        2.5%       25.0%       50.0%       75.0%       97.5%
      Symbol     Float64     Float64     Float64     Float64     Float64

           s   1886.1786   1889.0095   1889.9111   1908.5791   1947.3571
          λ₀      1.7485      2.3498      2.8690      3.1567      3.6972
          λ₁      0.1467      0.6906      0.8224      0.9219      1.0905
```



I have a couple of questions for this forum:

- Is the code structure efficient? The PyMC3 example runs in roughly 17 seconds versus the 130 seconds here.
- How does one deal with the missing values in the data? I couldn't figure out how to do it, so I ended up replacing them with 0 (I wouldn't do that in real life, but I just wanted to get something going).
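For what it's worth, one hedged way to handle the `missing` entries (rather than zero-filling) is to condition only on the observed years; the model name below is illustrative, and the exact behavior of `missing` on the left-hand side of `~` may depend on the Turing version:

```julia
using Turing, Distributions

# Sketch (assumed, not the original poster's code): skip unobserved years so
# that the likelihood only includes the real observations.
@model function disasters_model_missing(x, y)
    s ~ Uniform(1851, 1961)
    λ₀ ~ Exponential(1.0)
    λ₁ ~ Exponential(1.0)
    for i in eachindex(y)
        y[i] === missing && continue        # drop unobserved years from the likelihood
        y[i] ~ Poisson(x[i] <= s ? λ₀ : λ₁)
    end
end
```

In recent Turing versions, leaving the `missing` values in place should instead make them latent variables that get sampled along with the parameters, which is another option if you actually want imputed values.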

Not sure of the Turing specifics here, but `map` will allocate a new array on every model evaluation, leading to lots of overhead. You might try a MappedArray here:


```
julia> using MappedArrays, BenchmarkTools

julia> x = randn(10000);

julia> @btime map(z -> (z ≤ 0.5 ? 2.0 : 3.0), $x);
  7.106 μs (2 allocations: 78.20 KiB)

julia> @btime mappedarray(z -> (z ≤ 0.5 ? 2.0 : 3.0), $x);
  1.182 ns (0 allocations: 0 bytes)
```


Uh, why is there missing data? Is this something added for educational purposes? I seem to remember there are no missing years in the original data set: https://academic.oup.com/biomet/article/66/1/191/224063

Good question. I don't have access to Biometrika, so I'll have to take your word for it. I simply replicated the PyMC3 example dataset for this purpose.

Yes, I see that it's about Turing and not about coal mines; I was just puzzled. We had a look at the same data set for PointProcessInference.jl, but there we used event times* instead of aggregate counts.

*from R:

```r
> install.packages("boot")
> library(boot)
> coal
```


You can get it from RDatasets:

```
julia> using RDatasets

julia> coal = dataset("boot", "coal")
191×1 DataFrame
 Row │ Date
     │ Float64
─────┼─────────
   1 │ 1851.2
   2 │ 1851.63
   3 │ 1851.97
   4 │ 1851.97
   5 │ 1852.31
   6 │ 1852.35
   7 │ 1852.36
  ...
```


Thanks for the suggestion. That brought it down to roughly 85 seconds, which is substantial progress.


I didn’t know about the MappedArray library, thanks!

So to be explicit, the performance difference between `map` and `mappedarray` here is that on every model evaluation, calling `map` creates a new array (requiring an allocation), whose elements are then read one by one when the line `y .~ Poisson.(λ)` is evaluated. So: array allocations = O(n_samples), function calls = O(n_samples * length(y)). I suppose there's also garbage collection for each of those n_samples arrays.

In contrast, a mapped array skips the allocation step (and hence the garbage collection) and just computes each element when it is accessed. The number of function calls is the same in either case, but the array allocation is skipped entirely.
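A tiny demo of that laziness (plain MappedArrays, no Turing): the mapped array allocates no storage and applies the function at access time, so it even reflects later mutations of the source array.

```julia
using MappedArrays

x = [0.2, 0.8, 0.4]
m = mappedarray(z -> z <= 0.5 ? 2.0 : 3.0, x)  # lazy view, no copy

@show m[2]    # 3.0 — computed on access, not stored
x[2] = 0.1    # mutate the underlying data...
@show m[2]    # 2.0 — the view recomputes from the new value
@show sum(m)  # 6.0 — reductions work like on any AbstractArray
```

This is also why a bare `@btime mappedarray(...)` measures essentially nothing: constructing the view does no element-wise work at all.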

Is there also something else going on here, though? Some naive back-of-the-envelope math: you showed you can save a few microseconds by not allocating an array of size 10k. Accumulated over 2000 samples (and ignoring that this data set is much smaller), that adds up to only a handful of milliseconds. Even if that isn't counting GC, it's still far less than the roughly 45 seconds saved here. Obviously different machines, but that's still a big gap. Am I missing something?


Right, this is because a MappedArray is lazy. The comparison I showed doesn’t include actually computing the elements. So maybe this is better:

```
julia> @btime begin
           y = map(z -> (z ≤ 0.5 ? 2.0 : 3.0), $x)
           sum(y)
       end
  5.884 μs (2 allocations: 78.20 KiB)
23126.0

julia> @btime begin
           y = mappedarray(z -> (z ≤ 0.5 ? 2.0 : 3.0), $x)
           sum(y)
       end
  1.534 μs (0 allocations: 0 bytes)
23126.0
```


Also, note that it’s not always a win. If the computed values are accessed multiple times then it’s often better to allocate.

Can both of those lines be combined into a single `mappedarray` to avoid the allocation altogether? Does Turing accept `y .~ mappedarray(...)`?

Good point. Maybe this?

```julia
y .~ mappedarray(z -> (z <= s) ? Poisson(λ₀) : Poisson(λ₁), x)
```
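Dropped into the full model, that one-liner would look roughly like the sketch below (the model name is illustrative, and whether `.~` accepts a lazy array of distributions may depend on the Turing version, so this is worth testing before relying on it):

```julia
using Turing, Distributions, MappedArrays

@model function disasters_model_lazy(x, y)
    s ~ Uniform(1851, 1961)
    λ₀ ~ Exponential(1.0)
    λ₁ ~ Exponential(1.0)
    # One lazy pass: no intermediate λ array is allocated per model evaluation.
    y .~ mappedarray(z -> (z <= s) ? Poisson(λ₀) : Poisson(λ₁), x)
end
```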