Turing.jl Warning: The current proposal will be rejected due to numerical error(s). isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)

I slightly modified the gdemo example in Turing.jl to accept more data:

# Import packages.
using Turing

# Define a simple Normal model with unknown mean and variance.
@model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x .~ Normal(m, sqrt(s²))
end

and generated some data, actually lots of it, which is key to the observed behaviour:

data = rand(Normal(2,1),10000);

and wanted to sample from the posterior:

c3 = sample(gdemo(data), HMC(0.1, 5), 100)

This resulted in lots of warning messages:

┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/4fByY/src/hamiltonian.jl:49
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/4fByY/src/hamiltonian.jl:49
...

In the end, the sampler is stuck in one place:

Chains MCMC chain (100×11×1 Array{Float64, 3}):

Iterations        = 1:1:100
Number of chains  = 1
Samples per chain = 100
Wall duration     = 0.13 seconds
Compute duration  = 0.13 seconds
parameters        = s², m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse       ess      rhat   es ⋯
      Symbol   Float64   Float64    Float64   Float64   Float64   Float64      ⋯

          s²    2.2691    0.0000     0.0000    0.0000    2.0911    0.9899      ⋯
           m    0.1653    0.0000     0.0000    0.0000       NaN       NaN      ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    2.2691    2.2691    2.2691    2.2691    2.2691
           m    0.1653    0.1653    0.1653    0.1653    0.1653

What is wrong? Why is Turing performing badly when there is a lot of data?

When you increase the amount of data, the posterior often becomes narrower, so the step size and number of steps needed to traverse it change. The short answer is that the parameters you're passing to HMC may be appropriate for that specific demo, but suitable values are unique to each problem.
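
For example, a much smaller step size may let plain HMC move again on this model; the values below are only illustrative, not tuned, and would still need adjusting for each problem:

# Hand-picked, untuned values for illustration: smaller step size, more leapfrog steps.
c_hmc = sample(gdemo(data), HMC(0.01, 10), 1000)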

A more general answer is: don't use HMC, use NUTS. NUTS tunes the step size and number of steps so you don't have to. Even more importantly, Turing's HMC implementation does not tune a metric by default, while NUTS tunes a diagonal metric, which largely accounts for the change in posterior scale as the data grows.
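
In practice that just means swapping the sampler, e.g.:

# NUTS adapts the step size (and a diagonal metric) during warm-up,
# so no manual tuning of HMC parameters is needed.
c3 = sample(gdemo(data), NUTS(), 1000)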

Effectively, plain HMC represents a decades-old methodology, while NUTS is modern and fast and one of the key technologies that makes probabilistic programming practical. It's really unfortunate that the Turing examples feature HMC so much.


Should the examples just be changed? Would it be much harder than just swapping out HMC for NUTS and rerunning everything?

I agree the examples should probably be updated so that they generalise better.

NUTS is also HMC; it's just HMC with an auto-tuning algorithm.
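
For instance, instead of a fixed step size and number of leapfrog steps, its constructor takes a target acceptance rate (commonly 0.65) and adapts the leapfrog parameters during warm-up:

# Still leapfrog HMC under the hood; step size and path length are adapted automatically.
c = sample(gdemo(data), NUTS(0.65), 1000)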