Turing.jl - NUTS gets stuck in "The current proposal will be rejected... isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)"

I am trying to implement a model similar to Maas et al. (2011), equations 12 and 13 (details of my implementation below).

The issue I am facing is that I am not able to use NUTS with this model. I get repeated numerical instability warnings and the code just keeps running for hours (it feels like the code is stuck in a loop). If I use PG(50), it takes 5-6 minutes for the model to draw 4000 samples in 4 chains, but the samples just stick close to the prior distribution. Hence the posterior predictive distribution is quite different from the observed data. I tried MH() too, but it keeps sampling the same value. So I thought I would use NUTS to get better samples, but I keep getting the error below. Please advise: what might I be doing wrong?


┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
└ @ AdvancedHMC C:\Users…julia\packages\AdvancedHMC\51xgc\src\hamiltonian.jl:47


@model model1(k, j, RTₖⱼ = missing) = begin
    aⁱ ~ filldist(Uniform(0.01, 0.5), 1, j)
    vⁱ ~ filldist(Uniform(0.01, 100), 1, j)
    aᵖ ~ filldist(LogNormal(0, 1), k, 1)
    vᵖ ~ filldist(LogNormal(0, 1), k, 1)
    tᵉʳ ~ filldist(LogNormal(0, 1), k)
    μₖⱼ, σ²ₖⱼ = model_algo(aⁱ, aᵖ, vⁱ, vᵖ, tᵉʳ)
    RTₖⱼ ~ arraydist(Normal.(μₖⱼ, σ²ₖⱼ))
    return RTₖⱼ
end

The model_algo() function implements the calculations below to obtain the μₖⱼ and σ²ₖⱼ parameters.

Also, is there a way to debug and see which data/calculation is causing the issue while NUTS is running?

Sometimes NUTS runs into problems if the model does not fit the data well. Are you using simulated data from the model or empirical data?

Another potential issue is the range of v:

vⁱ ~ filldist(Uniform(0.01,100),1,j)

Would it be reasonable to use a smaller range?

One of the simplest ways to home in on the problem is to print the parameter values inside the model block.
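For example, something along these lines (a minimal sketch with a made-up toy model, not your actual model; the idea is just to surface bad parameter values before Distributions throws a DomainError):

```julia
using Turing

@model function debugdemo(y)
    σ ~ LogNormal(0, 1)
    # Print/inspect the parameter on every model evaluation;
    # zeros, NaNs, or Infs will show up here first.
    if !isfinite(σ) || σ <= 0
        @warn "suspicious σ value" σ
    end
    y ~ Normal(0, σ)
end
```

With ForwardDiff you will see Dual numbers printed rather than plain floats, but the primal value (the first entry) is the one to check.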

Thanks @Christopher_Fisher for the response. Here are a couple of things I tried.

Earlier I was using the data from Maas et al. (2011). But now I tested the model using data generated by the model itself. Also, I changed my vⁱ prior to reduce the range. I also reduced the number of samples to 1 and printed the variables to see the error.

sample_data = Qdiffusiontest(5,2,missing)()
sample(Qdiffusiontest(5,2,sample_data), NUTS(1,0.65), 1);

vⁱ ~ filldist(Uniform(0.2,1),1,j)

The error shows that σ²ₖⱼ is becoming zero. I think that must mean (based on equation 2 in my implementation) that my var(RT) becomes 0, which could be due to aᵖ or vⁱ being sampled as 0.

I printed the logs and found that aᵖ is being sampled as 0s and NaNs.
My aᵖ prior is:

aᵖ ~ filldist(LogNormal(0,1),k,1)

To try fixing the 0 issue, I changed the σ²ₖⱼ formula to log(1.001 + …), and the issue seems to be fixed for the 0 case. But I am not sure why NaNs are being sampled for aᵖ or how to fix that. Please advise.

Error logs for aᵖ only:

ForwardDiff.Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:aⁱ, :vⁱ, :aᵖ, :vᵖ, :tᵉʳ), Tuple

AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.386165161519008,0.0,0.0,0.0,0.0,0.386165161519008,0.0,0.0,0.0,0.0,0.0);

Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:aⁱ, :vⁱ, :aᵖ, :vᵖ, :tᵉʳ), Tuple{Dy

(0.04060390596442086,0.0,0.0,0.0,0.0,0.0,0.04060390596442086,0.0,0.0,0.0,0.0); Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:aⁱ, :vⁱ, :aᵖ, :vᵖ, :tᵉʳ), Tuple{Dy

(0.5975391355984172,0.0,0.0,0.0,0.0,0.0,0.0,0.5975391355984172,0.0,0.0,0.0); Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:aⁱ, :vⁱ, :aᵖ, :vᵖ, :tᵉʳ), Tuple{Dyn

(Inf,NaN,NaN,NaN,NaN,NaN,NaN,NaN,Inf,NaN,NaN); Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:aⁱ, :vⁱ, :aᵖ, :vᵖ, :tᵉʳ), Tuple{Dyn


ERROR: DomainError with Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:aⁱ, :vⁱ, :aᵖ, :vᵖ, :tᵉʳ), Tuple{Dyn


Normal: the condition σ >= zero(σ) is not satisfied.

Indeed, that is very strange. I wonder what happens if you reject samples of aᵖ which are close to zero?

The documentation shows that you can reject a sample as follows:

using Turing
using LinearAlgebra

@model function demo(x)
    m ~ MvNormal(zero(x), I)
    if dot(m, x) < 0
        Turing.@addlogprob! -Inf
        # Exit the model evaluation early
        return nothing
    end
    x ~ MvNormal(m, I)
    return nothing
end

My suspicion is that if the sampler gets stuck at the boundary you set for rejecting samples (say 0.0001), there might be challenging trade-offs due to the ratio of parameters in the definitions of the expected value and variance of RT.
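Applied to your aᵖ prior, the rejection might look something like this (a sketch only; the 1e-4 threshold and the toy likelihood are assumptions, not your actual model):

```julia
using Turing

@model function rejectdemo(y, k)
    aᵖ ~ filldist(LogNormal(0, 1), k, 1)
    # Reject proposals where any aᵖ is non-finite or numerically zero:
    if any(x -> !isfinite(x) || x < 1e-4, aᵖ)
        Turing.@addlogprob! -Inf
        return nothing
    end
    y ~ arraydist(Normal.(zeros(k), vec(aᵖ)))
    return nothing
end
```

Keep in mind that hard-rejecting near a boundary creates a discontinuity in the log density, which HMC/NUTS can struggle with; it is a diagnostic tool more than a fix.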

Thanks @Christopher_Fisher for the suggestion. I tried it out, and it helps in the sense that the code is now sampling, but it's sampling pretty slowly. Really slowly; maybe it's rejecting a lot of samples. Also, it samples for data that I generated from the model itself (prior predictive), but for my observed data it still gets stuck in the "The current proposal will be rejected…" error. I made sure that my observed data doesn't have any NaN/missing values.

I implemented the same model in PyMC, and it sampled successfully using NUTS conditioned on my observed data. The posterior stats look great. When I tried to figure out why, I realized that PyMC transforms all my variables to have (-inf, inf) support, and hence the sampling succeeds. If I switch off this transform feature, I start to get errors similar to those from Turing's NUTS.

So, is there a tutorial I can refer to, to see how this kind of transformation is done in Turing? I think it's Bijectors.jl, but I have no experience with how transformations work, so I am not sure if I am thinking about this correctly. If possible, please point me to documentation or a tutorial explaining how to do the transformation (similar to PyMC) in Turing.jl.

Thanks for all the help!

@rkmalaiya, thanks for reporting back. The transformation PyMC3 uses is interesting. I would expect that sampling would speed up if the transformation makes the posterior more similar to a multivariate normal. A while back, I used Bijectors.jl outside of Turing. However, I’m not sure how to use it in Turing for your specific use case.
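Outside of Turing, the basic Bijectors.jl API looks roughly like this (a sketch from memory; please double-check against the Bijectors docs):

```julia
using Bijectors, Distributions

d = LogNormal(0, 1)   # support is (0, ∞)
b = bijector(d)       # map from the support to ℝ (log, for LogNormal)
td = transformed(d)   # the same distribution over the unconstrained space

y = rand(td)          # a draw living on (-∞, ∞)
x = invlink(d, y)     # map back to the constrained support, here exp(y)
```

Note that Turing already uses Bijectors internally to run HMC/NUTS in unconstrained space, so the remaining question is why your particular model still hits non-finite values there.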

Another factor is the AD backend. ReverseDiff is currently the fastest once the number of parameters is about 10 or greater (which is probably true for an IRT-type model). If you use a multivariate normal as the likelihood with ReverseDiff.jl, you should get a significant speed-up.
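Switching the backend is a one-liner (assuming a Turing version that still uses `setadbackend`; newer releases configure AD differently, so check the docs for your version):

```julia
using Turing, ReverseDiff

# Use ReverseDiff for the gradients in HMC/NUTS:
Turing.setadbackend(:reversediff)
# Optionally cache the compiled tape; only safe if the model
# has no value-dependent branching (e.g. no early returns that
# depend on sampled parameter values):
Turing.setrdcache(true)
```

Note that tape caching would interact badly with the `@addlogprob!` early-return trick above, since that introduces value-dependent branching.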