Help -- can't get Turing to work on a simple model

Hi everyone,

I’m new to Bayesian inference and Turing, so I realize this is likely user error more than anything. I’m trying to fit a simple model; I’ve tried the two versions shown here:

@model function model1(samples)
    V0 = 100                                               # fixed initial mean for V
    σ ~ truncated(Normal(0.01 * V0, 0.15 * V0), 0, Inf)    # std. dev. of V, non-negative
    C0 ~ truncated(Normal(100, 60), 0, Inf)                # mean parameter, non-negative
    V ~ filldist(truncated(Normal(V0, σ), 0, Inf), length(samples))

    for i in eachindex(samples)
        samples[i] ~ Poisson(C0 * V[i])                    # count likelihood
    end
end

@model function model2(samples)
    V0 = 100
    σ ~ truncated(Normal(0.01 * V0, 0.15 * V0), 0, Inf)
    C0 ~ truncated(Normal(100, 60), 0, Inf)
    V ~ filldist(truncated(Normal(V0, σ), 0, Inf), length(samples))

    for i in eachindex(samples)
        # normal approximation to the Poisson likelihood used in model1
        samples[i] ~ truncated(Normal(C0 * V[i], sqrt(C0 * V[i])), 0, Inf)
    end
end

I can barely get Turing to sample the posterior. NUTS complains about not finding valid initial parameters, and even providing it with the MLE estimates eventually results in a domain error. PG works OK, but sampling barely makes any progress and it is very slow.
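
For reference, supplying the starting values looked something like this (a rough sketch; maximum_likelihood and the initial_params keyword are my best guesses at the relevant API, and the data is just a placeholder):

using Turing

samples = rand(Poisson(100 * 100), 10)            # placeholder data, just for illustration

# Get MLE point estimates and hand them to NUTS as explicit starting values.
# (As I understand it, initial_params is given in the original, constrained space.)
mle = maximum_likelihood(model1(samples))
init = collect(mle.values)                        # ordered as (σ, C0, V[1], ..., V[N]), I believe

chain = sample(model1(samples), NUTS(), 1000; initial_params=init)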

On the other hand, Stan seems to work right out of the box with the following translated code:

data {
    int<lower=1> N;                 // number of samples
    array[N] real samples;           // observed samples
}

parameters {
    real<lower=0> sigma;         // standard deviation parameter, constrained to be non-negative
    real<lower=0> C0;            // mean parameter, constrained to be non-negative
    array[N] real<lower=0> V;    // intermediate variable, constrained to be non-negative
}

model {
    real V0 = 100;               // initial mean for V

    // Priors
    sigma ~ normal(0.01 * V0, 0.15 * V0);
    C0 ~ normal(100, 60);
    V ~ normal(V0, sigma);       // implicit truncation to positive values due to <lower=0> constraint

    // Likelihood
    for (i in 1:N) {
        samples[i] ~ normal(C0 * V[i], sqrt(C0 * V[i]));
    }
}

I was hoping someone could help me figure out how to get the Turing code to work.

Hello @anprasad!

I don’t think this is user error at all! It is a bug :scream: Specifically, it is a bug in automatic differentiation. If you use a gradient-free sampler (e.g. MH()), it runs fine.
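
For instance, something like this samples without complaint (a minimal sketch using your model1 with some made-up data):

using Turing

samples = [1, 2, 3]                             # made-up data just to run the model
chain = sample(model1(samples), MH(), 10_000)   # gradient-free, so the broken gradients never come into play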

Here’s a simplified example that illustrates the bug: the gradients contain NaNs, and unfortunately none of our supported AD backends are happy with this:

using DynamicPPL: @model, LogDensityFunction
using Distributions
using LogDensityProblems: logdensity_and_gradient
using LogDensityProblemsAD: ADgradient

@model function model1()
    σ ~ InverseGamma(2, 3)
    V ~ truncated(Normal(0, σ), 0, Inf)
end

import ForwardDiff
ℓ = ADgradient(:ForwardDiff, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, NaN])

import Mooncake
ℓ = ADgradient(:Mooncake, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, -2.0])

import ReverseDiff
ℓ = ADgradient(:ReverseDiff, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, -2.0])

import Zygote
ℓ = ADgradient(:Zygote, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, -2.0])

However, for reasons that are not yet clear, it works when I replace

truncated(Normal(a, b), 0, Inf)

with

truncated(Normal(a, b); lower=0)

in your model. This has the same meaning, but somehow the AD packages are happy with it. And as it happens, this change only needs to be made in the definition of V:

using Turing

@model function model1(samples)
    V0 = 100
    σ ~ truncated(Normal(0.01 * V0, 0.15 * V0), 0, Inf)
    C0 ~ truncated(Normal(100, 60), 0, Inf)
    V ~ filldist(truncated(Normal(V0, σ); lower=0), length(samples))
    for i in eachindex(samples)
        samples[i] ~ Poisson(C0 * V[i])
    end
end

samples = [1, 2, 3]

sample(model1(samples), NUTS(), 1000)
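
If you want to double-check the AD side directly, here's a sketch (reusing the imports from the ADgradient example above; model1_fixed is just an illustrative name) that tests whether the gradient is finite once the lower=0 form is used:

# Same toy model as before, but written with the lower=0 form.
@model function model1_fixed()
    σ ~ InverseGamma(2, 3)
    V ~ truncated(Normal(0, σ); lower=0)
end

ℓ = ADgradient(:ForwardDiff, LogDensityFunction(model1_fixed()))
lp, grad = logdensity_and_gradient(ℓ, [1.0, 2.0])
all(isfinite, grad)    # I'd expect true here, in contrast to the NaNs above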

So perhaps that could be a good enough workaround while we try to get the AD bugs fixed? :smile:

If it makes a difference, I’m running this on macOS with Julia 1.11.1 and Turing v0.35.1.

Thank you for posting this!


See Sampling with HMC can hang if AD is bugged · Issue #2389 · TuringLang/Turing.jl · GitHub for more info. As it happens, it's generally better to use truncated(d; lower=0) than to pass Inf explicitly :slight_smile: