Help -- can't get Turing to work on a simple model

Hi everyone,

I’m new to Bayesian inference and Turing, so I realize this is likely user error more than anything. I’m trying to fit a simple model; I’ve tried two versions here:

using Turing

@model function model1(samples)
    V0 = 100
    σ ~ truncated(Normal(0.01 * V0, 0.15 * V0), 0, Inf)              # spread of the latent V values
    C0 ~ truncated(Normal(100, 60), 0, Inf)                          # scale factor for the Poisson rate
    V ~ filldist(truncated(Normal(V0, σ), 0, Inf), length(samples))  # one latent V per observation

    for i in eachindex(samples)
        samples[i] ~ Poisson(C0 * V[i])                              # Poisson observation model
    end
end

@model function model2(samples)
    V0 = 100
    σ ~ truncated(Normal(0.01 * V0, 0.15 * V0), 0, Inf)
    C0 ~ truncated(Normal(100, 60), 0, Inf)
    V ~ filldist(truncated(Normal(V0, σ), 0, Inf), length(samples))

    for i in eachindex(samples)
        # normal approximation to the Poisson likelihood used in model1
        samples[i] ~ truncated(Normal(C0 * V[i], sqrt(C0 * V[i])), 0, Inf)
    end
end

I can barely get Turing to sample the posterior. NUTS complains that it cannot find valid initial parameters, and providing it with the MLE estimates as initial values eventually results in a domain error. PG does run, but the chains barely move and sampling is very slow.
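
For reference, this is roughly how I’m calling the samplers (the data below is just a stand-in for my real samples):

samples = rand(Poisson(1e4), 20)                 # stand-in data, similar in scale to mine
chain = sample(model1(samples), NUTS(), 1_000)   # complains about invalid initial parameters
chain = sample(model1(samples), PG(20), 1_000)   # runs, but barely moves and is very slow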

On the other hand, Stan seems to work right out of the box with the following translated code:

data {
    int<lower=1> N;                 // number of samples
    array[N] real samples;           // observed samples
}

parameters {
    real<lower=0> sigma;         // standard deviation parameter, constrained to be non-negative
    real<lower=0> C0;            // mean parameter, constrained to be non-negative
    array[N] real<lower=0> V;    // intermediate variable, constrained to be non-negative
}

model {
    real V0 = 100;               // initial mean for V

    // Priors
    sigma ~ normal(0.01 * V0, 0.15 * V0);
    C0 ~ normal(100, 60);
    V ~ normal(V0, sigma);       // implicit truncation to positive values due to <lower=0> constraint

    // Likelihood
    for (i in 1:N) {
        samples[i] ~ normal(C0 * V[i], sqrt(C0 * V[i]));
    }
}

I was hoping someone could help me figure out how to get the Turing code to work.

Hello @anprasad!

I don’t think this is user error at all! It is a bug :scream: Specifically, it is a bug with automatic differentiation. If you use a gradientless sampler (e.g. MH()) it runs fine.
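
For example, with some toy data, something like this samples without complaint:

sample(model1([1, 2, 3]), MH(), 1_000)   # MH uses no gradients, so the NaN issue never comes up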

Here’s a simplified example which illustrates the bug: the gradients contain NaNs, and unfortunately none of our supported AD backends are happy with this:

using DynamicPPL: @model, LogDensityFunction
using Distributions
using LogDensityProblems: logdensity_and_gradient
using LogDensityProblemsAD: ADgradient

@model function model1()
    σ ~ InverseGamma(2, 3)
    V ~ truncated(Normal(0, σ), 0, Inf)
end

import ForwardDiff
ℓ = ADgradient(:ForwardDiff, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, NaN])

import Mooncake
ℓ = ADgradient(:Mooncake, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, -2.0])

import ReverseDiff
ℓ = ADgradient(:ReverseDiff, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, -2.0])

import Zygote
ℓ = ADgradient(:Zygote, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# (-3.0285667753085077, [NaN, -2.0])

However, for reasons that are not yet clear, it works when I replace

truncated(Normal(a, b), 0, Inf)

with

truncated(Normal(a, b); lower=0)

in your model. This has the same meaning, but somehow the AD packages are happy with it. And as it happens, this change only needs to be made in the definition of V:

using Turing

@model function model1(samples)
    V0 = 100
    σ ~ truncated(Normal(0.01 * V0, 0.15 * V0), 0, Inf)
    C0 ~ truncated(Normal(100, 60), 0, Inf)
    V ~ filldist(truncated(Normal(V0, σ); lower=0), length(samples))   # only this line changed
    for i in eachindex(samples)
        samples[i] ~ Poisson(C0 * V[i])
    end
end

samples = [1, 2, 3]

sample(model1(samples), NUTS(), 1000)
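
And, continuing the same session as above, repeating the gradient check with the keyword form should come back NaN-free (model1_fixed is just a throwaway name for the two-parameter example):

@model function model1_fixed()
    σ ~ InverseGamma(2, 3)
    V ~ truncated(Normal(0, σ); lower=0)
end

ℓ = ADgradient(:ForwardDiff, LogDensityFunction(model1_fixed()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# log density unchanged, and neither gradient entry is NaN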

So perhaps that could be a good enough workaround while we try to get the AD bugs fixed? :smile:

If it makes a difference, I’m running this on macOS with Julia 1.11.1 and Turing v0.35.1.

Thank you for posting this!


See Sampling with HMC can hang if AD is bugged · Issue #2389 · TuringLang/Turing.jl · GitHub for more info. As it happens, it’s generally better to use truncated(d; lower=0) rather than Infs :slight_smile:
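
Very roughly, my understanding is that the two-argument form carries Inf around as an explicit upper bound, which then shows up in the log-density calculation where AD turns it into NaN, whereas the keyword form builds a genuinely one-sided truncation:

truncated(Normal(), 0, Inf)    # two-sided truncation whose upper bound happens to be Inf
truncated(Normal(); lower=0)   # one-sided truncation with no upper bound at all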

Thanks for your quick and detailed response @penelopeysm! I also appreciate your showing me how to debug this problem.

It would be great if a warning about this were included in the docs. Currently, the docs use the truncated(..., 0, Inf) form of the truncated distribution, e.g.:

@model function tmodel(x, y)
    params ~ filldist(truncated(Normal(), 0, Inf), size(x, 2))
    a = x * params
    y ~ MvNormal(a, I)
end
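
Presumably it would just become something like this (the same example, with only the truncation call changed):

@model function tmodel(x, y)
    params ~ filldist(truncated(Normal(); lower=0), size(x, 2))
    a = x * params
    y ~ MvNormal(a, I)
end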

Ooh, I’ll get that changed.
