Error due to Dirichlet prior in Turing?

Hi,

The model below crashes for me with the error message LoadError: DomainError with Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}}(NaN,NaN,NaN): Bernoulli: the condition zero(p) <= p <= one(p) is not satisfied.

using Turing, Random

Random.seed!(123)
@model function model(N = 1000, K = 2, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))  # K weights on the probability simplex
    for n in 1:N
        # success probability is the relative weight of group g1[n] versus g2[n]
        y[n] ~ Bernoulli(x[g1[n]] / (x[g1[n]] + x[g2[n]]))
    end
end

m = model()
chn = sample(m, NUTS(), 100)

Doing something like

p = x[g1[n]]/(x[g1[n]]+x[g2[n]])
if p >= 1.0
    p = 1.0
elseif p <= 0.0
    p = 0.0
end
y[n] ~ Bernoulli(p)

doesn’t fix the issue. I suspect this is due to x ~ Dirichlet(ones(K)), since the model works if I replace that line with x ~ filldist(Uniform(0,1), K). Does anybody have an idea what could be causing this?

Thanks in advance!

Possibly an initialization issue? If you provide valid initial values, does it run at all?

Setting the initial values did indeed help for this seed, but I can still find examples that fail. For instance, here’s one that still doesn’t work (I’ve now set K = 9):

using Turing, Random

Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5),N), g1 = rand(Categorical(K),N), g2 = rand(Categorical(K),N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1),K)
    for n in 1:N
        y[n] ~ Bernoulli(x[g1[n]]/(x[g1[n]]+x[g2[n]]))
    end
end

m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

With a different sampler instead of NUTS, it works without the error:

chn = sample(m, HMC(0.01, 10), 100)

I think the problem is indeed the sampler leaving the domain of the Dirichlet distribution. Clamping to the interval 0.0 to 1.0 isn’t enough, since 0.0 and 1.0 are edge cases with problematic derivatives. Perhaps clamping to epsilon and 1 - epsilon would resolve the issue.

For example:

p = x[g1[n]]/(x[g1[n]]+x[g2[n]])
pc = max(min(p,1.0-0.0001),0.0001)
y[n] ~ Bernoulli(pc)

allows the NUTS sampler to complete with:

chn = sample(m, NUTS(), 100)

which was crashing in the OP.

Thanks for pointing out that it works with other samplers; that’s helpful.

I’m not sure clamping helps with NUTS though. Even when I set epsilon = 0.1, it still crashes for the second example I posted:

using Turing, Random

Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5),N), g1 = rand(Categorical(K),N), g2 = rand(Categorical(K),N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1),K)
    for n in 1:N
        p = x[g1[n]]/(x[g1[n]]+x[g2[n]])
        pc = max(min(p,1.0-0.1),0.1)
        y[n] ~ Bernoulli(pc)
    end
end

m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

I think it might have something to do with p being NaN. The code

using Turing, Random

Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5),N), g1 = rand(Categorical(K),N), g2 = rand(Categorical(K),N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1),K)
    for n in 1:N
        p = x[g1[n]]/(x[g1[n]]+x[g2[n]])
        if isnan(p)
            p = 0.000001
        end
        y[n] ~ Bernoulli(p)
    end
end

m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

runs without issues. I’m not quite sure where the NaNs are coming from, though. I thought dividing by 0 produced Inf, not NaN? Perhaps a very small denominator somehow causes NaNs when using NUTS?

edit: 0/0 is NaN, so it must be related to that!
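
To illustrate (a standalone sketch, independent of Turing, calling ForwardDiff directly): 0/0 is NaN in plain floating-point arithmetic, and that NaN carries through ForwardDiff’s dual numbers, which would match the Dual{...}(NaN,NaN,NaN) in the original error:

using ForwardDiff

# The same ratio the model computes, written as a standalone function
ratio(x) = x[1] / (x[1] + x[2])

1.0 / 0.0                                # Inf
0.0 / 0.0                                # NaN
ratio([0.0, 0.0])                        # NaN
ForwardDiff.gradient(ratio, [0.0, 0.0])  # [NaN, NaN]: NaN partials, as in the error message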

I think (-Inf) - (-Inf) is NaN, and the logpdf outside the Dirichlet distribution’s support is -Inf, which is how the NaNs creep in. There is not enough finiteness checking when calculating the gradient of the logpdf.
There is an issue and a fix somewhere here, probably in AdvancedHMC.jl, but I haven’t zeroed in on it.
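
For illustration, here’s the arithmetic I mean (just a sketch, assuming logpdf returns -Inf rather than throwing for points outside the support, which is what I see with Distributions):

using Distributions

d = Dirichlet(ones(3))
logpdf(d, [1.2, -0.1, -0.1])  # -Inf: the point has negative components, so it lies outside the support
(-Inf) - (-Inf)               # NaN: differencing such log densities is where NaNs can creep in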


I still think it’s the 0/0 issue: when I examine the values of x[g1[n]] and x[g2[n]] for the cases where p is NaN, they are both 0.0. The way to clamp this model successfully is, I think, something like this:

using Turing, Random

Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5),N), g1 = rand(Categorical(K),N), g2 = rand(Categorical(K),N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1),K)
    for n in 1:N
        a = max(x[g1[n]], 0.00001)  # floor each weight away from zero
        b = max(x[g2[n]], 0.00001)  # so that a/(a+b) can never be 0/0 (NaN)
        y[n] ~ Bernoulli(a / (a + b))
    end
end

m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)
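
(For reference, a check along those lines can be done with a temporary @show inside the loop; a debugging sketch only, not part of the fix:)

p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
if isnan(p)
    @show x[g1[n]] x[g2[n]]  # both print as (numerically) zero whenever p is NaN
end
y[n] ~ Bernoulli(p)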

Without looking into this in any detail, I’d guess the issue is the one solved by Bijectors.jl Pull Request #263 ("Make SimplexBijector actually bijective" by sethaxen, TuringLang/Bijectors.jl on GitHub). In short, how Turing currently unconstrains the simplex for sampling with gradient-based samplers makes the posterior improper. Technically the posterior variance for a single sampled parameter is infinite, so ironically, if warm-up works really well, that parameter will eventually overflow. This should be fixed soon.

Yikes, and fabulous at the same time!