The model below crashes for me with the error message LoadError: DomainError with Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}}(NaN,NaN,NaN): Bernoulli: the condition zero(p) <= p <= one(p) is not satisfied.

using Turing, Random
Random.seed!(123)
@model function model(N = 1000, K = 2, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    for n in 1:N
        y[n] ~ Bernoulli(x[g1[n]] / (x[g1[n]] + x[g2[n]]))
    end
end
m = model()
chn = sample(m, NUTS(), 100)

Doing something like

p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
if p >= 1.0
    p = 1.0
elseif p <= 0.0
    p = 0.0
end
y[n] ~ Bernoulli(p)

doesn’t fix the issue. I suspect this is due to x ~ Dirichlet(ones(K)), as the model works if I replace that line with x ~ filldist(Uniform(0,1), K). Does anybody have an idea what could be causing this?

Setting the initial conditions did indeed help for this seed, but I can still find examples that don’t work. E.g., here’s one that still fails (I’ve now set K = 9):

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        y[n] ~ Bernoulli(x[g1[n]] / (x[g1[n]] + x[g2[n]]))
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

Using a different sampler instead of NUTS, it works without the error:

chn = sample(m, HMC(0.01, 10), 100)

I think the problem is indeed the sampler leaving the support of the Dirichlet distribution. The solution of clamping to 0.0 – 1.0 isn’t enough, since 0.0 and 1.0 are edge cases with problematic derivatives. Perhaps clamping to epsilon and 1 - epsilon would resolve the issue.

For example:

p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
pc = max(min(p, 1.0 - 0.0001), 0.0001)
y[n] ~ Bernoulli(pc)
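As an aside, Base’s clamp should be equivalent to the max/min pair above (this is plain Julia, nothing Turing-specific):

```julia
# clamp(p, lo, hi) behaves like max(min(p, hi), lo) for ordinary floats
p = 1.5
pc  = max(min(p, 1.0 - 0.0001), 0.0001)
pc2 = clamp(p, 0.0001, 1.0 - 0.0001)
pc == pc2   # both give 0.9999
```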

Thanks for pointing out that it works with other samplers, that’s helpful.

I’m not sure clamping helps with NUTS, though. Even when I set epsilon = 0.1, it still crashes for the second example I posted:

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
        pc = max(min(p, 1.0 - 0.1), 0.1)
        y[n] ~ Bernoulli(pc)
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)
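One plain-Julia observation that may explain why this clamping is powerless: min, max, and clamp all propagate NaN instead of clamping it, so if p is already NaN the clamping line is a no-op (easy to check in the REPL):

```julia
p = NaN
max(min(p, 0.9), 0.1)   # NaN — min/max propagate NaN rather than bounding it
clamp(p, 0.1, 0.9)      # NaN as well, since NaN compares false against both bounds
isnan(max(min(p, 0.9), 0.1))  # true
```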

I think it might have something to do with p being NaN. The code

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
        if isnan(p)
            p = 0.000001
        end
        y[n] ~ Bernoulli(p)
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

runs without issues. I’m not quite sure where the NaNs are coming from, though. I thought dividing by zero produces Inf, not NaN? Perhaps a very small denominator somehow causes NaNs when using NUTS?

I think (-Inf) - (-Inf) is NaN, and the logpdf outside the Dirichlet distribution’s support is -Inf; this is how the NaNs creep in. There is not enough finiteness checking when calculating the gradient of the logpdf.
There is an issue and a fix somewhere here, probably in AdvancedHMC.jl, but I haven’t zeroed in on it.
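The arithmetic is easy to confirm in the REPL (plain Julia, no Turing involved):

```julia
logp1 = -Inf             # e.g. a logpdf evaluated outside the support
logp2 = -Inf
logp1 - logp2            # NaN — Inf minus Inf is an indeterminate form
isnan((-Inf) - (-Inf))   # true
```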

I still think it’s the 0/0 issue: when I examine the values of x[g1[n]] and x[g2[n]] for cases where p is NaN, they are both 0.0. The way to clamp this model successfully is, I think, something like this:

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        a = max(x[g1[n]], 0.00001)
        b = max(x[g2[n]], 0.00001)
        y[n] ~ Bernoulli(a / (a + b))
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)
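To back up the 0/0 claim, the division itself behaves like this in the REPL (standard IEEE arithmetic, nothing Turing-specific):

```julia
1.0 / 0.0   # Inf  — nonzero over zero overflows to infinity
0.0 / 0.0   # NaN  — numerator and denominator both zero, the indeterminate case
```

So when x[g1[n]] and x[g2[n]] are both exactly 0.0, p comes out as NaN rather than Inf, which is why clamping the inputs (rather than p) fixes it.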

Without looking into this in any detail, I’d guess the issue is the one solved by TuringLang/Bijectors.jl pull request #263 (“Make SimplexBijector actually bijective”, by sethaxen). In short, the way Turing currently unconstrains the simplex for sampling with gradient-based samplers makes the posterior improper. Technically, the posterior variance for a single sampled parameter is infinite, so ironically, if warm-up works really well, that parameter will eventually overflow. This should be fixed soon.