The model below crashes for me with the error message LoadError: DomainError with Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}}(NaN,NaN,NaN): Bernoulli: the condition zero(p) <= p <= one(p) is not satisfied.

using Turing, Random
Random.seed!(123)
@model function model(N = 1000, K = 2, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    for n in 1:N
        y[n] ~ Bernoulli(x[g1[n]] / (x[g1[n]] + x[g2[n]]))
    end
end
m = model()
chn = sample(m, NUTS(), 100)

Doing something like

p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
if p >= 1.0
    p = 1.0
elseif p <= 0.0
    p = 0.0
end
y[n] ~ Bernoulli(p)

doesn’t fix the issue. I suspect this is due to x ~ Dirichlet(ones(K)), as the model works if I replace that line with x ~ filldist(Uniform(0,1), K). Does anybody have an idea what could be causing this?

Setting the initial conditions did indeed help for this seed, but I can still find examples that don’t work. E.g., here’s one that still fails (I’ve now set K = 9):

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        y[n] ~ Bernoulli(x[g1[n]] / (x[g1[n]] + x[g2[n]]))
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

Using a different sampler instead of NUTS, it works without the error:

chn = sample(m, HMC(0.01, 10), 100)

I think the problem is indeed the sampler leaving the support of the Dirichlet distribution. The solution of clamping to 0.0 – 1.0 isn’t enough, since 0.0 and 1.0 are edge cases with problematic derivatives. Perhaps clamping to epsilon and 1 - epsilon would resolve the issue.

For example:

p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
pc = max(min(p, 1.0 - 0.0001), 0.0001)
y[n] ~ Bernoulli(pc)
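As an aside, Base’s clamp should be equivalent to the max/min pair above (this is plain Julia, nothing Turing-specific):

```julia
# clamp(p, lo, hi) behaves like max(min(p, hi), lo) for ordinary floats
p = 1.5
pc  = max(min(p, 1.0 - 0.0001), 0.0001)
pc2 = clamp(p, 0.0001, 1.0 - 0.0001)
pc == pc2   # both give 0.9999
```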

Thanks for pointing out that it works with other samplers, that’s helpful.

I’m not sure clamping helps with NUTS, though. Even when I set epsilon = 0.1, it still crashes for the second example I posted:

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
        pc = max(min(p, 1.0 - 0.1), 0.1)
        y[n] ~ Bernoulli(pc)
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)
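One plain-Julia observation that may explain why this clamping is powerless: min, max, and clamp all propagate NaN instead of clamping it, so if p is already NaN the clamping line is a no-op (easy to check in the REPL):

```julia
p = NaN
max(min(p, 0.9), 0.1)   # NaN — min/max propagate NaN rather than bounding it
clamp(p, 0.1, 0.9)      # NaN as well, since NaN compares false against both bounds
isnan(max(min(p, 0.9), 0.1))  # true
```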

I think it might have something to do with p being NaN. The code

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        p = x[g1[n]] / (x[g1[n]] + x[g2[n]])
        if isnan(p)
            p = 0.000001
        end
        y[n] ~ Bernoulli(p)
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)

runs without issues. I’m not quite sure where the NaNs are coming from, though. I thought dividing by zero produces Inf, not NaN? Perhaps a very small denominator somehow causes NaNs when using NUTS?

I think (-Inf) - (-Inf) is NaN, and the logpdf outside the Dirichlet distribution’s support is -Inf; this is how the NaNs creep in. There is not enough finiteness checking when calculating the gradient of the logpdf.
There is an issue and a fix somewhere here, probably in AdvancedHMC.jl, but I haven’t zeroed in on it.
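The arithmetic is easy to confirm in the REPL (plain Julia, no Turing involved):

```julia
logp1 = -Inf             # e.g. a logpdf evaluated outside the support
logp2 = -Inf
logp1 - logp2            # NaN — Inf minus Inf is an indeterminate form
isnan((-Inf) - (-Inf))   # true
```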

I still think it’s the 0/0 issue: when I examine the values of x[g1[n]] and x[g2[n]] for cases where p is NaN, they are both 0.0. The way to clamp this model successfully is, I think, something like this:

using Turing, Random
Random.seed!(1)
@model function model(N = 1000, K = 9, y = rand(Bernoulli(0.5), N), g1 = rand(Categorical(K), N), g2 = rand(Categorical(K), N))
    x ~ Dirichlet(ones(K))
    # x ~ filldist(Uniform(0,1), K)
    for n in 1:N
        a = max(x[g1[n]], 0.00001)
        b = max(x[g2[n]], 0.00001)
        y[n] ~ Bernoulli(a / (a + b))
    end
end
m = model()
chn = sample(m, NUTS(), 100, init_params = ones(9)/9)
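To back up the 0/0 claim, the division itself behaves like this in the REPL (standard IEEE arithmetic, nothing Turing-specific):

```julia
1.0 / 0.0   # Inf  — nonzero over zero overflows to infinity
0.0 / 0.0   # NaN  — numerator and denominator both zero, the indeterminate case
```

So when x[g1[n]] and x[g2[n]] are both exactly 0.0, p comes out as NaN rather than Inf, which is why clamping the inputs (rather than p) fixes it.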

Without looking into this in any detail, I’d guess the issue is the one solved by TuringLang/Bijectors.jl pull request #263 (“Make SimplexBijector actually bijective”, by sethaxen). In short, the way Turing currently unconstrains the simplex for sampling with gradient-based samplers makes the posterior improper. Technically, the posterior variance for a single sampled parameter is infinite, so ironically, if warm-up works really well, that parameter will eventually overflow. This should be fixed soon.