BetaBinomial: the condition α > zero(α) is not satisfied

Hello,

I have some code where I am trying to model the uncertainty in a probabilistic prediction as a function of some independent variable. A typical use case: I have an ML model in production (say, logistic regression) producing predictions every hour, and I want to model how the uncertainty in each prediction changes within the hour between predictions, using a linear function.

Here is a stripped down, reproducible, example:

using StatsPlots, StatsBase, Distributions
using Turing, MCMCChains, Random

Random.seed!(101);

logit(x) = log(x/(1-x));
expit(x) = exp(x)/(1+exp(x));

function SimulateData(A, B, N)

    M = rand(Uniform(0,1),N) # Predicted Rates
    X = rand(Uniform(0,1),N) # Predictor Variable
    T = Int.(round.(rand(Normal(10,3),N))) # Number of Trials
    Y = []
    for i in 1:N
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu;
        b = (1-M[i])*nu;
        push!(Y, rand(BetaBinomial(T[i], a, b), 1)[1])
    end

    return (X, Y, M, T)
end

X, Y, M, T = SimulateData(-1, 1, 100);

# Define Model
@model function BetaBinom(X, Y, M, T)
    # number of observations
    n = length(Y);

    # Define Priors
    A ~ Normal(0,1);
    B ~ Normal(0,1);

    for i in 1:n
        # Beta Distribution
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu;
        b = (1-M[i])*nu;
        # Likelihood
        Y[i] ~ BetaBinomial(T[i], a, b)
    end
end


# Sample using NUTS.
m = BetaBinom(X, Y, M, T);
chain = sample(m, NUTS(), 1000);
summarize(chain)

However, executing this code yields the following error:

ERROR: DomainError with Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}}(0.0,-1.009028171378666e-16,-1.009028171378666e-16):
BetaBinomial: the condition α > zero(α) is not satisfied.

I am able to build this model in PyMC, so I have some confidence that the model is identifiable. I did some digging and it looks like this might be related to the following Distributions.jl issues: #1003, #1810. I wanted to raise this to see if my intuition is correct or if there is something else wrong with my code that I might be missing.

This might need to be:

M[i]*(1-M[i])/v + 1

with a plus: for example, if the variance is maximal, then v == M[i]*(1-M[i]), and you want nu == 2, not nu == 0. The former produces a uniform distribution on (0,1) in the case M == 0.5.
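
A quick sanity check of that boundary case (the variable names here are just illustrative):

using Distributions
M, v = 0.5, 0.25           # maximal variance: v == M*(1-M)
nu = M*(1-M)/v + 1         # with the plus, nu == 2
a, b = M*nu, (1-M)*nu      # a == b == 1
Beta(a, b)                 # Beta(1, 1), i.e. Uniform(0, 1)

With the original minus, the same inputs give nu == 0 and hence a == b == 0, which is exactly the degenerate case the error message complains about.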

Hi Dan,

Thanks for your reply. This is a good point about my model specification. I made the changes that you suggested and my model samples without issue.

Though I am still a bit unsatisfied that I cannot express my model directly as it is defined here (following the mean/variance parameterization); that is where the nu = M[i]*(1-M[i])/v - 1 came from. After further digging, it appears this issue has also come up in other Julia PPLs, such as Gen.jl. I will raise it with the Turing.jl team and see if they have any ideas. Thanks again.

From the link:

I think this parameterization refers to the mean and variance of the Beta distribution, not the mean and variance of the samples (which are drawn from a BetaBinomial distribution whose rate parameter is itself drawn from the Beta distribution). The mean and variance you want to control are those of the samples. This may be the origin of the confusion, though the confusion might be mine (as the convoluted wording of this paragraph evidences).
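
A small illustration of that distinction, using the built-in var from Distributions.jl (the parameter values are arbitrary):

using Distributions
a, b, n = 2.0, 3.0, 10
var(Beta(a, b))                  # variance of the latent rate p: 0.04
var(BetaBinomial(n, a, b))/n^2   # variance of the observed rate Y/n: 0.06

The observed rate Y/n is always noisier than the latent p, because the Binomial stage adds its own sampling variance, so a mean/variance parameterization of the Beta does not directly pin down the variance of the samples.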

Hi, I posted the logGamma sampler you mentioned. I don't know the details of your sampler, but I do know that I constantly get errors from zero Betas.

rand(Beta) (and rand(Dirichlet)!) returns exact zeros and ones. This is really a consequence of the limited richness of the floating-point representation of [0,1] when you have power-law tails.
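
To see the underflow directly (a tiny α makes it dramatic; the exact count depends on the RNG):

using Distributions, Random
Random.seed!(1)
s = rand(Beta(0.001, 1.0), 10^5)
count(iszero, s)   # a large fraction of the draws are exactly 0.0

Float64 has no positive values below about 5e-324, so the power-law mass near zero collapses onto exact 0.0.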

If you want well behaved outputs you have to change representations.
Note that if you copy the code I left in #1810 for a logGamma sampler, you can do something like the following (I have done it many times!):

using LogExpFunctions
x = [rand_logGamma(a) for a in A]

then

d = x .- logsumexp(x)

is log-Dirichlet distributed (x is already in log space, so there is no extra log to take). For two-dimensional A = (α, β), this means d is a numerically stable version of

d == [log(p), log(1-p)]

where p is Beta(α,β) distributed.
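
For reference, since the actual rand_logGamma lives in #1810, here is a minimal stand-in sketch (not the original code) based on the standard identity Gamma(a) =d Gamma(a+1) * U^(1/a), which stays in log space so tiny shapes never underflow:

using Distributions

# Hypothetical sketch of a log-Gamma sampler: returns log(G) for G ~ Gamma(a, 1)
# without ever materializing the (possibly subnormal) draw G itself.
function rand_logGamma(a)
    g = rand(Gamma(a + 1, 1.0))   # shape a+1 >= 1, safely away from zero
    u = rand()
    return log(g) + log(u) / a    # log(G * U^(1/a)), with G ~ Gamma(a+1)
end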

There’s really no substitute for having both log(p) and log(1-p) for representing this probability distribution and looking at the power laws on both ends.

In the end, it’s kind of unclear how to fix Distributions.jl, because there’s nothing wrong with their Beta; rather, it’s wrong to represent samples from a Beta with a single Float64. We just really need a log_gamma / log_dirichlet sampler.

Inside a BetaBinomial it shouldn’t really matter, though: a zero is a zero, and it’s a perfectly meaningful limit. This seems like a bug in BetaBinomial, which should just return zeros(N) deterministically if α = 0 (unless β = 0 too, in which case it should error; that is NOT a well-defined limit).
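
For concreteness, the current boundary behavior (as of recent Distributions.jl releases):

using Distributions
BetaBinomial(10, 0.0, 2.0)   # DomainError: the condition α > zero(α) is not satisfied

whereas the α → 0 limit with β > 0 is a perfectly good point mass at zero.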

A few things:

  1. I don’t think rand will be the issue here, as it is only used for the initial sample; after that we just run NUTS. Does it run for a few iterations before it fails?
  2. Can you just add a small eps(T) to the alphas and betas to avoid the degeneracy?

But it does seem like we should just have a better alternative in Distributions.jl, as discussed here: Issue with support of Beta distribution when alpha < 1 or beta < 1 · Issue #1003 · JuliaStats/Distributions.jl · GitHub


Thanks for your reply. It is able to find the initial step size but fails promptly after that. It may run a few iterations before failing, but if it does, it is not many.

Adding eps() to the alphas and betas allows my model to run, and it estimates parameters reasonably close to the true values I used to simulate the data. Thanks so much for your help.

Here is my updated code for the next person who comes across this issue:

using StatsPlots, StatsBase, Distributions
using Turing, MCMCChains, Random

Random.seed!(101);

logit(x) = log(x/(1-x));
expit(x) = exp(x)/(1+exp(x));

function SimulateData(A, B, N)

    M = rand(Uniform(0,1),N) # Predicted Rates
    X = rand(Uniform(0,1),N) # Predictor Variable
    T = Int.(round.(rand(Normal(10,3),N))) # Number of Trials
    Y = []
    for i in 1:N
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu;
        b = (1-M[i])*nu;
        push!(Y, rand(BetaBinomial(T[i], a, b), 1)[1])
    end

    return (X, Y, M, T)
end

X, Y, M, T = SimulateData(-1, 1, 100);

# Define Model
@model function BetaBinom(X, Y, M, T)
    # number of observations
    n = length(Y);

    # Define Priors
    A ~ Normal(0,1);
    B ~ Normal(0,1);

    for i in 1:n
        # Beta Distribution
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu + eps(); # adding eps() so model samples
        b = (1-M[i])*nu + eps();
        # Likelihood
        Y[i] ~ BetaBinomial(T[i], a, b)
    end
end


# Sample using NUTS.
m = BetaBinom(X, Y, M, T);
chain = sample(m, NUTS(), 4000);
summarize(chain)