BetaBinomial: the condition α > zero(α) is not satisfied

Hello,

I have some code where I am trying to model the uncertainty in a probabilistic prediction as a function of some independent variable. A typical use case: I have an ML model in production (say, logistic regression) producing predictions every hour, and I want to model how the uncertainty in each prediction changes within the hour between predictions, using a linear function.

Here is a stripped down, reproducible, example:

using StatsPlots, StatsBase, Distributions
using Turing, MCMCChains, Random

Random.seed!(101);

logit(x) = log(x/(1-x));
expit(x) = exp(x)/(1+exp(x));

function SimulateData(A, B, N)

    M = rand(Uniform(0,1),N) # Predicted Rates
    X = rand(Uniform(0,1),N) # Predictor Variable
    T = Int.(round.(rand(Normal(10,3),N))) # Number of Trials
    Y = []
    for i in 1:N
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu;
        b = (1-M[i])*nu;
        push!(Y, rand(BetaBinomial(T[i], a, b), 1)[1])
    end

    return (X, Y, M, T)
end

X, Y, M, T = SimulateData(-1, 1, 100);

# Define Model
@model function BetaBinom(X, Y, M, T)
    # number of observations
    n = length(Y);

    # Define Priors
    A ~ Normal(0,1);
    B ~ Normal(0,1);

    for i in 1:n
        # Beta Distribution
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu;
        b = (1-M[i])*nu;
        # Likelihood
        Y[i] ~ BetaBinomial(T[i], a, b)
    end
end


# Sample using NUTS.
m = BetaBinom(X, Y, M, T);
chain = sample(m, NUTS(), 1000);
summarize(chain)

However, executing this code yields the following error:

ERROR: DomainError with Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}}(0.0,-1.009028171378666e-16,-1.009028171378666e-16):
BetaBinomial: the condition α > zero(α) is not satisfied.

I am able to build this model in PyMC, so I have some confidence that the model is identifiable. I did some digging and it looks like this might be related to the following Distributions.jl issues: #1003, #1810. I wanted to raise this to see if my intuition is correct or if there is something else wrong with my code that I might be missing.

This might need to be:

M[i]*(1-M[i])/v + 1

with a plus: for example, if the variance is maximal, then v == M[i]*(1-M[i]), and you want nu == 2, not nu == 0. The former produces a uniform distribution on (0,1) in the case M == 0.5.
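
A quick sanity check of that boundary case (the variable names here are just illustrative):

using Distributions
M, v = 0.5, 0.25           # maximal variance: v == M*(1-M)
nu = M*(1-M)/v + 1         # with the plus, nu == 2
a, b = M*nu, (1-M)*nu      # a == b == 1
Beta(a, b)                 # Beta(1, 1), i.e. Uniform(0, 1)

With the original minus, the same inputs give nu == 0 and hence a == b == 0, which is exactly the degenerate case the error message complains about.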

Hi Dan,

Thanks for your reply. This is a good point about my model specification. I made the changes that you suggested and my model samples without issue.

Though I am still a bit unsatisfied that I cannot express my model directly as it is defined here (following the mean/variance parameterization); that is where the nu = M[i]*(1-M[i])/v - 1 came from. After further digging, it appears this issue has also come up in other Julia PPLs, such as Gen.jl. I will raise it with the Turing.jl team and see if they have any ideas. Thanks again.

From the link:

I think this parameterization refers to the mean and variance of the Beta distribution, not the mean and variance of the samples (which are drawn from a BetaBinomial distribution whose rate parameter is itself drawn from the Beta distribution). The mean and variance you want to control are those of the samples. This may be the origin of the confusion, though the confusion might be mine (as the convoluted wording of this paragraph evidences).
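
A small illustration of that distinction, using the built-in var from Distributions.jl (the parameter values are arbitrary):

using Distributions
a, b, n = 2.0, 3.0, 10
var(Beta(a, b))                  # variance of the latent rate p: 0.04
var(BetaBinomial(n, a, b))/n^2   # variance of the observed rate Y/n: 0.06

The observed rate Y/n is always noisier than the latent p, because the Binomial stage adds its own sampling variance, so a mean/variance parameterization of the Beta does not directly pin down the variance of the samples.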

Hi, I posted the logGamma sampler you mentioned. I don't know the details of your sampler, but I do know that I constantly get errors from zero Betas.

rand(Beta) (and rand(Dirichlet)!) returns exact zeros and ones. This is really a consequence of the limited richness of the floating-point representation of [0,1] when you have power-law tails.
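
To see the underflow directly (a tiny α makes it dramatic; the exact count depends on the RNG):

using Distributions, Random
Random.seed!(1)
s = rand(Beta(0.001, 1.0), 10^5)
count(iszero, s)   # a large fraction of the draws are exactly 0.0

Float64 has no positive values below about 5e-324, so the power-law mass near zero collapses onto exact 0.0.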

If you want well behaved outputs you have to change representations.
Note that if you copy the code I left in #1810 for a logGamma sampler, you can do something like the following (I have done it many times!):

using LogExpFunctions
x = [rand_logGamma(a) for a in A]

then

d = x .- logsumexp(x)

is log-Dirichlet distributed (x is already in log space, so there is no extra log to take). For two-dimensional A = (α, β), this means d is a numerically stable version of

d == [log(p), log(1-p)]

where p is Beta(α,β) distributed.
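
For reference, since the actual rand_logGamma lives in #1810, here is a minimal stand-in sketch (not the original code) based on the standard identity Gamma(a) =d Gamma(a+1) * U^(1/a), which stays in log space so tiny shapes never underflow:

using Distributions

# Hypothetical sketch of a log-Gamma sampler: returns log(G) for G ~ Gamma(a, 1)
# without ever materializing the (possibly subnormal) draw G itself.
function rand_logGamma(a)
    g = rand(Gamma(a + 1, 1.0))   # shape a+1 >= 1, safely away from zero
    u = rand()
    return log(g) + log(u) / a    # log(G * U^(1/a)), with G ~ Gamma(a+1)
end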

There’s really no substitute for having both log(p) and log(1-p) for representing this probability distribution and looking at the power laws on both ends.

In the end, it’s kind of unclear how to fix Distributions.jl, because there’s nothing wrong with their Beta; rather, it’s wrong to represent samples from a Beta with a single Float64. We just really need a log_gamma / log_dirichlet sampler.

Inside a BetaBinomial it shouldn’t really matter, though: a zero is a zero, and it’s a perfectly meaningful limit. This seems like a bug in BetaBinomial, which should just return zeros(N) deterministically if α = 0 (unless β = 0 too, in which case it should error; that is NOT a well-defined limit).
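
For concreteness, the current boundary behavior (as of recent Distributions.jl releases):

using Distributions
BetaBinomial(10, 0.0, 2.0)   # DomainError: the condition α > zero(α) is not satisfied

whereas the α → 0 limit with β > 0 is a perfectly good point mass at zero.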

A few things:

  1. I don’t think rand will be the issue here, as it is only used for the initial sample; after that we just run NUTS. Does it run for a few iterations before it fails?
  2. Can you just add a small eps(T) to the alphas and betas to avoid the degeneracy?

But it does seem like we should just have a better alternative in Distributions.jl, as discussed here: Issue with support of Beta distribution when alpha < 1 or beta < 1 · Issue #1003 · JuliaStats/Distributions.jl · GitHub


Thanks for your reply. It is able to find the initial step size but fails promptly after that. It may run a few iterations before failing, but if it does, it is not many.

Adding eps() to the alphas and betas allows my model to run, and it estimates parameters reasonably close to the true values I used to simulate the data. Thanks so much for your help.

Here is my updated code for the next person who comes across this issue:

using StatsPlots, StatsBase, Distributions
using Turing, MCMCChains, Random

Random.seed!(101);

logit(x) = log(x/(1-x));
expit(x) = exp(x)/(1+exp(x));

function SimulateData(A, B, N)

    M = rand(Uniform(0,1),N) # Predicted Rates
    X = rand(Uniform(0,1),N) # Predictor Variable
    T = Int.(round.(rand(Normal(10,3),N))) # Number of Trials
    Y = []
    for i in 1:N
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu;
        b = (1-M[i])*nu;
        push!(Y, rand(BetaBinomial(T[i], a, b), 1)[1])
    end

    return (X, Y, M, T)
end

X, Y, M, T = SimulateData(-1, 1, 100);

# Define Model
@model function BetaBinom(X, Y, M, T)
    # number of observations
    n = length(Y);

    # Define Priors
    A ~ Normal(0,1);
    B ~ Normal(0,1);

    for i in 1:n
        # Beta Distribution
        v = expit(A + B * X[i])*M[i]*(1-M[i]);
        nu = M[i]*(1-M[i])/v - 1;
        a = M[i]*nu + eps(); # adding eps() so model samples
        b = (1-M[i])*nu + eps();
        # Likelihood
        Y[i] ~ BetaBinomial(T[i], a, b)
    end
end


# Sample using NUTS.
m = BetaBinom(X, Y, M, T);
chain = sample(m, NUTS(), 4000);
summarize(chain)