Hello,
I have some code where I am trying to model the uncertainty in an a probabilistic prediction as a function of some independent variable. A general use case would be if I had an ML model in production (say logistic regression) and I am getting predictions say every hour. I want to model how the uncertainty in that prediction changes within the hour between predictions with a linear function.
Here is a stripped down, reproducible, example:
using StatsPlots, StatsBase, Distributions
using Turing, MCMCChains, Random
Random.seed!(101);
logit(x) = log(x/(1-x));
expit(x) = exp(x)/(1+exp(x));
function SimulateData(A, B, N)
M = rand(Uniform(0,1),N) # Predicted Rates
X = rand(Uniform(0,1),N) # Predictor Variable
T = Int.(round.(rand(Normal(10,3),N))) # Number of Trials
Y = []
for i in 1:N
v = expit(A + B * X[i])*M[i]*(1-M[i]);
nu = M[i]*(1-M[i])/v - 1;
a = M[i]*nu;
b = (1-M[i])*nu;
push!(Y, rand(BetaBinomial(T[i], a, b), 1)[1])
end
return (X, Y, M, T)
end
X, Y, M, T = SimulateData(-1, 1, 100);
# Define Model
@model function BetaBinom(X, Y, M, T)
# number of obseravations
n = length(Y);
# Define Priors
A ~ Normal(0,1);
B ~ Normal(0,1);
for i in 1:n
# Beta Distribution
v = expit(A + B * X[i])*M[i]*(1-M[i]);
nu = M[i]*(1-M[i])/v - 1;
a = M[i]*nu;
b = (1-M[i])*nu;
# Likelihood
Y[i] ~ BetaBinomial(T[i], a, b)
end
end
# Sample using NUTS.
m = BetaBinom(X, Y, M, T);
chain = sample(m, NUTS(), 1000);
summarize(chain)
However, executing this code yields the following error:
ERROR: DomainError with Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}}(0.0,-1.009028171378666e-16,-1.009028171378666e-16):
BetaBinomial: the condition α > zero(α) is not satisfied.
I am able to build this model in PyMC, so I have some confidence that the model is identifiable. I did some digging and it looks like this might be related to the following Distributions.jl issues: #1003, #1810. I wanted to raise this to see if my intuition is correct or if there is something else wrong with my code that I might be missing.