# Continuous versions of Poisson, Binomial, NegativeBinomial

Dear All,

For one of my projects, I need to have continuous versions of the Poisson, Binomial and NegativeBinomial distributions, along the lines of Padellini and Rue. Before I start the tedious task of porting these over to a Distributions.jl-compatible API, has anyone else done this to save me the trouble?

As far as I understand it, these continuous analogs are formed by finding a continuous distribution whose CDF evaluates to the same as the CDF of the discrete version at all points in the discrete support. In that case, here are some basic implementations to get you started:

``````using StatsFuns, Distributions, ADTypes, FiniteDiff
import DifferentiationInterface as DI

Î»::T
end

_discretize(d::ContinuousPoisson) = Poisson(d.Î»)

Base.minimum(::ContinuousPoisson) = 0
Base.maximum(::ContinuousPoisson) = Inf

Distributions.cdf(d::ContinuousPoisson, x::Real) = StatsFuns.gammaccdf(x + 1, 1, d.Î»)

function Distributions.logcdf(d::ContinuousPoisson, x::Real)
return StatsFuns.gammalogccdf(x + 1, 1, d.Î»)
end

n::Int
p::T
end

ContinuousBinomial(n, p) = ContinuousBinomial(n, p, ADTypes.AutoFiniteDiff())

_discretize(d::ContinuousBinomial) = Binomial(d.n, d.p)

Base.minimum(::ContinuousBinomial) = 0
Base.maximum(::ContinuousBinomial) = d.n

function Distributions.cdf(d::ContinuousBinomial, x::Real)
return StatsFuns.betaccdf(min(x + 1, d.n), max(d.n - x, 0), d.p)
end

function Distributions.logcdf(d::ContinuousBinomial, x::Real)
return StatsFuns.betalogccdf(min(x + 1, d.n), max(d.n - x, 0), d.p)
end

r::Int
p::T
end

ContinuousNegativeBinomial(r, p) = ContinuousNegativeBinomial(r, p, ADTypes.AutoFiniteDiff())

_discretize(d::ContinuousNegativeBinomial) = NegativeBinomial(d.r, d.p)

Base.minimum(::ContinuousNegativeBinomial) = 0
Base.maximum(::ContinuousNegativeBinomial) = Inf

function Distributions.cdf(d::ContinuousNegativeBinomial, x::Real)
return StatsFuns.betacdf(d.r, x + 1, d.p)
end

function Distributions.logcdf(d::ContinuousNegativeBinomial, x::Real)
return StatsFuns.betalogcdf(d.r, x + 1, d.p)
end

# shared code
for T in [:ContinuousPoisson, :ContinuousBinomial, :ContinuousNegativeBinomial]
@eval begin
function Distributions.quantile(d::\$T, p::Real)
upper = Distributions.quantile(_discretize(d), p)
lower = upper - 1
# NOTE: quantile_bisect is an internal function
return Distributions.quantile_bisect(d, p, lower, upper)
end

function Distributions.pdf(d::\$T, x::Real)
end

function Distributions.logpdf(d::\$T, x::Real)
return log(DI.derivative(Base.Fix1(Distributions.logcdf, d), d.ad_type, x)) + logcdf(d, x)
end
end
end
``````

And some quick plots to check that things look good:

``````using StatsPlots
dists = [ContinuousPoisson(5), ContinuousBinomial(10, 0.3), ContinuousNegativeBinomial(5, 0.3)]
plts = []
for dc in dists
d = _discretize(dc)
p1 = plot(d; func=cdf, label="discrete", title="CDF")
plot!(p1, dc; func=cdf, label="continuous")
p2 = plot(d; label="discrete", title="PDF")
plot!(p2, dc; label="continuous")
append!(plts, [p1, p2])
end
plot(plts...; layout=(3, 2), dpi=300)
``````

Some major limitations are that the incomplete beta and gamma functions that are used in the CDFs are implemented in SpecialFunctions and restricted to `AbstractFloat` types, so they can only be differentiated by source-to-source ADs like Tapir or Enzyme. Iâ€™m not sure if either of these work on these functions at the moment, which is why these implementations use finite differences.

4 Likes

Thanks @sethaxen! I had to define `mode` for the distributions to get this working. It isnâ€™t quite matching the PDFs I expect - the support for x<0 should be 0. Hereâ€™s a comparison with the R library `cbinom` (which is based on this paper
continuous_distributions.jl (2.8 KB)
:

``````include("./continuous_distributions.jl")
using RCall
@rlibrary cbinom

dc = ContinuousBinomial(10, 0.3)
x = collect(0:0.1:15)
y = pdf(dc,x)
yr = R"cbinom::dcbinom(\$x, 10, 0.3, log=FALSE)"
yr = convert(Array,yr)

plot(x,y,color="blue",label="Julia ContinuousBinomial")
plot!(x,yr,color="red",label="R cbinom")
``````

Iâ€™ll go through the fine print - many thanks for the pointers to the AD aspects of defining distributions, this was something that I had no idea how to get started!

At first glance it looks like an off-by-one error?

Yes, thatâ€™s right; I was just getting my head around the parameterizations. The R package uses the Ilienko (2013) approach, while the solution @sethaxen gave used the approach of Padellini and Rue to interpolate the CDFs. The latter approach gives a lower domain of -1 rather than 0. Rather than re-write the distributions, I think Iâ€™ll just have a note to myself to remember that generating random numbers from these can lead to negative values. I had to fix the mode calculations too; the corrected version is attached.
continuous_distributions.jl (2.8 KB)

1 Like

Yeah so the issue is that if weâ€™re matching the CDFs at all points in the support of the discrete distribution, then we need to have nonzero probability mass at x \le 0. We can do that by adding negative numbers to the support or by adding a discrete â€śatomâ€ť at x=0 (e.g. how `Distributions.censored` works). But since that adds a discrete jump in the â€śdensityâ€ť, plots will look a little strange, and youâ€™d need to modify `logpdf` to only use AD when x>0. Shifting everything by 1 makes the CDFs no longer match, but in some cases makes the PDFs might match better.