Differentiation of the incomplete Beta function

First, to give some context: my use case is equivalent to the following contrived example, in which I would like to define a (Turing.jl) probabilistic program that uses a discretized Beta mixture model.

using Turing, Distributions

@model themodel(X, K) = begin
    a ~ Exponential()
    b ~ Exponential()
    p = discretize(Beta(a, b), K)
    X ~ MixtureModel([Bernoulli(p[i]) for i=1:K])
end

function discretize(d, K)
    # quantiles at the midpoints of K equal-probability bins
    qstart = 1.0/2K
    qend = 1.0 - 1.0/2K
    xs = quantile.(d, qstart:(1/K):qend)
    xs *= mean(d)*K/sum(xs)  # rescale by factor mean(d)/mean(xs)
end

Now, this does not work, because currently the quantile function for the Beta distribution (which is the inverse of the incomplete Beta function, i.e. beta_inc_inv in SpecialFunctions.jl) does not work with AD. (This brings me to a first question: why do many methods in SpecialFunctions only accept Float64 arguments?) I experimented (lazily) with loosening the argument types of the relevant SpecialFunctions methods and pushing AD through them to get my gradient of interest, but this turned out to be too slow.
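For concreteness, something like the following is the kind of call that fails on my setup (a minimal sketch; the exact error will depend on package versions):

using Distributions, ForwardDiff

# Differentiating the Beta quantile w.r.t. a shape parameter ends up in
# beta_inc_inv, which only accepts Float64 arguments, so the Dual-number
# input throws a MethodError here (at least on my setup).
ForwardDiff.derivative(a -> quantile(Beta(a, 2.0), 0.3), 1.5)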

Then I implemented an algorithm for the gradient of the CDF of the Beta distribution (i.e. beta_inc) with respect to the a and b parameters (which I found here). This gives me the gradient of beta_inc(a, b, x) w.r.t. a and b quickly. I would expect this to also enable faster computation of the beta_inc_inv gradient (which, if I understand correctly, mainly calls beta_inc in a numerical inversion routine).
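In fact, I think the inverse can be differentiated without pushing AD through the inversion routine at all: if x = beta_inc_inv(a, b, p)[1], then x satisfies I_x(a, b) = p, so by implicit differentiation ∂x/∂a = -(∂I/∂a)/(∂I/∂x), and ∂I/∂x is just the Beta(a, b) density at x. A minimal sketch of what I mean (assuming my beta_inc_grad(a, b, x) returns the value together with the partials w.r.t. a and b):

using Distributions, SpecialFunctions

# Sketch only: gradient of x = beta_inc_inv(a, b, p)[1] w.r.t. a and b via the
# implicit function theorem. Assumes beta_inc_grad(a, b, x) returns
# (I_x(a, b), ∂I/∂a, ∂I/∂b).
function beta_inc_inv_grad(a, b, p)
    x = beta_inc_inv(a, b, p)[1]
    _, dIda, dIdb = beta_inc_grad(a, b, x)
    dIdx = pdf(Beta(a, b), x)   # ∂I/∂x is the Beta(a, b) density at x
    return x, -dIda / dIdx, -dIdb / dIdx
end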

However, now I’m not sure how to proceed (I am no analysis guru, and all the stuff in JuliaDiff is a bit daunting). It seems to me that I should implement some diff rule for beta_inc? Say my function is beta_inc_grad(a, b, x) and it returns a tuple with beta_inc(a, b, x) and its partial derivatives w.r.t. a and b; is

using DiffRules, SpecialFunctions

@define_diffrule SpecialFunctions.beta_inc(a, b, x) =
    :( beta_inc_grad($a, $b, $x)[2] ), :( beta_inc_grad($a, $b, $x)[3] ), :NaN

what I should implement to make this work with AD? (It doesn’t seem to be enough to make beta_inc play nicely with ForwardDiff.) (Also, would this call beta_inc_grad twice, once per partial derivative?)
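To make the question concrete: the alternative I have been considering is to skip DiffRules and overload a scalar wrapper of beta_inc directly for ForwardDiff Dual numbers, so that beta_inc_grad is only evaluated once. A rough sketch (the wrapper name reg_inc_beta is just something I made up; it only covers the case where a and b are both duals of the same tag, and it again assumes beta_inc_grad(a, b, x) returns the value and the partials w.r.t. a and b):

using ForwardDiff, SpecialFunctions
using ForwardDiff: Dual, value, partials

# Hypothetical scalar wrapper: SpecialFunctions.beta_inc returns a tuple (p, q),
# so keep only the first element.
reg_inc_beta(a, b, x) = first(beta_inc(a, b, x))

# Sketch of a Dual overload: forward the primal values to beta_inc_grad and
# assemble the dual part from the returned partial derivatives.
function reg_inc_beta(a::Dual{T}, b::Dual{T}, x::Real) where {T}
    v, da, db = beta_inc_grad(value(a), value(b), x)
    return Dual{T}(v, da * partials(a) + db * partials(b))
end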


Would you be willing to share your beta_inc_grad code?

Hi Benjamin, I haven’t followed up on this, but I put what I had implemented back when I asked this question here: https://github.com/arzwa/IncBetaDer. The beta_inc_grad code is here, I hope it can be useful. I’m not at all a numerical algorithms guy, so it might not be very good or even correct in its current state, although I do think I had some tests. (Needless to say, I haven’t prepared this code for release or anything, so the comments etc. should be read as notes to myself.)

I’m not a numerical algo guy either, but I also need these derivatives (in my case because I want to use TDistribution for some Bayesian stuff in Turing, but it doesn’t work with AD because of the beta_inc).
