First, to give some context: my use case is equivalent to the following contrived example, in which I would like to define a (Turing.jl) probabilistic program that uses a discretized Beta mixture model.
```julia
using Turing, Distributions

@model themodel(X, K) = begin
    a ~ Exponential()
    b ~ Exponential()
    p = discretize(Beta(a, b), K)
    X ~ MixtureModel([Bernoulli(p[i]) for i = 1:K])
end

function discretize(d, K)
    qstart = 1.0 / 2K
    qend = 1.0 - 1.0 / 2K
    xs = quantile.(d, qstart:(1/K):qend)
    xs *= mean(d) * K / sum(xs)  # rescale by factor mean(d)/mean(xs)
end
```
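As a sanity check of the discretization scheme itself (independent of AD), here is a self-contained sketch of the same midpoint-quantile-plus-rescaling idea applied to an `Exponential(1)` distribution, whose quantile has the closed form `-log(1 - q)`, so no packages are needed; `discretize_exponential` is just an illustrative name:

```julia
# Sketch: midpoint-quantile discretization with mean rescaling, on
# Exponential(1), whose quantile function is Q(q) = -log(1 - q).
function discretize_exponential(K)
    qs = (1/(2K)):(1/K):(1 - 1/(2K))  # K midpoint probabilities
    xs = [-log(1 - q) for q in qs]    # closed-form quantiles
    xs .* (1.0 * K / sum(xs))         # rescale so mean(xs) == mean(d) == 1
end

xs = discretize_exponential(10)
```

The rescaling step pins the mean of the discrete support exactly to `mean(d)`, which is what the `mean(d)*K/sum(xs)` factor in `discretize` above does as well.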
Now, this does not work, because the quantile function for the Beta distribution (the inverse of the regularized incomplete beta function, i.e. `beta_inc_inv` in SpecialFunctions.jl) currently does not work with AD. (This brings me to a first question: why do many methods in SpecialFunctions only take `Float64`-typed arguments?) I experimented (lazily) with widening the argument types of the relevant SpecialFunctions methods and using AD to get my gradient of interest, but this gets too slow.
Then I implemented an algorithm for the gradient of the CDF of the Beta distribution (i.e. `beta_inc`) with respect to the `a` and `b` parameters (which I found here). This gives me the gradient of `beta_inc(a, b, x)` w.r.t. `a` and `b` quickly. I would expect this to enable faster computation of the `beta_inc_inv` gradient (which, if I understand correctly, mainly calls `beta_inc` inside a numerical inversion routine).
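On that last point: if a fast gradient of `beta_inc` (the CDF) is available, the gradient of `beta_inc_inv` (the quantile) follows from the implicit function theorem without differentiating through the inversion routine. Holding `p` fixed in `F(x; a, b) = p` gives `dx/da = -(∂F/∂a) / f(x; a, b)`, where `f` is the Beta pdf. A self-contained sketch on `Beta(a, 1)`, where everything has a closed form so the identity can be checked directly (no `beta_inc` needed):

```julia
# On Beta(a, 1):
#   CDF  F(x; a) = x^a,  pdf f(x; a) = a*x^(a-1),  quantile Q(p; a) = p^(1/a).
# With p fixed and x = Q(p; a), the implicit function theorem gives
#   dQ/da = -(∂F/∂a) / f(x; a),
# i.e. the gradient of the CDF is all that is needed for the quantile gradient.
a, p = 2.5, 0.3
x = p^(1/a)                              # quantile
dFda = x^a * log(x)                      # ∂F/∂a at (x, a)
pdf  = a * x^(a - 1)                     # ∂F/∂x, the density
dQda_implicit = -dFda / pdf
dQda_direct   = -p^(1/a) * log(p) / a^2  # d/da of p^(1/a), for comparison
```

Both expressions agree, so in principle `beta_inc_grad` plus the Beta pdf should give the `beta_inc_inv` gradient directly.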
However, now I'm not sure how to proceed (I am no analysis guru, and all the stuff in JuliaDiff is a bit daunting). It seems to me I should implement some diffrule for `beta_inc`? Say my function is `beta_inc_grad(a, b, x)` and returns a tuple with `beta_inc(a, b, x)` and its partial derivatives w.r.t. `a` and `b`; is
```julia
@define_diffrule SpecialFunctions.beta_inc(a, b, x) =
    :( beta_inc_grad($a, $b, $x) ),
    :( beta_inc_grad($a, $b, $x) ),
    :NaN
```
what I should implement to make this work with AD (it doesn't seem to be enough to make `beta_inc` play with ForwardDiff)? (Also, would this call
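For what it's worth, my current guess at how the rule above would need to be shaped, assuming `beta_inc_grad(a, b, x)` returns the tuple of partials `(∂/∂a, ∂/∂b)`: each slot of a `@define_diffrule` rule is supposed to be a *scalar* derivative expression for the corresponding argument, so the tuple would need indexing. This is only a sketch, not verified against current DiffRules:

```julia
# Hypothetical sketch, assuming a user-supplied beta_inc_grad(a, b, x)
# that returns (∂beta_inc/∂a, ∂beta_inc/∂b); not SpecialFunctions API.
using DiffRules: @define_diffrule

@define_diffrule SpecialFunctions.beta_inc(a, b, x) =
    :( beta_inc_grad($a, $b, $x)[1] ),  # scalar derivative w.r.t. a
    :( beta_inc_grad($a, $b, $x)[2] ),  # scalar derivative w.r.t. b
    :( NaN )                            # derivative w.r.t. x left undefined
```

One caveat I'm aware of: ForwardDiff generates its `Dual`-number methods from the DiffRules table when ForwardDiff itself is loaded, so (as far as I understand) a rule added afterwards in user code is not picked up automatically; that may be why the rule alone "doesn't seem to be enough", and a hand-written method of `beta_inc` for `ForwardDiff.Dual` arguments may be needed in addition. Note also that calling `beta_inc_grad` once per slot recomputes the same tuple, so some caching would be wanted in practice.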