First, to give some context: my use case is equivalent to the following contrived example, in which I would like to define a (Turing.jl) probabilistic program that uses a discretized Beta mixture model.
using Turing, Distributions

@model themodel(X, K) = begin
    a ~ Exponential()
    b ~ Exponential()
    p = discretize(Beta(a, b), K)    # K discretized success probabilities
    X ~ MixtureModel([Bernoulli(p[i]) for i = 1:K])
end
function discretize(d, K)
    # midpoints of K equal-probability bins in quantile space
    qstart = 1 / (2K)
    qend = 1 - 1 / (2K)
    xs = quantile.(d, qstart:(1/K):qend)
    # rescale by the factor mean(d)/mean(xs) so the discretization preserves the mean
    xs *= mean(d) * K / sum(xs)
end
Now, this does not work, because currently the quantile function for the Beta distribution (which is the inverse of the regularized incomplete beta function, i.e. beta_inc_inv in SpecialFunctions.jl) does not work with AD. (This brings me to a first question: why do many methods in SpecialFunctions only take Float64 arguments?) I experimented (lazily) with adapting the argument types of the relevant SpecialFunctions methods and using AD to get my gradient of interest, but this gets too slow.
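To be concrete, the kind of call that fails for me looks roughly like this (a minimal sketch of the failure mode as I understand it, not the exact stack trace):

using Distributions, ForwardDiff

# differentiating the Beta quantile w.r.t. the shape parameters ends up in
# beta_inc_inv, which only has Float64 methods, so Dual numbers are rejected
f(ab) = quantile(Beta(ab[1], ab[2]), 0.5)
ForwardDiff.gradient(f, [2.0, 5.0])    # MethodError in my setup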
Then I’ve implemented an algorithm for the gradient of the CDF of the Beta distribution (i.e. beta_inc) with respect to the a and b parameters (which I found here). This gives me the gradient of beta_inc(a, b, x) w.r.t. a and b quickly. I would expect that this would enable faster computation of the beta_inc_inv gradient (which, if I understand correctly, mainly calls beta_inc inside a numerical inversion routine).
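To spell out what I have in mind: since the derivative of the regularized incomplete beta function w.r.t. x is just the Beta(a, b) pdf, the derivatives of the inverse should follow from implicit differentiation of beta_inc(a, b, x) = p. Something like this sketch, where beta_inc_grad(a, b, x) is my routine returning the value and the two partial derivatives:

using Distributions    # for pdf(Beta(a, b), x)

# derivatives of x = beta_inc_inv(a, b, p), given x, via the implicit function theorem
function beta_inc_inv_grad(a, b, p, x)
    _, dIda, dIdb = beta_inc_grad(a, b, x)    # my gradient of beta_inc w.r.t. a and b
    dIdx = pdf(Beta(a, b), x)                 # derivative of beta_inc w.r.t. x
    return (-dIda / dIdx, -dIdb / dIdx, 1 / dIdx)    # dx/da, dx/db, dx/dp
end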
However, now I’m not sure how to proceed (I am no analysis guru, and all the stuff in JuliaDiff is a bit daunting). It seems to me I should implement some diffrule for beta_inc? Say my function is beta_inc_grad(a, b, x) and returns a tuple with beta_inc(a, b, x) and its partial derivatives w.r.t. a and b; is
@define_diffrule SpecialFunctions.beta_inc(a, b, x) =
    :( beta_inc_grad($a, $b, $x)[2] ), :( beta_inc_grad($a, $b, $x)[3] ), :NaN
what I should implement to make this work with AD? (It doesn’t seem to be enough to make beta_inc play nicely with ForwardDiff.) Also, would this call beta_inc_grad twice?
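In case it clarifies what I’m after, the alternative I’ve been considering is overloading beta_inc directly for ForwardDiff’s Dual numbers, something like the sketch below. This pretends beta_inc returns just the scalar I_x(a, b) and that only a and b carry derivative information; beta_inc_grad is my routine from above.

using ForwardDiff: Dual, value, partials
import SpecialFunctions

# sketch: forward-mode rule for beta_inc when a and b are Dual and x is plain
function SpecialFunctions.beta_inc(a::Dual{T}, b::Dual{T}, x::Real) where {T}
    I, dIda, dIdb = beta_inc_grad(value(a), value(b), x)
    return Dual{T}(I, dIda * partials(a) + dIdb * partials(b))
end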