Hi everyone!
I’m a statistician and R user taking my first steps with Julia. I have extensive experience with R, and this summer I spent my free time reimplementing the R package gamlss
as a personal exercise. Julia has really caught my interest, and I’d like to learn the language by working on a project to reimplement gamlss
in Julia.
I’m just getting started, and one of the first things I’m trying to build is an interface that somehow mirrors R’s family
concept, but immediately generalizable to distributions with multiple parameters modeled through regression models.
The idea is to create a Family
struct that takes various arguments, including a series of functions like lpdf()
. This lpdf()
function needs to be differentiated with respect to the model parameters contained in an array par
. Currently, I’ve conceived par
in the most general terms possible (given my current knowledge) as an “array of arrays,” where each element contains a vector of dimension n
(where n is the sample size). This allows me to compute the log-likelihood for the i-th observation x_i at its corresponding parameter value \theta_i. It seemed natural to design an lpdf()
function within Family
that uses broadcasting .
to perform these operations.
The trickiest part, however, is figuring out how to optimally compute the gradient and Hessian of the log-likelihood function for the i-th observation. I need the gradient and Hessian at the observation level to later build the chain rule for computing gradients and Hessians with respect to the regression model parameters, but that’s a problem I’ll tackle later.
The solution I’ve arrived at uses a double for loop and Zygote.jl, as you can see in the code below. It takes the data x
, the parameter array par
(structured as described above), and the Family
object from which I extract the lpdf()
expression that I differentiate for each observation.
Since I believe the best way to learn is by doing, making mistakes, and being corrected by people who know more than you do, I was wondering if you could share your thoughts on this implementation. Is this the best way to approach this kind of implementation in Julia?
Thanks!
using SpecialFunctions
using Zygote
using BenchmarkTools
struct Family
family::String # name of family
vartype::String # "continuous" or "discrete"?
varlower # lower bound for variable
varupper # upper bound for variable
npar::Int # number of parameters
parnames::Vector{String} # names of parameters
pardescr::Vector{String} # description of parameters
parlower::Vector{Float64} # lower bounds of parameters
parupper::Vector{Float64} # upper bounds of parameters
pdf::Function # probability density (mass) function
lpdf::Function # log pdf/pmf
cdf::Function # cumulative distribution functionm
lcdf::Function # log cdf
qf::Function # quantile function
E::Function # Expected value
V::Function # Variance
A::Function # Asymmetry
K::Function # Kurtosis
end
gaussian1 = Family(
"gaussian1",
"continuous",
-Inf,
Inf,
2,
["μ", "σ²"],
["mean", "variance"],
[-Inf, 0.0],
[Inf, Inf],
(x, par) -> exp.(-0.5 .* log.(2π*par[2]) .- 0.5 .* (x .- par[1]).^2 ./ par[2]),
(x, par) -> -0.5 .* log.(2π*par[2]) .- 0.5 .* (x .- par[1]).^2 ./ par[2],
(x, par) -> 0.5 .* (1 .+ erf.((x .- par[1]) ./ sqrt.(2*par[2]))),
(x, par) -> log.(0.5 .* (1 .+ erf.((x .- par[1]) ./ sqrt.(2*par[2])))),
(prob, par) -> par[1] .+ sqrt.(2*par[2]) .* erfinv.(2*prob .- 1),
par -> par[1],
par -> par[2],
par -> 0,
par -> 3
)
function grad_loglik(family::Family, x, par)
n = size(x, 1)
k = size(par, 1)
g = zeros(n, k)
par_vec = zeros(k)
# Use Zygote to compute gradient
for i in 1:n
for j in 1:k
par_vec[j] = par[j][i]
end
g[i, :] = gradient(p -> family.lpdf(x[i], p), par_vec)[1]
end
return g
end
function hess_loglik(family::Family, x, par)
n = size(x, 1)
k = size(par, 1)
h = zeros(k, k, n)
par_vec = zeros(k)
# Use Zygote to compute hessian
for i in 1:n
for j in 1:k
par_vec[j] = par[j][i]
end
h[:, :, i] = hessian(p -> family.lpdf(x[i], p), par_vec)
end
return h
end
n = 2
x = [1.0, 2.0]
mu = x
s2 = abs.(x)
par = [mu, s2]
@btime grad_loglik(gaussian1, x, par)
@btime hess_loglik(gaussian1, x, par)