Version 0.11 is out! It brings a practical implementation of the Bouncy Particle sampler.
The sampler is defined below and has essentially three tuning parameters: how often and how much the search direction/momentum is randomised, and possibly a root of the mass matrix (e.g. a Cholesky factor):

```julia
using LinearAlgebra
using ZigZagBoomerang
M = I # mass matrix
BPS = BouncyParticle(∅, ∅, # ignored
    10.0, # momentum refreshment rate
    0.95, # momentum correlation ρ ∈ [0,1] / only gradually change momentum in refreshment/momentum update
    0.0, # ignored
    M # root of momentum precision/mass matrix, or identity
)
```
So now let’s say we want to sample from a multivariate target density `p(x)`.
We’ll need to build two functions, `dneglogp` and `∇neglogp!`:

- `d1, d2 = dneglogp(_, x, v)` computes, for `f(t) = -log(p(x + t*v))`, the first directional derivative `d1 = f'(0)` of the negative target log-likelihood at `x` in direction `v`, and the corresponding second derivative `d2 = f''(0)`.
- `∇neglogp!(y, _, x)` computes the gradient of the negative target log-likelihood `-log(p(x))` at `x`, saving it in `y` (e.g. using `ForwardDiff.gradient!`).
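To make the required signatures concrete, here is a minimal sketch for an illustrative standard-Gaussian target (an assumption for this example, not something the package prescribes), where both derivatives are available analytically:

```julia
using LinearAlgebra

# Illustrative standard-Gaussian target: -log p(x) = 0.5*dot(x, x) + const,
# so for f(t) = -log p(x + t*v) we get f'(0) = dot(x, v) and f''(0) = dot(v, v).
function dneglogp(_, x, v)
    d1 = dot(x, v) # first directional derivative f'(0)
    d2 = dot(v, v) # second directional derivative f''(0)
    return d1, d2
end

function ∇neglogp!(y, _, x)
    y .= x # gradient of 0.5*dot(x, x) at x is x itself
    return y
end
```

For a non-trivial model you would of course derive (or autodiff) these from your own log-density instead.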
If you have a Turing model `model`, `./turing/lr.jl` has a convenient function creating these two:

```julia
dneglogp, ∇neglogp! = make_derivatives_neglogp(model)
```
We need both! This is in fact very nice: computing a directional derivative is much cheaper than computing a gradient, and we get away with calling `dneglogp` 90%–99% of the time!
And essentially we are doing a line search: we move along direction `v`, probing the slope of `-log(p(x + t*v))`, until we need to change direction. Only then do we need to compute the gradient to find a new direction to explore.
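To make the line-search picture concrete, here is a small illustration using the same assumed standard-Gaussian target as above (so `f(t) = 0.5*dot(x + t*v, x + t*v)`): the slope along the ray is affine in `t`, and the point where it crosses zero is where the trajectory starts moving uphill:

```julia
using LinearAlgebra

# Illustrative Gaussian target -log p(x) = 0.5*dot(x, x) (an assumption for this sketch):
x = [2.0, -1.0]               # current position
v = [-1.0, 0.0]               # current direction/momentum
fslope(t) = dot(x + t * v, v) # slope f'(t) of f(t) = -log p(x + t*v)

# f'(t) = dot(x, v) + t*dot(v, v) is affine in t and crosses zero at:
tstar = -dot(x, v) / dot(v, v)
fslope(0.0)   # negative: still moving downhill, keep going
fslope(tstar) # zero: beyond tstar the trajectory moves uphill and a bounce becomes likely
```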
From here, calling the sampler is straightforward. Let’s define the initial state and parameters:

```julia
d = 25 # number of parameters
t0 = 0.0 # nominal starting time of sampler
x0 = zeros(d) # starting guess of the sampler
θ0 = randn(d) # starting search direction/momentum of the sampler
T = 200. # end time (similar to number of samples in MCMC)
c = 5.0 # initial guess for the curvature of the target
```
Now we can call the sampler; the signature is as follows:

```julia
trace, final, (acc, num), cs = @time pdmp(
    dneglogp, # return first two directional derivatives of negative target log-likelihood in direction v
    ∇neglogp!, # return gradient of negative target log-likelihood
    t0, x0, θ0, T, # initial state and duration
    ZigZagBoomerang.LocalBound(c), # use Hessian information
    BPS; # sampler
    adapt=true, # adapt bound c
    progress=true, # show progress bar
    subsample=true # keep only samples at refreshment times
)
```
Voilà, some samples in `trace` (a vector of pairs `t => x`), where `x` is a sample and `t` a nominal time that can be ignored when `subsample=true`:

```julia
julia> typeof(last.(trace))
Vector{Vector{Float64}} (alias for Array{Array{Float64, 1}, 1})
```
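Samples of this shape can be post-processed with plain array operations. A small sketch, using a hypothetical stand-in trace of the same `t => x` pair shape (not actual sampler output), to compute a posterior mean:

```julia
using Statistics

# Hypothetical stand-in for the sampler output: a vector of pairs t => x
trace = [0.1 => [1.0, 2.0], 0.2 => [3.0, 4.0], 0.3 => [5.0, 6.0]]

xs = last.(trace)    # drop the nominal times, keep the samples
X = reduce(hcat, xs) # d × n matrix, one sample per column
posterior_mean = vec(mean(X, dims=2))
```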
Check out ZigZagBoomerang.jl/turing for an example using a Turing.jl model and a comparison with NUTS!