[ANN] Vecchia.jl: linear-cost differentiable approximations for Gaussian log-likelihoods

Hi all. I’ve just registered Vecchia.jl, which provides functionality for a very convenient class of approximations for the Gaussian log-likelihood (as of now, only mean-zero, although for some technical reasons this is a bit less of a limitation than you might think). My preferred reference for those who like to read papers is Stein/Chi/Welty 2004, but the idea can be described very simply. Any joint likelihood for a multivariate \textbf{z} can of course be decomposed into a product of conditional likelihoods as

\ell_{\theta}(\textbf{z}) = \ell_{\theta}(z_1) \prod_{j=2}^n \ell_{\theta}(z_j \; | \; \textbf{z}_{1:(j-1)}).

The point of Vecchia approximations is that this quantity can in some circumstances be very well-approximated with

\ell_{\theta}(\textbf{z}) \approx \ell_{\theta}(z_1) \prod_{j=2}^n \ell_{\theta}(z_j \; | \; \textbf{z}_{\sigma(j)}),

where \sigma(j) \subseteq [j-1] and is typically O(1) in size. In this setting and from this formulation, it is reasonably direct to see that you will evaluate O(n) many O(1)-sized conditional Gaussian likelihoods, giving a very attractive linear complexity in the data size. There are plenty of dials to turn, like taking “chunks” \textbf{z}_j instead of scalar measurements z_j, but for the sake of brevity I won’t discuss all that in the initial announcement post here.

As far as why this is sometimes a sensible thing to do, in the simplest case you could imagine a time series for \textbf{z}, and there is an obvious intuition that perhaps conditioning on the recent past (say, \sigma(j) = \{j-3, j-2, j-1\}) makes the the present and the far-past nearly independent. For those who would like a reference or more concrete discussion of when exactly that might be reasonably close to the truth, let me know and I can provide plenty of references or discussion here. But for the moment I won’t introduce any more technical or mathematical language unless somebody asks.

The package is pretty easy to use. The README gives a complete demo, but here is a slightly shortened demo:

using LinearAlgebra, StaticArrays, Vecchia

# VERY IMPORTANT FOR MULTITHREADING, since this is many small BLAS/LAPACK calls:
BLAS.set_num_threads(1)

# Covariance function, in this case Matern(v=3/2):
kfn(x,y,p) = p[1]*exp(-norm(x-y)/p[2])*(1.0+norm(x-y)/p[2])

# Locations for fake measurements, in this case 2048 of them, and fake data 
# (data NOT from the correction distribution, this is just a maximally simple demo):
const pts = [SVector{2, Float64}(randn(2)) for _ in 1:2048]
const dat = randn(length(pts))

# Create the VecchiaConfig:
const chunksize = 64
const num_conditioning_chunks = 3
const vecc = Vecchia.kdtreeconfig(dat, pts, chunksize, num_conditioning_chunks, kfn)

# Now you can evaluate the likelihood:
const sample_p = ones(2)
Vecchia.nll(vecc, sample_p)

Thanks to the beautiful Folds ecosystem of @tkf, this code thread-parallelizes very well. And as yet another bonus, it is fully AD-compatible, so you can using ForwardDiff.hessian and so on to perform higher-order optimization, which is a must for GP estimation in my opinion. The example files provide a full demonstration of directly using Ipopt to perform parameter estimation. I acknowledge that I’m an Ipopt cultist, but I really don’t think it gets any better than that.

Not demonstrated here but shown in the README, the package also provides functionality for the induced sparse approximation to the precision matrix. This depends even more on @tkf’s ecosystem and now also on @elrod’s LoopVectorization, but with all those fancy tools assembling the sparse approximation is only about 8x the time cost of the direct evaluation of the many small likelihoods. All things considered, I think that’s pretty cool.

Also, I’m releasing this on the MIT license instead of my preferred GPLv2 license that I’ve always used in the past, so if that was ever a deterrent to using code I’ve released, no need for that concern here. I’d really love to see more of the spatial statistics community adopt Julia, because I’m sick of writing R bindings so that R users can access all of the incredible functionality of Julia. So crossing my fingers that this helps…

Once the three day waiting period is up, you should be able to ]add Vecchia. If you try it and have any issues or comments or questions, please do reach out. This is my first time registering a package and trying to deal with things like compat, so please also tell me if I’ve done something stupid there.

19 Likes

This looks great, I’m excited to check it out! @ElOceanografo I think you’ll find this interesting as well.

3 Likes

Yes! This looks great @cgeoga, can’t wait to play around with it.

2 Likes

Excellent! I’ll be very curious to hear from you and/or @jkbest2 if you do use it.

I should also clarify to anybody reading this thread that adding support for mean functions would be just a few lines of code. Since all the derivatives are AD-based it would be sufficient to evaluate for data - meanfun.(locations) instead of just data. I primarily haven’t added it because it’s not obvious what the “best” interface for it is. But if you or anybody else would be more interested if it had mean function stuff built in, I can easily plop that in and add that to the demo.

1 Like

Hey all–just an update that I’ve released a new version that now also offers the option of LoopVectorization-powered SIMD for the likelihood function. Particularly if you’re going to want sequential execution instead of threaded, the SIMD easily buys you a factor of two or three speedup. As of now it’s not completely stable for autodiff, but even with it just falls back to @inbounds that doesn’t come at a speed cost in the sequential case (unsure how exactly about many-core systems). But in some applications people just want a fast likelihood (right?), and so maybe even in its slightly incomplete form this would be of interest.

The usage is pretty simple once you’ve created the VecchiaConfig object like in the above code example or README:

# re-write the kernel function to accept scalar components of the two spatial
# coordinates.
function kfn_scalar(x1, x2, y1, y2, p)
  nrm = sqrt((x1-y1)^2 + (x2-y2)^2)
  p[1]*exp(-nrm/p[2])*(1+nrm/p[2])
end

# change some internals so that looping over points is LV-approved.
const vecc_s = Vecchia.scalarize(vecc, kfn_scalar) 

# Enjoy the AVX:
Vecchia.nll(vecc_s, sample_p)

Huge thanks to @elrod for help with the @generated functions that make this extension just as dimension-agnostic as the standard implementation that uses SVectors for coordinates.

4 Likes