[ANN] NaturalNeighbours.jl: Natural neighbour interpolation and derivative generation

Hi all,

I’ve just submitted my package NaturalNeighbours.jl for registration. This package is useful for performing natural neighbour interpolation over (scattered) planar data sets, as well as for derivative generation. The docs hopefully define everything in sufficient detail. I summarise some key parts of the package below too.

The idea behind natural neighbour interpolation is simple. Given some data $(\boldsymbol x_i, z_i)$, $i=1,\ldots,m$, where $\boldsymbol x_i \in \boldsymbol X \subseteq \mathbb R^2$ and $z_i \in Z \subseteq \mathbb R$, suppose we want to interpolate the data at some $\boldsymbol x_0$. The idea is to use spatial information to decide which points to use for estimating the function at $\boldsymbol x_0$, rather than having to think hard about features like the number of nearest neighbours to use or a cut-off radius, as is common in other spatial interpolation methods. Spatial information is represented via the Voronoi tessellation of the data sites, constructed with my other package DelaunayTriangulation.jl. This allows us to define the natural neighbours of $\boldsymbol x_0$, denoted $N_0$: the set of points in $\boldsymbol X$ whose Voronoi cells would neighbour the cell of $\boldsymbol x_0$ if it were inserted into the tessellation. Then, using some definition of local coordinates $\boldsymbol \lambda$ depending on $\boldsymbol x_0$ and $N_0$, the interpolant we define is

$$f(\boldsymbol x_0) = \sum_{i \in N_0} \lambda_i z_i.$$

Thanks to certain properties of the Voronoi tessellation, this interpolant is continuous in the query point $\boldsymbol x_0$ and smooth, only having derivative discontinuities at the data sites. Moreover, since this is a local interpolant, it is extremely fast to evaluate – I can evaluate the interpolant at hundreds of thousands of query points in less than half a second on my machine, independent of the size of the original data set. More mathematical detail is available here, where I also define the available coordinates: Sibson, Sibson-1 (differentiable at the data sites!), Triangle, Nearest, and Laplace. A detailed example demonstrating how we can use interpolation is given here (and another example is at the end of this post).
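To make the formula concrete, here is a toy sketch of the weighted sum (the $\lambda_i$ values below are made up for illustration; in practice they are computed from the tessellation):

```julia
# Toy sketch of f(x₀) = Σᵢ λᵢ zᵢ. Actual natural neighbour coordinates
# are computed from the Voronoi tessellation; they are nonnegative and
# sum to 1, so the interpolant is a convex combination of the zᵢ.
λ = [0.2, 0.5, 0.3]   # hypothetical local coordinates for three neighbours
z = [1.0, 2.0, 4.0]   # data values zᵢ at those neighbours
f₀ = sum(λ .* z)      # interpolated value at x₀, here 2.4
```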

Methods for derivative generation are also available. The term generation is used rather than estimation since the goal of these methods is to reconstruct derivatives that lead to a satisfactory interpolant, rather than necessarily to recover the exact derivatives of the underlying function (see this paper for more discussion on this point if you are interested). By solving certain weighted least squares problems, gradients and Hessians can be accurately computed at the provided data sites, and the interpolants we provide can also be differentiated. The mathematics behind this is given here, and a detailed example of differentiation is given here.
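A minimal sketch of the derivative-generation workflow (the data here are placeholders; see the docs for the full set of keyword arguments):

```julia
using NaturalNeighbours

x, y = rand(100), rand(100)   # placeholder data sites
z = sin.(x .* y)
# derivatives=true generates gradients/Hessians at the data sites
itp = interpolate(x, y, z; derivatives=true)
∂ = differentiate(itp, 1)     # first-order differentiator
∇ = ∂(0.5, 0.5)               # generated gradient at a query point
```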

A quick demonstration of the package is below.

using NaturalNeighbours
using CairoMakie
using StableRNGs

## The data 
rng = StableRNG(123)
f = (x, y) -> sin(x * y) - cos(x - y) * exp(-(x - y)^2)
x = vec([(i - 1) / 9 for i in (1, 3, 4, 5, 8, 9, 10), j in (1, 2, 3, 5, 6, 7, 9, 10)])
y = vec([(j - 1) / 9 for i in (1, 3, 4, 5, 8, 9, 10), j in (1, 2, 3, 5, 6, 7, 9, 10)])
z = f.(x, y)

## The interpolant and grid 
itp = interpolate(x, y, z; derivatives=true)
xg = LinRange(0, 1, 100)
yg = LinRange(0, 1, 100)
_x = vec([x for x in xg, _ in yg])
_y = vec([y for _ in xg, y in yg])
exact = f.(_x, _y)

## Evaluate some interpolants 
sibson_vals = itp(_x, _y; method=Sibson())
triangle_vals = itp(_x, _y; method=Triangle())
laplace_vals = itp(_x, _y; method=Laplace())
sibson_1_vals = itp(_x, _y; method=Sibson(1))
nearest = itp(_x, _y; method=Nearest())

## Plot 
function plot_2d(fig, i, j, title, vals, xg, yg, x, y, show_scatter=true)
    ax = Axis(fig[i, j], xlabel="x", ylabel="y", width=600, height=600, title=title, titlealign=:left)
    contourf!(ax, xg, yg, reshape(vals, (length(xg), length(yg))), colormap=:viridis, levels=-1:0.05:0, extendlow=:auto, extendhigh=:auto)
    show_scatter && scatter!(ax, x, y, color=:red, markersize=14)
    return ax
end
function plot_3d(fig, i, j, title, vals, xg, yg)
    ax = Axis3(fig[i, j], xlabel="x", ylabel="y", width=600, height=600, title=title, titlealign=:left)
    surface!(ax, xg, yg, reshape(vals, (length(xg), length(yg))), color=vals, colormap=:viridis)
    return ax
end

all_vals = (sibson_vals, triangle_vals, laplace_vals, sibson_1_vals, nearest, exact)
titles = ("(a): Sibson", "(b): Triangle", "(c): Laplace", "(d): Sibson-1", "(e): Nearest", "(f): Exact")
fig = Figure(fontsize=36)
for (i, (vals, title)) in enumerate(zip(all_vals, titles))
    plot_2d(fig, 1, i, title, vals, xg, yg, x, y, !(vals === exact))
    plot_3d(fig, 2, i, " ", vals, xg, yg)
end
resize_to_layout!(fig)  # fit the figure to the fixed axis sizes

# could keep going and differentiating, etc...
# ∂ = differentiate(itp, 2) -- see the docs.

I also give an example of using interpolation for studying Switzerland’s elevation data here. A figure from this example is shown below.


Hi Friend! Congrats on the new package! Just wanted to let you know it seems like there are some LaTeX formatting errors in your post!

1 Like

Thanks :slight_smile: I don’t see any errors. Maybe it’s a browser issue? Unless I’m just missing them

1 Like

Oops. Thanks!

I’ve now released v1.1.0 of NaturalNeighbours.jl, which introduces the Farin(1) and Hiyoshi(2) interpolants. These are two interpolants based on Bézier splines: Farin(1) requires gradient information and is globally $C^1$ (including at the data sites), while Hiyoshi(2) requires gradient and Hessian information and is globally $C^2$ (including at the data sites). Derivative generation can be used to obtain this derivative information. Additionally, Farin(1) has quadratic precision and Hiyoshi(2) has cubic precision (meaning they reproduce quadratic and cubic polynomials, respectively), while Sibson(1) only reproduces spherical quadratics.
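A quick sketch of how the new interpolants are used (same API as the demonstration earlier in the thread; the data here are placeholders):

```julia
using NaturalNeighbours

x, y = rand(100), rand(100)   # placeholder data sites
z = x .^ 3 .- y .^ 3
# derivatives=true generates the gradients and Hessians that
# Farin(1) and Hiyoshi(2) need
itp = interpolate(x, y, z; derivatives=true)
farin_val = itp(0.5, 0.5; method=Farin(1))
hiyoshi_val = itp(0.5, 0.5; method=Hiyoshi(2))
```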

These two interpolants produce much higher quality interpolated surfaces than the methods I already had (the closest being Sibson(1)), and Hiyoshi(2) is the only method I have that is $C^2$ at the data sites.

I’ve also written a detailed section in the docs that compares all the interpolants, Comparison of Interpolation Methods · NaturalNeighbours.jl. It examines errors locally in terms of height errors, gradient errors, Hessian errors, and differences in surface smoothness; global errors are also measured by summing local errors away from the convex hull, and benchmarks are given. An image from that section:


Wow! Exactly what I am looking for. Thanks for sharing it.

In my case, scattered points are obtained from a very expensive computation that also produces gradient information simultaneously. For the 1D case, I use a cubic Hermite spline to interpolate my data. I wonder whether Hiyoshi(2) can deliver similar results for 2D as CHS.

Edit: Hessian is not available in my case. Farin(1) should be compared to CHS instead I think.

1 Like

Glad it’s useful for you!

I imagine Farin(1) could give similar results to a CHS in your case. I’ve only used a CHS in 1D before, but Farin(1) is built in a similar way: parts of the control net are fixed using function values, the edges are controlled using directional derivatives, and the remaining control points are then set to achieve quadratic precision (which might be the main difference). So the results should be pretty comparable.
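If I have the keyword right, something like this sketch would let you supply your own gradients rather than generating them (here they are computed analytically as a stand-in for your expensive computation):

```julia
using NaturalNeighbours

x, y = rand(100), rand(100)   # placeholder data sites
z = sin.(x .* y)
# one gradient tuple per data site, computed analytically here
∇ = [(yᵢ * cos(xᵢ * yᵢ), xᵢ * cos(xᵢ * yᵢ)) for (xᵢ, yᵢ) in zip(x, y)]
itp = interpolate(x, y, z; gradient=∇)
val = itp(0.5, 0.5; method=Farin(1))  # uses the supplied gradients
```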

Could be a good comparison to make if I had a 2D CHS handy.

1 Like

2D CHS seems to be called bicubic interpolation in the literature. I too can’t find an implementation of this method in Julia or in other languages.

1 Like

This is a very interesting package! I have not heard of natural neighbors before, so would you say that it is typically preferable over “traditional” methods of interpolation for geospatial datasets (like the Switzerland example)?

1 Like

Based on the examples I’ve applied this technique to, I have found it preferable - but like most things, it most likely depends on your application. I am really not sure why natural neighbour interpolation is not more popular - maybe the implementation details are just a bit difficult? It also gives really good results in comparison papers (search e.g. "natural neighbor interpolation" AND "comparison").

The key things would probably be:

  • (+) Natural neighbour interpolants (NNIs) are very inexpensive to set up and compute.
  • (+) NNIs are robust to irregularly distributed datasets, in contrast to e.g. inverse distance weighted interpolants, which can struggle in these scenarios (many point distributions in geospatial datasets are irregularly distributed).
  • (+) Since the points used for evaluating the interpolant around a point are chosen automatically, geospatial features are automatically accounted for, and no hyperparameters need to be tuned at all. This locality of the NNI is much preferable to the global nature of e.g. radial basis function interpolants, which require the solution of a large linear system.
  • (+) NNIs have a continuous dependence on the positions of the data sites. (If you use e.g. the commonly used Triangle() interpolation, small perturbations of a point can completely change the triangle around that point! NNIs don’t have this issue thanks to very nice properties of Voronoi tessellations.)
  • (-) NNIs are, in principle, generalisable to higher dimensions, but they get extremely complex to implement even in three dimensions, and even evaluating them can be slow.
  • (-) Hiyoshi(2), a globally smooth C^2 interpolant, can be pretty expensive to compute in some cases.
  • (-) NNIs do not extrapolate well beyond the convex hull of the data points, whereas other methods like radial basis functions naturally extrapolate. (There are methods for extrapolation of NNIs, but I’ve not implemented them or assessed them critically.)

Overall, especially for the second point above, I would prefer to use NNIs for geospatial modelling. If extrapolation really is important, I don’t think you want NNIs. If you need interpolation in $\mathbb R^d$, $d \geq 3$, then I don’t have the tools for that either. I’m sure there are advantages/disadvantages I’ve missed though.

Maybe (if I ever have more time…) I need to eventually write up a more rigorous example for a geospatial application, examining e.g. the impact of the distribution of a set of points on the interpolated surface, with comparisons to other commonly used interpolants.