Difference between two interpolation packages when evaluating derivatives: Interpolations.jl and Dierckx.jl

Recently I had a chance to test two interpolation packages, Interpolations.jl and Dierckx.jl. I found a difference between the two packages in the way they treat the boundary points when evaluating derivatives. I think this is an important difference that the users need to be aware of. And yet, they are not clearly discussed in their documentations, so here I report it.

First off, here in the version information of the two packages:

(@v1.6) pkg> st Interpolations
      Status `~/.julia/environments/v1.6/Project.toml`
  [a98d9a8b] Interpolations v0.13.2

(@v1.6) pkg> st Dierckx
      Status `~/.julia/environments/v1.6/Project.toml`
  [39dd38d3] Dierckx v0.5.1

And here is the test code to interpolate the data points sampled from f(x) = x^3 in -1 ≤ x ≤ 1:

using Interpolations
using Dierckx

xmin, xmax = -1, 1
f(x) = x^3
xs = range(xmin, xmax, length=21)
ys = f.(xs)

# Interpolators
fi = CubicSplineInterpolation(xs, ys, extrapolation_bc=Flat())  # use Interpolations
fd = Spline1D(xs, ys, bc="nearest")  # use Dierckx

# First derivatives of interpolators
fi⁽¹⁾(x) = only(Interpolations.gradient(fi, x))
fd⁽¹⁾(x) = derivative(fd, x)

# Second derivatives of interpolators
fi⁽²⁾(x) = only(Interpolations.hessian(fi, x))
fd⁽²⁾(x) = derivative(fd, x, nu=2)

Note that the boundary conditions extrapolation_bc=Flat() for Interpolations.jl and bc="nearest" for Dierckx.jl are equivalent boundary conditions that extrapolate the estimate function with the values of the interpolator evaluated at the boundary points.

Then, I plot the interpolators, their first derivatives, and their second derivatives at the sampling points. I also plot the exact curve y = x³ from which the sampling data are sampled, and its first and second derivatives y = 3x² and y = 6x. Here is the result (x-coordinates of the dots in the plots correspond to the sampling points):

In the first panel of the figure, we observe that the interpolators produced by the two packages are visually indistinguishable and perfectly aligned with the exact curve y = x³. (This is expected, because cubic spline passes though knots.)

However, the two packages start deviating when evaluating derivatives. In the second panel of the figure, we observe that the two packages produce different first derivatives around the boundary points. Specifically, Dierckx.jl produces first derivatives perfectly aligned with the exact curve y = 3x², but Interpolations.jl doesn’t near the boundary points

The deviation between the two packages becomes very noticeable when evaluating second derivatives. In the third panel of the figure, Dierckx.jl still produces second derivatives perfectly aligned with the the exact curve y = 6x, but Interpolations.jl produces very different values around the boundary points of the sampling range.

Though I didn’t have time to take a close look at the code, it seems that Interpolations.jl takes the extrapolated values outside the sampling range into account when evaluating the derivatives. For example, in the third panel of the figure, the second derivatives produced by Interpolations.jl at the boundary points x = ±1 are exactly zero. This is probably because the second derivatives at the boundary points were evaluated with the values outside the sampling range, where the interpolator produces a constant value due to the boundary condition extrapolate_bc=Flat().

On the other hand, Dierckx.jl seems to use the interpolator without extrapolated points when evaluating derivatives, even at the boundary points. This means the derivatives at the boundary points are calculated with one-sided limit within the sampling range. I confirm that regardless of the values of bc used, Dierckx.jl maintains this behavior, i.e., the values of the derivatives within the sampling range, including the boundary points, does not change with the boundary condition.

Depending on the situation, it could be desirable or not to incorporate the extrapolated points outside the sampling range when evaluating derivatives at boundary points, so I don’t think one package is superior to the other. In any case, I think the users of these interpolation packages should know the difference before they evaluate derivatives.

8 Likes

You should probably make a docs PR then?

1 Like

As you noticed, it seems like Interpolations.jl not only imposes the spline to pass through the data points, but also the second derivative to be zero at the boundaries. This condition is known as “natural splines”, mentioned for instance in this Wikipedia article.

First note that, in spline interpolation, the B-spline knots are not necessarily located at the data points, and one is more or less free to choose the knot locations. The main constraint is that, if you want to fit N data points, then you need to choose N - 2 unique knot locations (in the case of cubic splines), which means that you can’t put a knot on each data point.

The way I understand it, Interpolations.jl solves this issue by adding two additional constraints (f''(x) = 0 on each boundary), which means that now you need N unique knot locations, that is, the same as the number of data points.

An alternative popular solution, known as the “not-a-knot” condition (also mentioned in the Wikipedia article), consists in simply not putting a knot on your second and on your second-to-last data points. This is equivalent to imposing the third derivative of your function to be continuous at those points. This seems to be the approach taken by Dierckx.jl. As a shameless plug, this is also the approach taken by my own BSplineKit.jl, which gives the exact same result as Dierckx.jl in your example.

In your example, you can check the difference between both packages by looking at the B-spline knots chosen for each interpolation object. Interpolations.jl chooses 21 unique knots (same as the number of data points, and at the same locations), while Dierckx.jl puts 19 unique locations, skipping the data points x = \pm 0.9:

julia> collect(Interpolations.knots(fi))
21-element Vector{Float64}:
 -1.0
 -0.9
 -0.8
 -0.7
 -0.6
 -0.5
 -0.4
 -0.3
 -0.2
 -0.1
  0.0
  0.1
  0.2
  0.3
  0.4
  0.5
  0.6
  0.7
  0.8
  0.9
  1.0

julia> unique(fd.t)
19-element Vector{Float64}:
 -1.0
 -0.8
 -0.7
 -0.6
 -0.5
 -0.4
 -0.3
 -0.2
 -0.1
  0.0
  0.1
  0.2
  0.3
  0.4
  0.5
  0.6
  0.7
  0.8
  1.0
8 Likes

NB:
using Dierckx the knots can be obtained with: get_knots(fd)