[ANN] FastInterpolations.jl — Fast & zero-allocation ND interpolation

@hersle
Thanks for your interest!

The current cubic_interp API is a global formulation that guarantees C2 continuity, so it does not currently provide the kind of local C1 Hermite-style interface you described.

I do think this kind of local C1 scheme would be a useful feature, so I’ll add it to the higher-priority items on my development roadmap.

Since the relevant Hermite-style infrastructure is already in place internally, I expect it should be quite feasible to implement. My current thinking is that this would more likely become a separate API, rather than an extension of cubic_interp, to keep it clearly distinct from the existing C2 formulation. That said, nothing is fixed yet, and I still want to think through the most ergonomic interface.

I’ll try to work on it soon after finishing the issues I’m already working on.

If you want, it would also be great to open an issue on the GitHub page to discuss the details further and keep track of updates.

Thanks!

Maybe you will find this package CubicHermiteSpline.jl interesting. [It is developed as a side project for research in my group]

@hersle
A new version of FastInterpolations.jl (v0.4.8) has just been released, and it finally supports a C¹-continuous local cubic Hermite family!

Note that cubic_interp in this package is the global cubic spline, so for the local cubic Hermite (with a pre-defined slope dydx as you asked) you use a separate entry point hermite_interp.

In addition to the user-provided slope case, there are also pchip_interp, cardinal_interp, and akima_interp, which compute the slopes automatically by their own rules (monotone-preserving, Catmull-Rom with tension, Akima, respectively):

using FastInterpolations

# One-shot API
hermite_interp(x, y, dydx, 1.0)   # user-supplied slopes (dydx)

# Interpolant API
itp = hermite_interp(x, y, dydx)   # build a reusable interpolant
itp(1.0)                           # evaluate at x = 1.0

# auto-slope variants (no dydx needed)
pchip_interp(x, y, 1.0)           # monotone-preserving
cardinal_interp(x, y, 1.0)        # Catmull-Rom with tension
akima_interp(x, y, 1.0)           # Akima

They all work in both 1D and ND (except hermite_interp for ND, which will come in a later release). Not every advanced feature is fully covered yet, but most of the common usage already works, and I will keep improving it in the upcoming releases.

For more details, please see Local Cubic Hermite docs.

Thanks a lot! I am testing it now and it seems to work well.

I just stumbled across this package when going through some of my statistics toolchains. It looks amazing! I tend to use cubic interpolations for storing kernel density estimates and for non-parametric convolutions. One way to express a cubic spline is to store cubic polynomials in a vector and map them to intervals. Is there any way to extract the cubic polynomial coefficients from one of your interpolation objects? I use this form for a lot of operations (not just integrals and derivatives but also convolutions).

One question (probably mentioned in the documentation, but I just skimmed through it):

  • Does FastInterpolations.jl support cases of missing data? This is super important in many cases involving experimental data, e.g., thermodynamics, etc.

What would you want it to do for missing data? Return missing if interpolation is requested in that part of the array? Would using NaN be acceptable instead?

I am not quite sure what the response should be, but there are many thermodynamics books with property tables where data are clearly missing, e.g., relations between temperature, pressure, enthalpy, density, entropy, etc., and where it would be useful to be able to “interpolate” somehow to get numeric values.

So my questions are perhaps:

  1. If there are missing data in the case of multiple independent variables, will the interpolation code actually work?
  2. If the code builds an interpolant even with missing data in the data set, what should be the response if I ask for a numerical value in a region where the data were missing? Perhaps some sort of “extrapolation”?

I don’t think it currently supports missing, but I’m guessing that it should “work” (in the sense of returning NaN for the missing regions) if you represent missing data by NaN values.

Just tried it: missing is not supported, and if you have a single NaN value then it returns NaN everywhere. Seems like this could be improved?

Going in this direction leads more towards unstructured-grid interpolation (e.g. you could have missing data in an irregular region in the interior of the domain), which is a very different problem and requires entirely different algorithms. There are other packages for this.

And, of course, extrapolation (if the missing data is on the edge of the domain) is even more specialized, and generally requires you to have a model or some other priors.

Caveat: I don’t speak for the authors of FastInterpolations.jl

Perhaps I’m describing the case that, e.g., enthalpy and entropy are not available for the same set of p,T variables… Perhaps the solution is to build several interpolation surfaces. I should find a good example, but am out of my office for the time. I’ll think it over.

If you want to fill in the missing data, Impute gives a model for doing this. This might be one of the packages alluded to above.

I am also using interpolated thermodynamic data generated from Coolprop and REFPROP.

I also haven’t found a great solution to dealing with areas of the lookup table having a NaN value with any of the interpolation libraries. I am currently using DataInterpolationsND, and it also returns NaN when any value is in the table is NaN. This really shouldn’t be the case especially for linear interpolation, where you are looking up 2^N (N is table dim) table values around your query point- if they are all valid numbers then it should not return NaN.

My workaround is to simply fill the values in the table with the “nearest” valid value. Then, after the fact check that my solution is in a valid region. REFPROP returns NaN for properties if fluid in the solid phase. So if something should be a liquid, make sure press and temperature are not telling me it’s a solid.

I have looked into using Clayperon.jl but this level of fluid modeling is far too involved for what I want to do-- I am not a chemist or physicist. Even figuring out what model to use to get the same results as REFPROP seem quite involved.

Hello, welcome to this stop on your journey through the package :smile:

There is a dedicated API of coeffs for retrieving the coefficients of polynomials at a specific position.

x = 0:3
y = @. 3*x^3 + 2*x^2 + x # polynomial (x + 2x^2 + 3x^3)

# First, create an interpolant object
itp = cubic_interp(x, y)

# Return a coeffecient object found at x=1.5,
# so the polynomial between x=1 and x=2 nodes.
xq = 1.5
cff = coeffs(itp, 1.5)

# A tuple of polynomial coefficients in ascending order
cff.p # ≈(0.0, 1.0, 2.0, 3.0), meaning the polynomial `0 + x + 2x^2 + 3x^3`

# Evaluation of polynomial at specific position
xq = 1.5
cff(xq) # returns 16.125 = 0 + (1.5) + 2*(1.5)^2 + 3*(1.5)^3
evalpoly(xq, cff) # same result

# If you want to find all coeffs across the domain, 
# there is no dedicated API for this purpose yet, 
# but you can use the list comprehension.
julia> [coeffs(itp, xx) for xx in x[1:end-1]]
3-element Vector{CellPoly{4, Float64, Float64}}:
CellPoly{deg=3, Float64}(x ∈ [0.0, 1.0])
CellPoly{deg=3, Float64}(x ∈ [1.0, 2.0])
CellPoly{deg=3, Float64}(x ∈ [2.0, 3.0])

See more details:


Currently, this API only supports 1D interpolation and a specific query point.

It can be generalized to provide a more ergonomic API, and possibly to support ND interpolation.

For a clearer design, it would be very helpful if you could open an issue on GitHub and provide a minimal example of your use case and expected behavior.

@BLI @stevengj @Jake @mark.garnett

Thanks all for the valuable discussion. Let me clarify how FastInterpolations currently handles NaN and missing.

1. NaN is supported naturally

  • Global-solve methods (quadratic, cubic): since the solve couples nodes together, a single NaN spreads across the domain (the entire domain for cubic) and yields NaN. This is the reasonable consequence of a global solve.

  • Local methods (constant, linear, and local cubic Hermite (pchip/cardinal/akima)): evaluate correctly wherever possible, returning NaN only where the local stencil includes the NaN.

See an example with visualizations:

2. missing is not supported yet

A dataset containing missing has an element type like Union{Missing,Float64}, which currently breaks internal type constraints (the kernels assume a single concrete value type). I’ll look into whether we can propagate missing the same way we propagate NaN.

For now, you can replace missing with NaN before the interpolation:

x = 1:5
y = [1.0, missing, 3.0, 4.0, missing]

# Replace missing with NaN values
y[ismissing.(y)] .= NaN # This y is Vector{Union{{Missing, Float64}}}

# or allocate a new clean y (more type-stable)
y_new = coalesce.(y, NaN) # This y_new is Vector{Float64}

# Examples
itp = linear_interp(x, y)
itp(1.5) # returns NaN
itp(3.5) # returns 3.5
itp(4.5) # returns NaN

3. Recommendation for missing data

How to best fill missing data is fairly problem-specific, so for now I’d suggest pre-filling the gaps as your case requires (with a dedicated package or your own logic), or simply leaving them and using a local method (constant/linear/cardinal/pchip/akima) to interpolate.

For simple cases, automatically filling the gaps with an internal local interpolation (e.g. a linear_interp pass) might be feasible inside FastInterpolations; I’ll think about it a bit more!

Perfect! 1-d retrieval of the original polynomials is all I need, I generally don’t use this functionality for higher dimensional distributions (polynomial convolutions aren’t well defined for multivariate distributions). This package makes me want to pick up multivariate kernel density regression again!