[ANN] FastInterpolations.jl — Fast & zero-allocation ND interpolation

Hello everyone,

I’d like to share FastInterpolations.jl, a high-performance N-dimensional interpolation package I’ve been working on.

I originally developed this for plasma physics simulations where interpolation sits inside tight inner loops that could run millions of times per timestep. Performance was the primary goal, but I also wanted to eliminate allocations entirely on the hot path — even small ones add up and become noticeable at that scale. On top of that, I needed seamless support for non-uniform rectilinear grids in N-dimensions.

To address this, FastInterpolations.jl is built around four main design goals:

  • High performance: typically 2-10x faster than other popular packages on standard problems, and up to >100x for multiple series sharing the same grid (SeriesInterpolant).

  • Zero allocation: No GC pressure in tight loops after warm-up.

  • ND non-uniform grids: Works on any rectilinear grid (uniform or non-uniform) with a clean API that mirrors 1D.

  • Physical boundary conditions: Explicitly set physical boundary conditions (e.g., Neumann, periodic) per endpoint.

Beyond the core performance, it also provides exact analytic derivatives, integration, and seamless compatibility with Complex values and AD tools.

Performance

Here is a one-shot benchmark (construction + evaluation per call) for 1D cubic spline interpolation against other popular interpolation packages:

[benchmark plot: construction + evaluation time vs. number of query points]

If you have workflows that bottleneck on interpolations, feel free to give it a try. Feedback and issues are always welcome!

26 Likes

Construction + evaluation, but normally you construct once and evaluate millions of times. So what’s the evaluation timing?

1 Like

Thank you very much for this! I tested it to solve a PDE using a semi-Lagrangian method and yes, it is really fast!

Is it possible to use the interpolant optimization when the series share the same initial grid but the query points are different for each series, and gain some time?

3 Likes

Thanks for the interest!

Good question — the benchmark plot shows construction + evaluation (one-shot), but it actually captures both regimes in a single chart:

  • Left side (few query points): construction cost dominates — this is where the gap is largest
  • Right side (many query points): evaluation cost dominates — the curves effectively show pure eval speed

When we isolate evaluation only (build the interpolant once, benchmark evaluation alone), the results are:

vs                       Speedup
Interpolations.jl        ~2x
DataInterpolations.jl    ~20x
Dierckx.jl               ~20x

For some context on what motivated this package — “construct once, evaluate many times” is one common pattern, but not the only one. The scenario I originally had in mind was dynamic data that changes over time.

In many physics simulations, such as solving PDEs (e.g. diffusion equation) over time, the grid x is fixed but y (density, velocity, temperature, …) evolves every timestep — so you need to recompute the spline each step.

For this scenario, other packages must rebuild everything from scratch on every call. FastInterpolations provides a one-shot API (e.g., cubic_interp!) designed for exactly this: it reuses the grid structure and pre-allocated buffers, recomputing only what actually changed, enabling zero allocation after warm-up:

x = range(0.0, 1.0, 100)           # grid (fixed)
xq = range(0.0, 1.0, 500)          # query points (inside the grid)
y = similar(x)                     # data buffer (updated each step)
out = similar(xq)                  # pre-allocated output

for t in 1:1000
    @. y = sin(x + 0.01t)          # y evolves each timestep
    cubic_interp!(out, x, y, xq)   # zero-allocation ✅
end

So the one-shot benchmark is quite relevant for these workloads, and that’s where the full speedup (construction + eval) really matters. Happy to share more details or specific benchmarks if helpful!

7 Likes

Great to hear it’s working well for you :grinning_face:!

To answer your question: yes, that scenario is exactly the kind of workload this package was designed for, and it is already well optimized.
When multiple interpolants share the same x-grid, each cubic_interp call reuses the grid structure internally (spacing, LU factorization, etc.), so the grid-dependent work is not repeated:

out_density     = cubic_interp(x_grid, density,     xq_d)
out_temperature = cubic_interp(x_grid, temperature, xq_t)  # grid info already cached
out_velocity    = cubic_interp(x_grid, velocity,    xq_v)  # no redundant computation

If your series also share the same query points, there’s SeriesInterpolant which goes a step further — it batches all series into a single interpolant and evaluates them together using SIMD vectorization across the series dimension:

sitp = cubic_interp(x, [density, temperature, velocity])

sitp(0.5)          # => 3-element Vector: [density, temperature, velocity] at x = 0.5
sitp(out, xq)      # => in-place batch evaluation (zero-allocation)

This matters most when the number of series is large — with tens or hundreds of series (common in multi-species kinetics, spectral methods, etc.), SeriesInterpolant can be 10× faster or more compared to looping over individual interpolants, since it amortizes the interval search and exploits memory layout for SIMD.

Currently SeriesInterpolant is available for 1D. A more ergonomic API and ND support are on the roadmap.
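To make the amortization concrete, here is a self-contained sketch of the batching idea (linear interpolation for brevity, and not the package’s actual implementation): store one series per row of a matrix so that a single interval search is shared by all series, and the inner loop over series runs over contiguous memory, which is SIMD-friendly.

```julia
# Y holds one series per row: Y[s, i] is series s sampled at x[i].
# Julia is column-major, so Y[:, i] is contiguous in memory and the
# inner loop over series can vectorize.
function batch_lerp!(out::AbstractVector, x::AbstractVector, Y::AbstractMatrix, xq::Real)
    i = clamp(searchsortedlast(x, xq), 1, length(x) - 1)  # one search for ALL series
    t = (xq - x[i]) / (x[i+1] - x[i])
    @inbounds @simd for s in axes(Y, 1)
        out[s] = (1 - t) * Y[s, i] + t * Y[s, i+1]
    end
    return out
end

x = collect(range(0.0, 1.0, 11))
Y = vcat(sin.(x)', cos.(x)')      # 2 series sharing the same grid
out = zeros(2)
batch_lerp!(out, x, Y, 0.05)      # one interval search, both series evaluated
```

Looping over separate 1D interpolants would instead repeat the interval search once per series and jump between unrelated arrays, which is what SeriesInterpolant avoids.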

Also, for ODE-style time-stepping on non-uniform grids, the hint system can be quite useful — it lets you pass the last-known interval index to skip most of the search overhead.
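The idea behind hints can be sketched in a few lines (this shows the concept only, not the package’s actual API): before doing a full binary search, check whether the query still falls in the last-known interval, which is usually true for slowly-moving sequential queries.

```julia
# Concept sketch of a "hint": reuse the last interval index so that
# sequential queries usually skip the O(log n) search entirely.
function find_interval(x::AbstractVector, xq::Real, hint::Int)
    if 1 <= hint < length(x) && x[hint] <= xq <= x[hint+1]
        return hint                                          # hint hit: O(1)
    end
    return clamp(searchsortedlast(x, xq), 1, length(x) - 1)  # fallback: O(log n)
end

x = [0.0, 0.1, 0.3, 0.7, 1.0]   # non-uniform grid
i = find_interval(x, 0.35, 1)    # hint misses -> binary search
i = find_interval(x, 0.4, i)     # hint hits -> no search needed
```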

I can’t post links yet as a new user, but the documentation has detailed guides and examples — searching for these keywords should get you there:

  • “Series Interpolant” — batching multiple y-series
  • “Hints” — minimizing search overhead for sequential queries
  • “Optimization” — using interpolants within Optim.jl workflows
5 Likes

Hopefully these were the links you were referring to

2 Likes

Thank you very much. Unfortunately, I am not proficient enough in Julia to understand how it works. Cache management seems like magic to me because it does not appear explicitly in the function arguments. Do you know of any references that could help me learn how to do this? I tried to go through the implementation, but it is difficult. Thank you.

I don’t have a specific good reference to point to, but the idea is pretty simple, so let me try to explain briefly.

To enable caching, you first create a lookup table (like a dictionary) in a global scope (so that it can persist for the lifetime of the program, and can be easily accessed from any function). Each entry is stored under a key that represents the input (or setup) you’re reusing. Before doing an expensive computation, you look up that key to see if the result is already cached. If it is, you reuse it; if not, you compute it once, store it under the key, and reuse it next time.

In our case, cubic spline interpolation requires solving a tridiagonal linear system (LU decomposition) that depends only on the x-grid, not on the y-values. So the x-grid serves as the “key” in this lookup table, and the precomputed LU factorization is the “value”. On the first call with a given grid, we compute and store the factorization. On subsequent calls with the same grid (even with different y-data), we just reuse the stored result and skip the factorization entirely.

The actual implementation is a bit more complicated than a plain global dictionary to keep overhead low and ensure thread-safety, but the core concept is exactly the same. The architecture docs go into a little bit more detail if you’re interested: Auto-Cache · FastInterpolations.jl

2 Likes