[ANN] FastInterpolations.jl — Fast & zero-allocation ND interpolation

Hello everyone,

I’d like to share FastInterpolations.jl, a high-performance N-dimensional interpolation package I’ve been working on.

I originally developed this for plasma physics simulations where interpolation sits inside tight inner loops that could run millions of times per timestep. Performance was the primary goal, but I also wanted to eliminate allocations entirely on the hot path — even small ones add up and become noticeable at that scale. On top of that, I needed seamless support for non-uniform rectilinear grids in N-dimensions.

To address this, FastInterpolations.jl is built around four main design goals:

  • High performance: typically 2-10x faster for standard problems, with speedups beyond 100x for multiple series sharing the same grid (SeriesInterpolant).

  • Zero allocation: No GC pressure in tight loops after warm-up.

  • ND non-uniform grids: Works on any rectilinear grid (uniform or non-uniform) with a clean API that mirrors 1D.

  • Physical boundary conditions: Explicitly set physical boundary conditions (e.g., Neumann, periodic) per endpoint.

Beyond raw performance, it also provides exact analytic derivatives, integration, and seamless compatibility with Complex values and AD tools.

Performance

Here is a one-shot benchmark (construction + evaluation per call) for 1D cubic spline interpolation against other popular interpolation packages:

If you have workflows that bottleneck on interpolations, feel free to give it a try. Feedback and issues are always welcome!

33 Likes

Construction + evaluation, but normally you construct once and evaluate millions of times. So what’s the evaluation timing?

2 Likes

Thank you very much for this! I tested it to solve a PDE using semi-Lagrangian method and yes, it is really fast !

Is it possible to use the interpolant optimization to gain some time when the series share the same initial grid but the query points differ for each series?

3 Likes

Thanks for the interest!

Good question — the benchmark plot shows construction + evaluation (one-shot), but it actually captures both regimes in a single chart:

  • Left side (few query points): construction cost dominates — this is where the gap is largest
  • Right side (many query points): evaluation cost dominates — the curves effectively show pure eval speed

When we isolate evaluation only (build the interpolant once, benchmark evaluation), the results are:

| vs                    | Speedup |
| --------------------- | ------- |
| Interpolations.jl     | ~2x     |
| DataInterpolations.jl | ~20x    |
| Dierckx.jl            | ~20x    |

For some context on what motivated this package — “construct once, evaluate many times” is one common pattern, but not the only one. The scenario I originally had in mind was dynamic data that changes over time.

In many physics simulations, such as solving PDEs (e.g. diffusion equation) over time, the grid x is fixed but y (density, velocity, temperature, …) evolves every timestep — so you need to recompute the spline each step.

For this scenario, other packages must rebuild everything from scratch on every call. FastInterpolations provides a one-shot API (e.g., cubic_interp!) designed for exactly this — it reuses the grid structure and pre-allocated buffers, recomputing only what actually changed, enabling zero allocations after warm-up:

x = range(0, 1, 100)               # grid (fixed)
xq = range(0.0, 1.0, 500)          # query points (within the grid)
y = zeros(length(x))               # data, updated each timestep
out = similar(xq)                  # pre-allocated output

for t in 1:1000
    @. y = sin(x + 0.01t)          # y evolves each timestep
    cubic_interp!(out, x, y, xq)   # zero-allocation ✅
end

So the one-shot benchmark is quite relevant for these workloads, and that’s where the full speedup (construction + eval) really matters. Happy to share more details or specific benchmarks if helpful!

9 Likes

Great to hear it’s working well for you :grinning_face:!

To answer your question — yes, that scenario is exactly the kind of workload this package was designed for, and it is already well optimized.
When multiple interpolants share the same x-grid, each cubic_interp call reuses the grid structure internally (spacing, LU factorization, etc.), so you’re not repeating redundant computation:

out_density     = cubic_interp(x_grid, density,     xq_d)
out_temperature = cubic_interp(x_grid, temperature, xq_t)  # grid info already cached
out_velocity    = cubic_interp(x_grid, velocity,    xq_v)  # no redundant computation

If your series also share the same query points, there’s SeriesInterpolant which goes a step further — it batches all series into a single interpolant and evaluates them together using SIMD vectorization across the series dimension:

sitp = cubic_interp(x, [density, temperature, velocity])

sitp(0.5)          # => 3-element Vector: [density, temperature, velocity] at x = 0.5
sitp(out, xq)      # => in-place batch evaluation (zero-allocation)

This matters most when the number of series is large — with tens or hundreds of series (common in multi-species kinetics, spectral methods, etc.), SeriesInterpolant can be 10× faster or more compared to looping over individual interpolants, since it amortizes the interval search and exploits memory layout for SIMD.
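For intuition, here is a plain-Julia sketch of the idea (not the package’s internals): with several series on one grid, the interval search and the local weights are computed once per query and reused for every series.

```julia
# Illustrative sketch, not FastInterpolations.jl code: batching several
# y-series on a shared grid amortizes the per-query interval search.
function batch_linear_interp(x::AbstractVector, ys::Vector{<:AbstractVector}, xq::Real)
    i = clamp(searchsortedlast(x, xq), 1, length(x) - 1)   # interval search: once
    w = (xq - x[i]) / (x[i+1] - x[i])                      # local weight: once
    return [(1 - w) * y[i] + w * y[i+1] for y in ys]       # reused for all series
end

x  = collect(range(0.0, 1.0, length=5))
ys = [x, 2 .* x, x .^ 0]                  # three "series" on the same grid
batch_linear_interp(x, ys, 0.25)          # => [0.25, 0.5, 1.0]
```

With hundreds of series, the inner loop is also a natural SIMD target, since the series values at indices i and i+1 can be laid out contiguously in memory.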

Currently SeriesInterpolant is available for 1D. A more ergonomic API and ND support are on the roadmap.

Also, for ODE-style time-stepping on non-uniform grids, the hint system can be quite useful — it lets you pass the last-known interval index to skip most of the search overhead.
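The hint idea itself is a general technique; here is a minimal sketch in plain Julia (the package’s actual API may differ):

```julia
# Sketch of a hint-based interval search (not the package's actual API):
# for sequential queries, the correct interval is usually at or next to
# the previous one, so we walk from the hint instead of bisecting.
function find_interval(x::AbstractVector, xq::Real, hint::Int)
    i = clamp(hint, 1, length(x) - 1)
    while i > 1 && xq < x[i]                 # walk left if the query moved back
        i -= 1
    end
    while i < length(x) - 1 && xq >= x[i+1]  # walk right if it moved forward
        i += 1
    end
    return i                                 # reuse as the hint next time
end

x = [0.0, 0.1, 0.3, 0.7, 1.0]        # non-uniform grid
hint = find_interval(x, 0.05, 1)     # => interval 1
hint = find_interval(x, 0.35, hint)  # => interval 3, reached in two short steps
```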

I can’t post links yet as a new user, but the documentation has detailed guides and examples — searching for these keywords should get you there:

  • “Series Interpolant” — batching multiple y-series
  • “Hints” — minimizing search overhead for sequential queries
  • “Optimization” — using interpolants within Optim.jl workflows
5 Likes

Hopefully these were the links you were referring to

2 Likes

Thank you very much. Unfortunately, I am not proficient enough in Julia to understand how it works. Cache management seems like magic to me because it does not appear explicitly in the function arguments. Do you know of any references that could help me learn how to do this? I tried to go through the implementation, but it is difficult. Thank you.

I don’t have a specific good reference to point to, but the idea is pretty simple, so let me try to explain briefly.

To enable caching, you first create a lookup table (like a dictionary) at global scope, so that it persists for the lifetime of the program and can be accessed from any function. Each entry is stored under a key that represents the input (or setup) you’re reusing. Before doing an expensive computation, you look up that key to see if the result is already cached. If it is, you reuse it; if not, you compute it once, store it under the key, and reuse it next time.
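Here is a minimal, self-contained sketch of that pattern (the names are purely illustrative):

```julia
# Minimal caching (memoization) sketch; names are illustrative only.
const CACHE = Dict{Int, Float64}()   # global lookup table: key => cached result

function expensive(n::Int)
    get!(CACHE, n) do    # hit: return stored value; miss: run the block & store
        sqrt(n)          # stand-in for an expensive computation
    end
end

expensive(9)   # first call: computes and caches 3.0
expensive(9)   # second call: pure lookup, returns 3.0
```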

In our case, cubic spline interpolation requires solving a tridiagonal linear system (LU decomposition) that depends only on the x-grid, not on the y-values. So the x-grid serves as the “key” in this lookup table, and the precomputed LU factorization is the “value”. On the first call with a given grid, we compute and store the factorization. On subsequent calls with the same grid (even with different y-data), we just reuse the stored result and skip the factorization entirely.
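Applied to the spline case, a simplified sketch might look like this (the real implementation differs, and the tridiagonal matrix below is just illustrative of the grid-dependent work):

```julia
using LinearAlgebra

# Sketch of the caching idea, not the package's actual implementation
# (the real package adds thread-safety and low-overhead lookup).
# The x-grid acts as the key; the grid-dependent factorization is the value.
const FACT_CACHE = Dict{UInt, Any}()   # keyed by hash(x) for brevity

function grid_factorization(x::AbstractVector)
    get!(FACT_CACHE, hash(x)) do
        # Grid-dependent work: factorize a spline-like tridiagonal system.
        h = diff(x)
        A = Tridiagonal([h[1:end-1]; 0.0],
                        [1.0; 2 .* (h[1:end-1] .+ h[2:end]); 1.0],
                        [0.0; h[2:end]])
        lu(A)                          # computed once per grid
    end
end

x = collect(range(0.0, 1.0, length=8))
F1 = grid_factorization(x)   # first call: factorizes and caches
F2 = grid_factorization(x)   # same grid: cache hit, no refactorization
F1 === F2                    # => true
```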

The actual implementation is a bit more complicated than a plain global dictionary to keep overhead low and ensure thread-safety, but the core concept is exactly the same. The architecture docs go into a little bit more detail if you’re interested: Auto-Cache · FastInterpolations.jl

3 Likes

Very nice package, thank you!

Is there currently an easy way to specify the order of interpolation independently per axis for the ND case (e.g., linear interpolation in X and cubic interpolation in Y)? I don’t see an obvious way to do so in the current documentation. I can definitely hack such functionality in two steps, but just wondering if there is an easy constructor that escaped my notice.

Thanks for the suggestion, and for your interest in the package!
For reference, this seems to be the same request as issue #55 on the GitHub repo.

At the moment, FastInterpolations.jl only supports using the same interpolation method across all axes in the ND case.
Mixed per-axis interpolation orders do seem technically feasible, and this is already on my roadmap.
Since it appears to be in fairly high demand, I’ll likely prioritize it.

If you get a chance, it would be greatly appreciated if you could take a look at that issue and share a bit more about your use case. In particular, I’d be curious whether the main motivation is performance, or whether you have some more specific requirement in mind.

If performance is the main concern, it may also be worth trying the current ND cubic_interp first. It should reproduce any linear polynomial exactly, though it may introduce some unwanted overshoot. If you have a minimal example or a target comparison in mind, that would be very helpful too.

Thanks!

Thank you for your prompt reply!

I don’t have much to add to Issue #55 other than that it is similar to my use case. Notably, I am maintaining a code-base where performance is strongly desired and your library would be a good replacement for interpolation, which is currently one of the larger bottlenecks. It has already been previously determined, based on physical discussions, that my pipeline should interpolate with different orders for different dimensions.

Anyway, I don’t mean to hijack this thread any further. Congratulations on the excellent package!

1 Like

Hi! I’ve just found this package because I was looking for a way of preallocating an interpolant object that was being changed in every time step of a simulation.

I’ve tested it and looks very promising! Congrats and thanks a lot for the hard work!

I do have a question, though. In the API Selection Guide section of the documentation, it is suggested that when Y changes and you have 1-3 series, one should use the One-shot API. That is my use case, since I only have 1-2 series where the Y value changes. What I’ve done for now is add a field to my particle struct that holds the interpolant object, e.g.,

mutable struct MyParticle{T, S}
    x::T
    y::T
    interp::S
    # More fields and types that are omitted
end

Values of x are fixed, but y changes at each time step. At the beginning, when initializing the struct, I create the interpolation object; let’s call the instance myparticle. Later I can compute an interpolated value with

# Imagine x = 2.4 now
myparticle.interp(2.4)

This works flawlessly, but according to the API Selection Guide, it corresponds to an Interpolant, which is recommended when Y does not change.

Is there any reason for not doing this? How should I implement the One-shot API in this case then? All of my calls are made inside functions, not on the global scope.

Thanks for the kind words and glad the package is working well for you!

Short answer: replace myparticle.interp(2.4) with cubic_interp(myparticle.x, myparticle.y, 2.4) (or whichever method you’re using, e.g. linear_interp(myparticle.x, myparticle.y, 2.4)) and drop the interp field from your struct. Details below.


Why the Interpolant Approach Doesn’t Work Here

When you create an interpolant via cubic_interp(x, y), the internal coefficients are precomputed from that specific x and y and frozen. Even if you later mutate y, the interpolant still returns values based on the original data from construction time.

# ⚠️ Gives stale results after y changes
itp = cubic_interp(x, y)      # coefficients baked from y at this moment
y .= new_values                # y changes...
itp(2.4)                       # still uses the OLD y

Use the One-shot API Instead

ptl = myparticle  # just an alias

# single query
val = cubic_interp(ptl.x, ptl.y, 2.4)

# multiple queries 
xq = [2.4, 3.0, 5.0]
vals = cubic_interp(ptl.x, ptl.y, xq) 

# advanced (in-place for zero-alloc)
vals = similar(xq) # pre-allocate memory
cubic_interp!(vals, ptl.x, ptl.y, xq) 

This is just a function call—no object to manage, no interp field needed in your struct.

  • Whatever x and y you pass in is what gets used, so there’s no stale-data risk.
  • If the same x grid is reused across calls, the package internally caches the grid-dependent work, so you don’t need to manage that yourself.
  • Works the same whether x or y changes between calls or not.

When to Use an Interpolant Object

Only when both x and y are static—fixed once and queried many times without ever changing.


Feel free to reply here if anything else comes up, or open a GitHub issue if you’d like it tracked. Thanks!

1 Like

If you would like me to add it to the listing under “other interpolation packages” in the JuliaMath/Interpolations.jl README, let me know.

2 Likes

That sounds great!
I was hoping my package might be included there someday :slight_smile:
I’d really appreciate it, thanks!

Thanks a lot for your answer! I’m going to open a GitHub issue so we can continue the discussion there and not flood this thread.

I’ve created issue #80 on the repo because the described behaviour only applies to cubic_interp and not to the other methods.

1 Like