Using Interpolations

Eric_Shain · May 7, 2019, 9:26pm

Here is code for Julia and Python that uses the basic approach I’m trying. I’m attempting to use equivalent algorithms: Levenberg-Marquardt for the curve fitting and linear interpolation for the input fits. The code and results are below. As you can see, Python executes the curve fit in about 1.4 ms and Julia in about 30.8 ms. I’m quite new to Julia, so please point out bad practices even if they don’t impact performance. Everything was run in JupyterLab with Python 3.7 and Julia 1.1.

Julia

using LsqFit, Interpolations, BenchmarkTools
# Create sample data
x1 = range(0.0, step=0.0099, stop=6.);
y1 = 0.7*sin.(x1) .+ 0.1*rand(length(x1));
x2 = range(0.0, step=0.001, stop=6.);
y2 = 0.7*cos.(x2) .+ 0.07*rand(length(x2));
x3 = range(0.1, step=0.0015, stop=5.9);
y3 = 0.5*sin.(x3) .+ 0.3*cos.(x3) + 0.1*rand(length(x3));

# Create fits
fit1 = interpolate((x1,), y1, Gridded(Linear()));
fit2 = interpolate((x2,), y2, Gridded(Linear()));

# Create Model
model(x,p) = p[1]*fit1(x) .+ p[2]*fit2(x)

# Do curve fit
rst = @btime curve_fit(model, x3, y3, [1.0, 1.0]);
rst.param

30.766 ms (2128 allocations: 18.46 MiB)
2-element Array{Float64,1}:
 0.726988264770365  
 0.43123727421300484

Python

import numpy as np
from scipy.optimize import leastsq # Uses Levenberg Marquardt

# Create sample data
x1 = np.arange(0.0, 6.0, 0.0099)
y1 = 0.7*np.sin(x1) + 0.1*np.random.randn(len(x1))
x2 = np.arange(0.0, 6.0, 0.001)
y2 = 0.7*np.cos(x2) + 0.07*np.random.randn(len(x2))
x3 = np.arange(0.1, 5.9, 0.0015)
y3 = 0.5*np.sin(x3) + 0.3*np.cos(x3) + 0.1*np.random.randn(len(x3))

# Create fits
def fit1(x):
    return np.interp(x,x1,y1)
def fit2(x):
    return np.interp(x,x2,y2)

# Create Model
def model(x, p):
        return p[0]*fit1(x) + p[1]*fit2(x)
def residuals(p, y, x):
        err = y-model(x,p)
        return err
    
# Do curve fit
rst = leastsq(residuals, [1.0, 1.0], args=(y3, x3), maxfev=2000)
print(rst)

%timeit leastsq(residuals, [1.0, 1.0], args=(y3, x3), maxfev=2000)

(array([0.70528424, 0.41814   ]), 3)
1.38 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Elrod · May 7, 2019, 10:00pm

On this computer, running your Julia code…

julia> rst = @btime curve_fit($model, $x3, $y3, [1.0, 1.0]);
  22.094 ms (1837 allocations: 18.45 MiB)

julia> rst.param
2-element Array{Float64,1}:
 0.7124976352396952
 0.4280267262041546

Hiding first example to emphasize second:

Making a few changes:

julia> using StaticArrays, StaticOptim

julia> # ] add https://github.com/aaowens/StaticOptim.jl#master
       
       function scurvefit(fit1, fit2, x, y, params)
           fit1s = fit1.(x)
           fit2s = fit2.(x)
           soptimize(params, bto = StaticOptim.Order3()) do p
               obj = 0.0
               @inbounds @simd for i ∈ eachindex(x, y)
                   obj += (p[1]*fit1s[i] .+ p[2]*fit2s[i] - y[i])^2
               end
               obj
           end
       end
scurvefit (generic function with 2 methods)

julia> sres = @btime scurvefit($fit1, $fit2, $x3, $y3, @SVector [1.0, 1.0]);
  430.781 μs (7 allocations: 60.89 KiB)

julia> sres.minimizer

2-element SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 0.7124976352397004 
 0.42802672620415283

That is about 50x faster than the original.

Least squares via Cholesky decomposition:

"""
Given 2 fits, it calculates a design matrix "X", and then solves the normal equations
X*beta = y
X'X*beta = X'y
beta = (X'X)^{-1}X'y
returning beta.
"""
julia> function curvefit2(fit1, fit2, x, y)
           fit1s = fit1.(x)
           fit2s = fit2.(x)
           # Let X'X = S
           # S is 2x2 and symmetric
           # Therefore, we only need to calculate 3 numbers.
           # BLAS is slow for such skinny matrices, so I use a for loop.
           S11, S12, S22 = 0.0, 0.0, 0.0
           # S11 means S[1,1], S12 = S[1,2], etc.
           # additionally, we need X'y
           # this is two numbers. Ditto on BLAS being slow, so we calculate it in the same loop.
           xty1 = 0.0
           xty2 = 0.0
           # @fastmath lets the compiler use fused multiply add instructions
           # @inbounds eliminates bounds checks. Removing these branches lets the compiler vectorize
           # the loop. That is, it can use simd instructions.
           # @simd also hints at the compiler to use these instructions.
           # `SIMD` stands for "Single Instruction Multiple Data".
           # Per CPU instruction, a modern computer can do 4-8 multiplications and additions
           # ie, up to 16 floating point operations with a single instruction (8 multiplications + 8 additions).
           # These macros let the compiler do that.
           @fastmath @inbounds @simd for i in eachindex(x)
               S11 += fit1s[i]*fit1s[i]
               S12 += fit1s[i]*fit2s[i]
               S22 += fit2s[i]*fit2s[i]
               xty1 += fit1s[i]*y[i]
               xty2 += fit2s[i]*y[i]
           end
           # Now I take the Cholesky decomposition of S, calculating the upper triangle.
           U11 = sqrt(S11)
           U12 = S12 / U11
           U22 = sqrt(S22 - U12^2)
           # U' * U = S
           # This gives us
           # U' * U * beta = X'y
           # triangular systems of equations are easy to solve, so below we solve them twice
           # we can do that in three steps.
           v = xty1 / U11
           # Now to get w for the first solve, we need
           # w = (xty2 - U12 * v) / U22
           # but then to get w for the next solve, we need w / U22
           # so we combine these into one step by squaring the denominator.
           w = (xty2 - U12 * v) / U22^2
           v = ( v -  U12 * w) / U11
           v, w
       end
curvefit2 (generic function with 2 methods)

julia> @btime curvefit2($fit1, $fit2, $x3, $y3)
  349.281 μs (4 allocations: 60.66 KiB)
(0.7129037092076219, 0.4294629282267536)

350 microseconds.

We probably cannot get much better than that; the fits alone take over 310 microseconds:

julia> @btime (fit1($x3),fit2($x3))
  313.138 μs (31 allocations: 261.63 KiB)
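
For readers coming from the Python side, the normal-equations solve that curvefit2 performs can be sketched in plain Python. This is only an illustration under stated assumptions: the name `fit_two_basis` and the tiny test vectors are made up here, and the two basis vectors are assumed to be already interpolated onto the target grid.

```python
import math

def fit_two_basis(f1, f2, y):
    """Least-squares coefficients (b1, b2) for y ~ b1*f1 + b2*f2."""
    # Accumulate the three distinct entries of S = X'X and the two of X'y.
    s11 = s12 = s22 = xty1 = xty2 = 0.0
    for a, b, t in zip(f1, f2, y):
        s11 += a * a
        s12 += a * b
        s22 += b * b
        xty1 += a * t
        xty2 += b * t
    # Upper-triangular Cholesky factor U with U'U = S.
    u11 = math.sqrt(s11)
    u12 = s12 / u11
    u22 = math.sqrt(s22 - u12 * u12)
    # Forward solve U'z = X'y, then back solve U*beta = z.
    z1 = xty1 / u11
    z2 = (xty2 - u12 * z1) / u22
    b2 = z2 / u22
    b1 = (z1 - u12 * b2) / u11
    return b1, b2

# Hypothetical data built from known coefficients 2 and 3.
f1 = [1.0, 0.0, 1.0, 2.0]
f2 = [0.0, 1.0, 1.0, 1.0]
y = [2.0 * a + 3.0 * b for a, b in zip(f1, f2)]
coeffs = fit_two_basis(f1, f2, y)
```

The Julia version folds the two divisions by U22 into one by squaring the denominator; the sketch keeps them separate so that each triangular solve is visible.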

aaowens · May 7, 2019, 10:26pm

That is nice, but StaticOptim isn’t getting you much here. Regular Optim is almost as fast and is a more standard dependency.

using Optim
function ocurvefit(fit1, fit2, x, y, params)
           fit1s = fit1.(x)
           fit2s = fit2.(x)
          optimize(params, BFGS()) do p
               obj = 0.0
               @inbounds @simd for i ∈ eachindex(x, y)
                   obj += (p[1]*fit1s[i] .+ p[2]*fit2s[i] - y[i])^2
               end
               obj
           end
       end

julia> sres = @btime ocurvefit($fit1, $fit2, $x3, $y3,  [1.0, 1.0]);
  688.529 μs (126 allocations: 67.63 KiB)

Elrod · May 7, 2019, 10:30pm

Fair. I think it’d be much better to follow my second approach using StaticArrays.jl. StaticArrays is a fairly standard dependency, and if all you’re doing is least squares, you should probably just use the closed form solution.

stevengj · May 7, 2019, 11:16pm

Linear least squares problems can be solved very efficiently with just \ (via QR least squares); it’s pretty suboptimal to use a generic nonlinear optimization method. No need for StaticArrays either.
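
As a rough illustration of what a QR-based least-squares solve does for a two-column design matrix, here is a plain-Python sketch. It uses classical Gram-Schmidt purely for readability; a library routine would typically use Householder reflections instead. The function name `qr_fit_two_basis` and the sample data are assumptions made for this example.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def qr_fit_two_basis(a1, a2, y):
    """Minimize ||[a1 a2]*beta - y|| through a thin QR factorization."""
    # Gram-Schmidt: A = Q*R with orthonormal columns q1, q2.
    r11 = math.sqrt(dot(a1, a1))
    q1 = [p / r11 for p in a1]
    r12 = dot(q1, a2)
    u = [p - r12 * q for p, q in zip(a2, q1)]
    r22 = math.sqrt(dot(u, u))
    q2 = [p / r22 for p in u]
    # R*beta = Q'y is upper triangular, so back substitution finishes the job.
    b2 = dot(q2, y) / r22
    b1 = (dot(q1, y) - r12 * b2) / r11
    return b1, b2

# Hypothetical data: a slope column, a constant column, and noisy targets.
a1 = [1.0, 2.0, 3.0, 4.0]
a2 = [1.0, 1.0, 1.0, 1.0]
y = [1.1, 1.9, 3.2, 3.8]
b1, b2 = qr_fit_two_basis(a1, a2, y)
```

Unlike the normal equations, this route never forms X'X, which is why QR is usually preferred when the columns are close to collinear.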

Elrod · May 7, 2019, 11:26pm

My second example, curvefit2, solved the least squares problem (normal equations) using a Cholesky decomposition.
It was simple to work the math out by hand, and an approach I’m more familiar with.

I suggested StaticArrays to allow for variable numbers of parameters while maintaining more or less the performance of my second example.
With only two parameters, BLAS was slow.

I did not try the QR decomposition.

Eric_Shain · May 7, 2019, 11:26pm

Thanks for the effort and responsiveness. There is a fair bit to comprehend here. Definitely some Julia syntax I’m unfamiliar with. I suppose it is good that fast performance is possible with Julia, but it seems a lot less straightforward than the Python code. Is this just a corner case where the Python function is particularly well optimized? I’m still confused by the poor performance of the LsqFit implementation.

Elrod · May 7, 2019, 11:47pm

I believe curve_fit uses an iterative algorithm.
By far the slowest part of the code is interpolating.
My code (and @aaowens’) gained mostly by interpolating only once per fit, and not once per iteration of the iterative algorithm.

I (and others) can answer any specific questions you may have after looking through the documentation.
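
To make the "interpolate once per fit" point concrete, here is a small Python sketch. The call counter, the helper `make_cached_model`, and the toy knots are all hypothetical; the loop of ten evaluations merely stands in for the iterations of a real solver.

```python
from bisect import bisect_right

calls = {"interp": 0}  # counts interpolation-point evaluations

def interp(xq, xs, ys):
    """Piecewise-linear interpolation at one point xq (xs sorted ascending)."""
    calls["interp"] += 1
    i = min(max(bisect_right(xs, xq) - 1, 0), len(xs) - 2)
    t = (xq - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def make_cached_model(x, knots1, vals1, knots2, vals2):
    # Interpolate both basis curves once, up front.
    b1 = [interp(v, knots1, vals1) for v in x]
    b2 = [interp(v, knots2, vals2) for v in x]

    def model(p):
        # Each solver iteration now only scales and adds cached values.
        return [p[0] * u + p[1] * v for u, v in zip(b1, b2)]

    return model

knots = [0.0, 1.0, 2.0]
x = [0.25, 0.5, 1.5]
model = make_cached_model(x, knots, [0.0, 1.0, 0.0], knots, [1.0, 1.0, 3.0])
for _ in range(10):  # stand-in for ten solver iterations
    out = model([2.0, 1.0])
```

After the loop the counter still reads six (two curves times three points), whereas a model that interpolated inside every call would have reached sixty.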

stillyslalom · May 8, 2019, 2:25am

The performance deficit between the straightforward Julia and Python implementations comes largely from the fact that NumPy offers a compiled, carefully-tuned interpolation routine with implicit multithreading. Julia’s native threading infrastructure is still young, and consequently threading isn’t yet baked into the package ecosystem.

tim.holy · May 8, 2019, 1:53pm

A number of performance problems have now been fixed and are available in Interpolations 0.12.

Just as Interpolations was “cheating” on your first example, Dierckx is “cheating” on your second. You can see this if you try

using Random  # provides randperm
p = randperm(length(xs))
xsr = xs[p]
@btime $fit1($xs)
@btime $fit1($xsr)

This explains most of the remaining gap; Interpolations relies on searchsortedfirst to find the bracketing knots for the interpolation point, but if you traverse them in order you can find the next knot more efficiently. It would be good if someone implements this form of “cheating” for Interpolations, too.

For 1d gridded interpolation, almost all of the time is spent in searchsortedfirst. Someone who wants to make this package faster in such cases might well take a careful look at this function and see if there are untapped performance opportunities. (As well as adding the “cheating” for sorted vector inputs.)
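
The difference between searching for every point and exploiting sorted inputs can be sketched in Python as follows. This is not how Interpolations or Dierckx is implemented; the function names are invented for the example, and `bisect_right` plays the role of the binary search.

```python
from bisect import bisect_right

def interp_search(xq, xs, ys):
    """Interpolate each query with an independent binary search."""
    out = []
    for q in xq:
        i = min(max(bisect_right(xs, q) - 1, 0), len(xs) - 2)
        t = (q - xs[i]) / (xs[i + 1] - xs[i])
        out.append(ys[i] + t * (ys[i + 1] - ys[i]))
    return out

def interp_sorted(xq, xs, ys):
    """Same result for ascending xq, reusing the previous bracket."""
    out = []
    i = 0
    for q in xq:
        # Advance the bracket only as far as needed; never restart the search.
        while i < len(xs) - 2 and xs[i + 1] <= q:
            i += 1
        t = (q - xs[i]) / (xs[i + 1] - xs[i])
        out.append(ys[i] + t * (ys[i + 1] - ys[i]))
    return out

xs = [0.0, 1.0, 3.0, 4.0]
ys = [0.0, 2.0, 2.0, 5.0]
xq = [0.0, 0.5, 1.0, 2.0, 3.5, 4.0]  # ascending query points
by_search = interp_search(xq, xs, ys)
by_march = interp_sorted(xq, xs, ys)
```

The marching version touches each knot at most once over the whole pass, so its cost is linear in the number of knots plus queries, but it silently gives wrong brackets if the queries are shuffled.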

Tamas_Papp · May 8, 2019, 1:59pm

This is my experience for higher dimensions, too, when solving dynamic programming problems in economics using Interpolations.jl (interpolating a value function).

FWIW, in case the user has control over the grid, then using a uniform grid, represented by an AbstractRange, is almost always worth it (if the problem is heavily nonlinear, this should be combined with a variable transformation). searchsortedfirst is O(1) for this case.
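
A minimal Python sketch of why a uniform grid makes the lookup constant-time: the bracketing index comes from arithmetic rather than a search. The function name `interp_uniform` and the sample values are assumptions for illustration only.

```python
import math

def interp_uniform(q, x0, step, ys):
    """Linear interpolation on the grid x0, x0+step, ... with no search."""
    # The bracketing index follows from arithmetic alone.
    i = int(math.floor((q - x0) / step))
    i = min(max(i, 0), len(ys) - 2)
    t = (q - (x0 + i * step)) / step
    return ys[i] + t * (ys[i + 1] - ys[i])

ys = [0.0, 1.0, 4.0, 9.0]  # samples at x = 0.0, 0.5, 1.0, 1.5
values = [interp_uniform(q, 0.0, 0.5, ys) for q in (0.25, 0.5, 1.25, 1.5)]
```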

tim.holy · May 8, 2019, 2:10pm

Yes indeed. However, in higher dimensions you can amortize some of this cost if your evaluation points are on a cartesian grid:

julia> knots = ([0;sort(rand(8));1], [0;sort(rand(8));1])
([0.0, 0.00193145, 0.240481, 0.38677, 0.396281, 0.462792, 0.875859, 0.943712, 0.960067, 1.0], [0.0, 0.092186, 0.200238, 0.528178, 0.554724, 0.617535, 0.775003, 0.946701, 0.991103, 1.0])

julia> A = rand(10, 10);

julia> itp = interpolate(knots, A, Gridded(Linear()));

julia> xs, ys = rand(100), rand(100);

julia> @btime $itp($xs, $ys);
  30.332 μs (12 allocations: 83.39 KiB)

julia> interp_each(itp, xs, ys) = [itp(x, y) for x in xs, y in ys]
interp_each (generic function with 1 method)

julia> @btime interp_each($itp, $xs, $ys);
  301.992 μs (6 allocations: 78.30 KiB)

In the first case, searchsortedfirst is a nearly-negligible fraction of the cost.

Tamas_Papp · May 8, 2019, 2:20pm

You are of course right; but in the application domain I mentioned above (value/policy iteration, discrete time) unfortunately you are effectively just evaluating single points.

I wonder if Interpolations.jl could have (already has?) a mechanism for caching the grid-bin lookup. Eg if I am interpolating V(x, y), and want to calculate

\arg\max_{y \in [a, b]} V(x, y) \quad \text{given} \quad x

then I could somehow do (mockup code)

xg = lookup(x_grid, x)
optimize(y -> V(xg, y), a, b)

where V is the interpolated object.
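
One way to picture the mockup is the following Python sketch of bilinear interpolation with a reusable bin lookup. Everything here is hypothetical: `lookup` and `bilinear` are not Interpolations.jl functions, the table encodes a made-up V(x, y) = x + 10y, and a `max` over three candidates stands in for a real optimizer.

```python
from bisect import bisect_right

def lookup(grid, q):
    """Return (i, t): bracketing index and fractional position of q in grid."""
    i = min(max(bisect_right(grid, q) - 1, 0), len(grid) - 2)
    return i, (q - grid[i]) / (grid[i + 1] - grid[i])

def bilinear(table, xb, yb):
    """Bilinear interpolation of table[ix][iy] from two precomputed lookups."""
    (i, s), (j, t) = xb, yb
    lo = table[i][j] + t * (table[i][j + 1] - table[i][j])
    hi = table[i + 1][j] + t * (table[i + 1][j + 1] - table[i + 1][j])
    return lo + s * (hi - lo)

x_grid = [0.0, 1.0, 3.0]
y_grid = [0.0, 2.0, 4.0]
table = [[x + 10.0 * y for y in y_grid] for x in x_grid]  # V(x, y) = x + 10y

xg = lookup(x_grid, 2.0)  # search the x grid once, then reuse it

def v_at_x(y):
    return bilinear(table, xg, lookup(y_grid, y))

best = max((0.5, 1.0, 3.0), key=v_at_x)  # crude stand-in for optimize
```

Only the y lookup is repeated inside the objective, so the per-evaluation search cost is roughly halved in 2D and the saving grows with the number of fixed dimensions.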

tim.holy · May 8, 2019, 2:27pm

Tamas_Papp wrote: “I wonder if Interpolations.jl could have (already has?) a mechanism for caching the grid-bin lookup.”

It does for scaled but not gridded: https://github.com/JuliaMath/Interpolations.jl/blob/d1ad2a1409ce6cea96cd304a540c485d2b393f5d/src/scaling/scaling.jl#L157-L226

Worth noting that in 2D, Interpolations is ~4x faster (on my machine) than Dierckx for both point-by-point and grid evaluation. And that for both there’s a 10x gap between point-by-point and grid. It’s quite remarkable what a difference the presence or absence of a couple of optimizations can have in this space.

Eric_Shain · May 8, 2019, 3:28pm

To aid in my comprehension, a few comments in the code would be helpful.

pfitzseb · May 8, 2019, 3:40pm

fit1 = interpolate((x1,), y1, Gridded(Linear()));
fit2 = interpolate((x2,), y2, Gridded(Linear()));

function myfit(fit1, fit2, x, y)
    fity1 = fit1(x)
    fity2 = fit2(x)
    
    A = [fity1 fity2]
    A\y
end

ps = @btime myfit(fit1, fit2, x3, y3); # ~440µs

fity1 = fit1(x3);
fity2 = fit2(x3);
model(x,p) = p[1]*fity1 .+ p[2]*fity2

rst = @btime curve_fit(model, x3, y3, [1.0, 1.0]); # ~800µs
rst.param ≈ ps # true

model2(x,p) = p[1]*fit1(x) .+ p[2]*fit2(x)
rst = @btime curve_fit(model2, x3, y3, [1.0, 1.0]); # ~20ms
rst.param ≈ ps # true

myfit in the code above is maybe a bit easier to understand and almost as fast as @Elrod’s code (which takes ~350µs on my machine).

Eric_Shain · May 8, 2019, 4:41pm

Now this is really great. I actually understand what’s going on too.

Elrod · May 8, 2019, 4:57pm

I added comments.
My code is actually doing the same thing, but I turned the matrix operations into for loops / wrote them out element by element.

I fixed a bug, so it should also get the same answer.

Eric_Shain · May 8, 2019, 4:59pm

I do have one question. What if you wanted the parameters to adjust not only the magnitude of the two fits but also their position in time, like this:

model(x,p) = p[1]*fit1(x.+p[3]).+p[2]*fit2(x.+p[4]);

This is my more complete problem. It is also why I originally wanted extrapolation. How would you handle this in myfit?

jlperla · May 8, 2019, 5:13pm

Currently the code is

LinearInterpolation(range::AbstractRange, vs::AbstractVector; extrapolation_bc = Throw()) =
    extrapolate(scale(interpolate(vs, BSpline(Linear())), range), extrapolation_bc)
LinearInterpolation(range::AbstractVector, vs::AbstractVector; extrapolation_bc = Throw()) =
    extrapolate(interpolate((range, ), vs, Gridded(Linear())), extrapolation_bc)

i.e., pass in a vector for an irregular grid, or a range otherwise.

To be honest, I think that the Throw() is probably a bad default if it has these performance characteristics. What if it was changed to something like

LinearInterpolation2(range::AbstractVector, vs::AbstractVector; extrapolation_bc = nothing) =
    extrapolation_bc === nothing ? interpolate((range, ), vs, Gridded(Linear())) : extrapolate(interpolate((range, ), vs, Gridded(Linear())), extrapolation_bc)

itp3 = LinearInterpolation2(x, y)
@btime $itp3($x2);

itp4 = LinearInterpolation2(x, y, extrapolation_bc = Throw())
@btime $itp4($x2);

If you agree to this in principle, we could prepare a PR that covers the cases.
