Using Interpolations

#1

I’m trying to convert some code from Python to Julia. The key hang up at the moment converting the following Python code:

#Uses Numpy’s interp method
def TestInt(x):
return np.interp(x,Data[:,0],Data[:,1])

This is simple linear interpolation and returns a function which will provide an interpolated value for any x value within the interpolation range. I tried the Dierckx package which works with the following code:

TestInt = Spline1D(Data[:,1],Data[:,2]);

Unfortunately this is much too slow. I’m assuming because it is a spline rather than linear. I’m trying to use the Interpolations.jl package, but can’t figure out how to call it. Any help would be very welcome.

#2
using Interpolations

itp = LinearInterpolation(data[:,1], data[:,2]) # create interpolation function
itp(x) # call function on input data (use itp.(x) to broadcast over input vector)
#3

Thanks so much. The documentation leaves a lot to be desired. Interestingly, I find Dierckx’s Spline1D to be much faster. Even then, my Julia code overall is much, much slower than the equivalent Python code. Why is still to be determined.

#4

you can follow the julia performance tips , if that’s not enough, you can post your code and we can help you figure out whats the bottleneck

#5

Can you post what code you ran? I think Interpolations is known to be faster than Dierckx.

Are you sure you’re not measuring compilation time? Did you run the timing twice? (see the docs )

#6

The code for Julia I tried is:

using Dierckx, LsqFit

fit1 = Spline1D(x,y,w=ones(length(x)), k=1, bc="nearest", s=0.0);

fit2 = LinearInterpolation(x, y,extrapolation_bc=Flat());

xs = range(-10, step=0.5, stop=600);

@time fit1(xs)  # 0.000071 seconds (6 allocations: 19.531 KiB)

@time fit2(xs)  #0.000388 seconds (5 allocations: 9.844 KiB)

I’m running in JupyterLab so I cut and pasted the timing numbers into the code above. Yes, I ran the code several times. The similar code in Python using numpy.interp executes in about 30 microseconds. I’m calling these fits from a curve fitting routine. Using Scipy.optimize’s leastsq I get a fit in about 40 msec. With Julia’s LsqFit’s curve_fit I get 0.5 seconds. Both of these routines claim to use Levenberg Marquardt.

#8

I get

using Interpolations, Dierckx, BenchmarkTools
xs = range(-10, step=0.5, stop=600);
x = range(-10, step = 5, stop = 600);
y = rand(length(x));
fit1 = Spline1D(x,y,w=ones(length(x)), k=1, bc="nearest", s=0.0);
fit2 = LinearInterpolation(x, y,extrapolation_bc=Flat());
julia> @btime fit1(xs)
  25.693 μs (2 allocations: 19.38 KiB)
1221-element Array{Float64,1}:
 0.1089228239991975 
 0.12199812070726075
 0.135073417415324  
 0.14814871412338726
 0.1612240108314505 
 0.17429930753951373
 0.18737460424757701
 0.20044990095564022
 0.21352519766370348
 0.2266004943717667 
 ⋮                  
 0.9211223546564197 
 0.9279619969482743 
 0.934801639240129  
 0.9416412815319836 
 0.9484809238238383 
 0.9553205661156929 
 0.9621602084075476 
 0.9689998506994022 
 0.9758394929912568 

julia> @btime fit2(xs)
  22.451 μs (1 allocation: 9.69 KiB)
1221-element Array{Float64,1}:
 0.1089228239991975 
 0.12199812070726077
 0.135073417415324  
 0.14814871412338726
 0.16122401083145046
 0.17429930753951373
 0.187374604247577  
 0.2004499009556402 
 0.21352519766370348
 0.2266004943717667 
 ⋮                  
 0.9211223546564198 
 0.9279619969482741 
 0.9348016392401293 
 0.9416412815319836 
 0.9484809238238379 
 0.955320566115693  
 0.9621602084075473 
 0.9689998506994025 
 0.9758394929912568 

so it looks like Interpolations is a bit faster than Dierckx for me. To be fair Interpolations is cheating a bit here because Dierckx isn’t using the fact that the grid is uniformly spaced. If we do

julia> x2 = collect(x)
123-element Array{Int64,1}:
 -10
  -5
   0
   5
  10
  15
  20
  25
  30
  35
   ⋮
 560
 565
 570
 575
 580
 585
 590
 595
 600

julia> fit1 = Spline1D(x2,y,w=ones(length(x)), k=1, bc="nearest", s=0.0);

julia> fit2 = LinearInterpolation(x2, y,extrapolation_bc=Flat());

julia> @btime fit1(xs);
  25.958 μs (2 allocations: 19.38 KiB)

julia> @btime fit2(xs);
  96.927 μs (1 allocation: 9.69 KiB)

then Dierckx wins. I’m not sure why this is. The performance shootout picture for Interpolations here https://github.com/JuliaMath/Interpolations.jl claims it’s better at this too.

1 Like
#9

Don’t use the “convenience” constructor LinearInterpolation unless you also want extrapolation.

using Interpolations, Dierckx, BenchmarkTools

# Generate random, non-gridded data.
# Ensure the data goes all the way to the ends of the interval, chosen as [-1000, 1000]
x = sort(rand(1000)*2000 .- 1000); x[1] = -1000; x[end] = 1000
y = rand(length(x))

# Random evaluation points
x2 = rand(1000)*2000 .- 1000

# Interpolants
itp = interpolate((x,), y, Gridded(Linear()))
fit1 = Spline1D(x,y,w=ones(length(x)), k=1, bc="nearest", s=0.0)

@btime $itp($x2);
@btime $fit1($x2);

Results:

  66.543 μs (16 allocations: 40.42 KiB)
  192.380 μs (1 allocation: 7.94 KiB)
6 Likes
#10

If that’s the case, the docs may need to be updated:

For extrapolation, i.e., when interpolation objects are evaluated in coordinates outside the range provided in constructors, the default option for a boundary condition is Throw so that they will return an error. Interested users can specify boundary conditions by providing an extra parameter for extrapolation_bc […]

1 Like
#11

All are welcome to contribute to the docs, which definitely need help!

However, there’s nothing incorrect about the statement above, it’s just that adding extrapolation is an extra step that slows performance.

1 Like
#12

It seems very strange to have a default extrapolation that degrades performance and produces no apparent change in behavior:

using Interpolations, BenchmarkTools

x = collect(LinRange(0, 1, 50))
y = @. x^2 + rand()/10

itp1 = LinearInterpolation(x, y)
itp2 = interpolate((x,), y, Gridded(Linear()))

itp1(-1) # ERROR: BoundsError: attempt to access 50-element extrapolate(interpolate((::Array{Float64,1},), ::Array{Float64,1}, Gridded(Linear())), Throw()) with element type Float64 at index [-1]
itp2(-1) # ERROR: BoundsError: attempt to access 50-element interpolate((::Array{Float64,1},), ::Array{Float64,1}, Gridded(Linear())) with element type Float64 at index [-1]

@btime itp1(p) setup=(p=rand(256)) # 19.038 μs (1 allocation: 2.13 KiB)
@btime itp2(p) setup=(p=rand(256)) # 13.092 μs (15 allocations: 17.03 KiB)

Worth opening an issue?

1 Like
#13

Thank you so much for the timely responses. Just to provide some context, I’m using Python to do some consulting work for a client. This is why I don’t share my code unmodified. Just for my own interest, I decided to recreate this code in Julia. Julia appeals to me as I have a lot of experience in MATLAB which I no longer use (too expensive now that I don’t have a company buying it).

I found the documentation for Interpolations too terse. For example your explanation with the code:

itp2 = interpolate((x,), y, Gridded(Linear()))

was helpful and I didn’t find it in the docs. I gather the (x,) transposes the x vector for use with 1D fitting.

That said, I sort of need the extrapolation since the model for the curve fitting looks like this:

pval(x,p) = p[1]*fitA(x.+p[3]).+p[2]*fitB(x.+p[4]);

Each fit function models experimentally obtained dye responses (fluorescence lifetimes). The actual data I’m using is much higher resolution than the example I posted. The data has both magnitude and time variance since it is real data. Since the fits can be shifted left or right, I get errors without extrapolation.

What is strange is that the Python code runs in about 40 msec and the equivalent Julia code runs in 500 msec. The problem is either in the interpolation functions, the model or the curve fitting function. Here is the two curve fitting functions:

Python (leastsq is from scipy.optimize):

fit = leastsq(residuals, p0, args=(y, x), maxfev=2000)

Julia (curve_fit is from LstSq):

fit = curve_fit(pval, x, y, p0)

Both functions claim to use Levenberg-Marquardt. It is possible the the Optim package has a better solver and I can try to use that, but to start, I was trying to write comparable code.

I’m finding profiling a bit difficult. In MATLAB the profiler ranks code by the total length of time it takes to execute. This makes it easy to find where to spend time optimizing. The Profile package in Julia lists which code runs the most, but not how long it takes.

#14

It is in the docs, but buried a bit. You can find a similar example here: http://juliamath.github.io/Interpolations.jl/latest/control/#Gridded-interpolation-1 .

The (x,) isn’t a transpose, it constructs a tuple of length 1. The same syntax generalizes to higher dimensional problems. A 3D grid could have (x, y, z) as that argument, which is a tuple of length 3.

#15

Lets just say for myself, the documentation and example are not nearly clear enough. Part of this may be due to the fact that I’m not a mathematician, but I do have two degrees from MIT so I’m not clueless. In any case, I see this note from scipy.optimize.leastsq.

“leastsq” is a wrapper around MINPACK’s lmdif and lmder algorithms.

It may be that this function is especially fast in Python in that it is running optimized and compiled code. It still begs the question what I might try in Julia to speed up things.

#16

Are you comparing the time it takes to fit a single model in Julia versus Python? If so, it’s not surprising that the Python version is significantly faster. Julia achieves its speed by lazily compiling functions (just) ahead of time. Thus if you’re only doing any operation once in a Julia process, measuring the time that operation takes is also measuring compilation time.

This is why people earlier in this thread were using @btime from BenchmarkTools rather than @time from Base. @btime makes sure that it’s measuring the operation sans compilation.

#17

But it does — the profiling counts displayed are proportional to runtime.

As indicated above, this could be the case, which makes it a excellent opportunity to improve them by making a pull request. You are now in a perfect position to do improve it for the next person.

#18

I’m not using @btime, but I am running the @time several times to account for compilation. More to the point, when I do a full run (which iterates over 500 responses) it takes my Python code about 10 seconds to run where as my Julia code runs over several minutes. I take it the actual Levenberg-Marquardt code used by SciPy is just a wrapper over some highly optimized MINPACK code. Either way, I don’t think Julia should be an order of magnitude slower. The speed of the interpolation code doesn’t explain all of the slowness.

#19

https://github.com/KristofferC/TimerOutputs.jl can be used to get a bit higher level view of where execution time is being spent.

1 Like
#20

If you can strip the proprietary bits from your code, folks on this forum are often more than happy to help with performance tweaking. There shouldn’t be a big enough algorithmic difference between Julia and Python to produce to an orders-of-magnitude slowdown; the likeliest culprit is type instability. Try @code_warntype or Traceur.jl to ferret those out.

2 Likes
#21

This might take a bit of time (since I need to do both Python and Julia to compare) and I’m pretty busy over the next few days. I’ll see what I can do.