While porting code from Python, I noticed that linear interpolation in Julia is less accurate than NumPy's `interp` function; see the MWE below.

The error may seem tiny, but the value is integrated and adds up over time. Beyond that, I'm also curious why Julia is less accurate than NumPy's `interp` here.

Any ideas what might cause this or how to fix it?

```
using Interpolations

# Interpolation function
xp = [0.0,3600.0]
yp = [25.7,26]
itp = LinearInterpolation(xp,yp)
result = itp(3240)
# Manual check (y = a*x + b)
a = (26-25.7)/3600
b = 25.7
result_check = 3240a+b
int_error = abs(result-result_check)
println("The interpolation error, $int_error, is larger than eps, $(eps())")
```

in Python:

```
np.interp(3240, [0, 3600], [25.7, 26])
```
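For reference, here is a runnable sketch of the Python side of the comparison (assuming NumPy is installed), with the same manual y = a*x + b check as in the Julia snippet:

```python
import numpy as np

# Manual check (y = a*x + b), mirroring the Julia MWE
a = (26 - 25.7) / 3600
b = 25.7
result_check = 3240 * a + b

result = np.interp(3240, [0, 3600], [25.7, 26])
print(result, result_check, abs(result - result_check))
```

In this particular example `np.interp` agrees bit-for-bit with the manual check.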

First, I think you should rather write: linear interpolation in `numpy` yields different errors compared to the `Interpolations.jl` package. Interpolation is part of neither Julia nor Python themselves.

Second: given your example, I get:

```
The interpolation error, 3.552713678800501e-15, is larger than eps, 2.220446049250313e-16
```

This error is extremely tiny, and I am wondering what Python (numpy) yields (I have not tried it yet; I was too lazy).

Lastly:

> The error may seem tiny, but the value is integrated and adds up over time.

The error is extremely tiny, and I am wondering what you do such that these errors really “add up” over time, i.e. that you end up with an offset worth worrying about. It is much more likely that you get some fluctuations around the exact value due to floating-point arithmetic, and even if you integrate a curve, these errors should fluctuate around the truth.
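This fluctuation argument can be sanity-checked numerically. A sketch in Python (assuming NumPy is installed), summing the interpolant over a whole grid and comparing against an exact rational reference via `fractions.Fraction`; `math.fsum` keeps the summation itself correctly rounded:

```python
import math
from fractions import Fraction

import numpy as np

# Interpolated values at every integer point on the grid
vals = np.interp(np.arange(3601), [0, 3600], [25.7, 26])

# Exact reference: sum of the exact linear function at the same points
slope = (Fraction(26) - Fraction(25.7)) / 3600
exact = sum(Fraction(x) * slope + Fraction(25.7) for x in range(3601))

approx = math.fsum(vals)  # correctly rounded sum of the float values
rel_err = abs(Fraction(approx) - exact) / exact
print(float(rel_err))  # stays at the level of machine epsilon
```

The per-point rounding errors do not accumulate into a systematic drift; the relative error of the integrated total stays at the rounding level of a single value.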

Do you have an actual use-case where you can show that this is really a problem?

Btw., interpolating between only two points is also a bit far from an actual use case.


Floating point numbers don’t have uniform absolute precision; a sensible comparison would be against

```
julia> eps(result_check)
3.552713678800501e-15
```

which shows that Interpolations.jl differs by 1 ulp. (You can also use `nextfloat(result_check)` to verify this.)
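The same yardstick exists in Python's standard library (a sketch; `math.ulp` and `math.nextafter` require Python 3.9+):

```python
import math

result_check = 3240 * ((26 - 25.7) / 3600) + 25.7  # 25.97

# Spacing between adjacent doubles near 25.97 -- the sensible yardstick
print(math.ulp(result_check))                  # ~3.55e-15, the scale of the observed error
print(math.nextafter(result_check, math.inf))  # the double one ulp above result_check
```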

Another way you can do the same calculation:

```
julia> f = 3240/3600
0.9
julia> (1-f)*yp[1] + f*yp[2]
25.970000000000002
```

which is the Interpolations.jl result. Which one is correct? Well, `fma` likes the Interpolations result:

```
julia> fma(1-f, yp[1], fma(f, yp[2], 0.0))
25.970000000000002
```

but taking the bounds at face value prefers the numpy result:

```
julia> f = BigFloat(3240)/3600
0.8999999999999999999999999999999999999999999999999999999999999999999999999999965
julia> Float64((1-f)*yp[1] + f*yp[2])
25.97
```
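The same "bounds at face value" check can be done in Python with exact rational arithmetic instead of `BigFloat` (a sketch using `fractions.Fraction`; here `f = 3240/3600` is the exact rational 9/10 rather than a 256-bit rounding of it, which rounds to the same double in this example):

```python
from fractions import Fraction

yp = [25.7, 26]
f = Fraction(3240, 3600)  # exactly 9/10
exact = (1 - f) * Fraction(yp[0]) + f * Fraction(yp[1])  # exact rational result
print(float(exact))  # rounds to the numpy answer, 25.97
```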


Please remember that round-off errors are bounded in *relative* accuracy. I.e. you must either compute a relative error and compare that to `eps()`, or compute an absolute error and compare that to `eps(expected_result)`.

Everything is very fine here, because:

```
julia> itp(3240) == nextfloat(result_check)
true
```

which means that what you expected and what you actually got are only one ulp apart.
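In Python terms (a sketch, using `math.nextafter` as a stand-in for the one-ulp-away value): a one-ulp difference near 25.97 corresponds to a relative error below machine epsilon, which is the meaningful comparison.

```python
import math
import sys

result_check = 3240 * ((26 - 25.7) / 3600) + 25.7
one_ulp_off = math.nextafter(result_check, math.inf)  # value one ulp above result_check

rel_err = abs(one_ulp_off - result_check) / result_check
print(rel_err, rel_err <= sys.float_info.epsilon)  # one ulp here is below eps
```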

But more importantly, if `result_check` is computed with double precision, why would you expect it to be sufficiently accurate to measure the “error” of another result computed in double-precision arithmetic?

If you actually want to assess the accuracy of a double-precision result, you have to use something else (ideally guaranteed to be more accurate) in order to get a reference. For example, `BigFloat`:

```
julia> biga = (BigFloat(26)-BigFloat(25.7))/BigFloat(3600)
8.333333333333353070631548891671829753451877170138888888888888888888888888888838e-05
julia> big_check = 3240biga + b
25.96999999999999992894572642398998141288757324218749999999999999999999999999989
julia> abs(result-big_check)/big_check
9.576047651753371803439068281745712155163492344466877751845994513632762834561385e-17
julia> abs(result-result_check)/big_check
1.368006807393339e-16
```

So it appears that the result `LinearInterpolation` gives you is actually *more accurate* than what you considered to be a reference result!

EDIT: … and probably (I didn’t check) more accurate than numpy, too.


No, it’s not.

Both Interpolations and numpy are within one ulp of the correct result. It’s not very useful to compare one anecdotal data point though, so let’s check more interpolation points.

```
using Interpolations, PyCall, Statistics
xp = [0.0,3600.0]
yp = [25.7,26]
itp = LinearInterpolation(xp,yp)
results_itp = itp.(0:3600)
np = pyimport("numpy")
results_np = np.interp(0:3600, [0, 3600], [25.7, 26])
results_big = (0:3600) .* ((big(26) - big(25.7)) / big(3600)) .+ big(25.7)
errors_itp = abs.((results_itp .- results_big) ./ results_big)
errors_np = abs.((results_np .- results_big) ./ results_big)
errors_ideal = abs.((Float64.(results_big) .- results_big) ./ results_big)
print_result(s, x) = println(s,
round(Float64(maximum(x) / eps(Float64)), digits=5), " ",
round(Float64(mean(x) / eps(Float64)), digits=5))
println(" max mean")
print_result("Interpolations: ", errors_itp)
print_result("Numpy: ", errors_np)
print_result("Ideal: ", errors_ideal)
```

with the results

```
max mean
Interpolations: 0.91171 0.23339
Numpy: 0.31465 0.15472
Ideal: 0.31056 0.1547
```

We can see that numpy is clearly doing better in the sub-ulp area than Interpolations and is in fact very close to the best possible Float64 results.

I can see 3 possible explanations:

- Numpy was lucky with the sampled interpolation points (very unlikely).
- Numpy’s algorithm has been carefully designed to give the best possible accuracy.
- Numpy is actually computing in higher precision internally.

To get any further would require benchmarking the speed of the interpolations and/or inspecting the relevant source code, but I leave that to more curious people.
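The numpy side of this sweep can also be reproduced without PyCall (a sketch in Python, assuming NumPy is installed, with `fractions.Fraction` as the exact reference in place of `BigFloat`):

```python
import sys
from fractions import Fraction

import numpy as np

xs = range(3601)
results_np = np.interp(np.arange(3601), [0, 3600], [25.7, 26])

# Exact rational reference at each grid point
slope = (Fraction(26) - Fraction(25.7)) / 3600
errors = [abs(Fraction(r) - (Fraction(x) * slope + Fraction(25.7)))
          / (Fraction(x) * slope + Fraction(25.7))
          for x, r in zip(xs, results_np)]

eps = sys.float_info.epsilon
print(float(max(errors)) / eps)  # max relative error in units of eps; below 1, i.e. sub-ulp
```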


You do get the numpy answer if you do the division last. With

```
xp = [0.0,3600.0]
yp = [25.7,26]
x = 3240
f = x/(xp[2]-xp[1])
```

instead of

```
julia> (1-f)*yp[1] + f*yp[2]
25.970000000000002
```

do

```
julia> ((xp[2] - x)*yp[1] + (x - xp[1])*yp[2])/(xp[2] - xp[1])
25.97
```
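If NumPy is available, the division-last form can be checked against `np.interp` directly (a sketch; in this example both evaluate to the same double):

```python
import numpy as np

xp = [0.0, 3600.0]
yp = [25.7, 26.0]
x = 3240

# Division-last form of linear interpolation
division_last = ((xp[1] - x) * yp[0] + (x - xp[0]) * yp[1]) / (xp[1] - xp[0])
print(division_last, division_last == np.interp(x, xp, yp))
```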

If all of these were integers, though, you’d be vulnerable to overflow:

```
xp = [0, 3600]
yp = [typemax(Int), typemax(Int)]
julia> ((xp[2] - x)*yp[1] + (x - xp[1])*yp[2])/(xp[2] - xp[1])
-1.0
```

but the `f` version is not:

```
julia> f = x/(xp[2]-xp[1])
0.9
julia> (1-f)*yp[1] + f*yp[2]
9.223372036854776e18
```

numpy clearly converts things to Float64 first, so it prevents the overflow, but this also destroys the ability to do generic computation—for example, you can’t autodiff through a computation like that. So in Julia we would definitely not want to coerce to `Float64` first.

On balance I suspect that despite the slightly larger inaccuracy the Interpolations.jl approach is actually the more useful of the two.


Thank you for the fast and detailed answers!

I must admit that I was too quick to blame the difference in interpolation errors for my problem. By now, I found the bug that caused my python code to produce a different result from my Julia code.

Thank you anyway for the interesting insights!