Confusion on Creating Gridded Interpolations

Hi everyone,

I am have run into a problem with interpolating values across large arrays using Julia. I have read these Discourse discussions pertaining to my problem but have not been able to use them to create a solution:

Here is the information on what I have been attempting:

Problem Background

I have a grid with dimensions m x n where the grid is rectangular in shape but could be square. It is represented as a matrix constructed like grid = ones(m, n). From there, I have an array of observations that I would like to use as a basis to interpolate values across the grid. These observations can be thought of as observations = ([xs], [ys], [values]) where xs and ys represent lists of x and y points on the grid where each x and y pair have an associated value that should be used to interpolate values surrounding each x and y location for use in a 2D interpolation.

What I run into is that in my specific implementations, I have large arrays that cause memory errors. Further, I am curious if I am fundamentally misunderstanding something about interpolation or how Julia handles interpolations as my numerous efforts have failed. Here are my attempts at finding a solution for a specific implementation of my problem:

Attempts on a Specific Problem

In this case, what I was attempting is to define my grid like grid = ones(500, 500) with 31 observations of interest. My observations look like this:

julia> observations
Tuple{Array{Float64,1},Array{Float64,1},Array{Float32,1}}
([187.0, 313.0, 157.0, 343.0, 132.0, 368.0, 157.0, 343.0, 187.0, 313.0, 85.0, 415.0, 46.0, 454.0, 85.0, 415.0, 250.0, 250.0, 57.0, 443.0, 129.0, 371.0, 164.0, 336.0, 250.0, 250.0, 250.0, 180.0, 320.0, 118.0, 382.0],
[57.0, 57.0, 134.0, 134.0, 250.0, 250.0, 366.0, 366.0, 443.0, 443.0, 129.0, 129.0, 250.0, 250.0, 371.0, 371.0, 132.0, 368.0, 313.0, 313.0, 415.0, 415.0, 403.0, 403.0, 417.0, 454.0, 475.0, 466.0, 466.0, 433.0, 433.0], 
Float32[-19.995152, -46.38638, -28.329832, 81.12224, -67.002556, 21.798035, 32.538525, 38.022156, -14.682513, -56.849937, -20.680933, 95.41259, 22.357513, 36.404854, -37.628937, -3.4352758, -25.271269, 36.05853, 26.666668, -29.088951, -7.4670925, -47.769047, -29.178823, -43.958958, 29.638771, -10.448129, 5.71033, -25.612432, 37.268364, -22.657557, -2.0058606])

Attempt 1: Using ScatteredInterpolation.jl

My first thought was to used ScatteredInterpolation.jl since I am basically working with scatter data, at least I thought. Trying to read the sparse documentation provided by the package, my first thought was to convert my observations to an array of length, 250000, with every value that I don’t care about set to one. Further, I modified the xs and ys to contain each x value and y value possible resulting in 250000 ordered pairs.

When I tried to interpolate like this interpolate(MultiQuadratic(), [xs, ys], samples), I consistently got OutOfMemoryError() messages. I couldn’t figure out how to remedy this and moved on to other approaches.

Attempt 2: Using Interpolations.jl

I found Interpolations.jl’s documentation a bit more friendly and informative. Using the docs, my initial thoughts was to do the following:

julia> nodes = (values, )
(Any[22.357513f0, 26.666668f0, -20.680933f0, -37.628937f0, -22.657557f0, -7.4670925f0, -67.002556f0, -28.329832f0, 32.538525f0, -29.178823f0  …  -43.958958f0, 81.12224f0, 38.022156f0, 21.798035f0, -47.769047f0, -2.0058606f0, 95.41259f0, -3.4352758f0, -29.088951f0, 36.404854f0],)

julia> grid = ones(500, 500);

julia> using Interpolations

julia> itp = interpolate(nodes, grid, Gridded(Linear()))
ERROR: MethodError: no method matching interpolate(::Tuple{Array{Any,1}}, ::Array{Float64,2}, ::Gridded{Linear})

I consistently struggled with getting the right form it was expecting. When I did get my data into an acceptable form, I wound up with these errors:

julia> itp = interpolate(nodes, grid, Gridded(Linear()))
ERROR: DimensionMismatch("knot vectors must have the same axes as the corresponding dimension of the array")

Which I did not understand as I thought I had dimensions correctly matched. Further, what is a knot? From here, I got frustrated and turned to my friend who is an expert in MATLAB for help.

Attempt 3: Reimplementation of MATLAB’s interp2 Function

Thanks to this handy blog, I came across a Julia implementation of interp2! It built upon Interpolations.jl so I was able to understand what it was doing based on Attempt 2 and reading the docs on MATLAB’s interp2 function. I created this function:

function interp2(X, Y, V, Xq, Yq)
    knots = (X,Y)
    itp = interpolate(knots, V, Gridded(Linear()))
    itp[Xq, Yq]
end

I redefined xs and ys to only be 31 ordered pairs (each x and y were in their own array) and values as the only 31 values I cared about. I then invoked this reimplementation to try to find the interpolated value at 300, 300 and got similar errors from Attempt 1 and 2:

julia> interp2(xs, ys, values, 300, 300)

ERROR: MethodError: no method matching interpolate(::Tuple{Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Gridded{Linear})
Closest candidates are:
  interpolate(::Tuple{Vararg{Union{AbstractArray{T,1}, Tuple} where T,N}}, ::AbstractArray{Tel,N}, ::IT) where {Tel, N, IT<:Union{NoInterp, Tuple{Vararg{Union{NoInterp, Gridded},N} where N}, Gridded}} at /home/cedarprince/.julia/packages/Interpolations/TBuvH/src/gridded/gridded.jl:83
  interpolate(::Type{TWeights}, ::Type{TC}, ::Any, ::IT) where {TWeights, TC, IT<:Union{NoInterp, Tuple{Vararg{Union{NoInterp, BSpline},N} where N}, BSpline}} at /home/cedarprince/.julia/packages/Interpolations/TBuvH/src/b-splines/b-splines.jl:159
  interpolate(::AbstractArray{var"#s67",1} where var"#s67"<:Number, ::AbstractArray{TEl,1}, ::TInterpolationType) where {TEl, TInterpolationType<:Interpolations.MonotonicInterpolationType} at /home/cedarprince/.julia/packages/Interpolations/TBuvH/src/monotonic/monotonic.jl:177
  ...
Stacktrace:
 [1] interp2(::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}, ::Int64, ::Int64) at ./REPL[95]:3
 [2] top-level scope at REPL[122]:1 

This is so confusing to me because based on the MATLAB docs, I was fully expecting this to work out of all my attempts.

Conclusion

As I have not been able to solve this problem, I have therefore come to the following conclusions:

  1. I am missing something fundamental about how Julia handles interpolation - both for gridded and non-gridded situations.
  2. I somewhere have a fundamental misunderstanding about interpolations.
  3. I discovered bugs that I have been running into consistently.
  4. Either I lack intuition about how interpolations work or there is something strange about how Julia does it as I come from Python and MATLAB.

Would anyone be able to please help me? I have been struggling all day and would greatly appreciate the help. Also, shoutout to the interpolations folks in the Julia community! I am sure these are great tools - I am just struggling with them!

Take care and have a great rest of the day.

~ TCP :deciduous_tree:

P.S. If more information is needed, please let me know and I would be happy to include code snippets, data, or what you may need.

2 Likes

So, just to clarify the wording here; there is no “how interpolations work in julia”. It’s how they work in whatever package you happen to try here.
And secondly, you do not have a grid. It seems you would like to construct such a grid by interpolating from a set of scattered observations, but this is the output, not the input.

So, I would suggest you start by interpolate the value at a single coordinate. When you have this, you can then proceed to do so 25000 times to fill up your desired grid.

  1. So as far as I can tell, you want exactly ScatteredInterpolations.jl
  2. Construct the interpolation object using only observations in the format Home · ScatteredInterpolation.jl At this point, you have only used the 2x31 known data points.
  3. Finally, constructing the output, which in your case you seem to want a grid. So then evaluate this interpolation object however you want. You want a grid? well first decide on your evaluated_xs and evaluated_ys (presumably n and m in length) and then just compute your interpolation for the n*m points.
using ScatteredInterpolation

xs = [187.0, 313.0, 157.0, 343.0, 132.0, 368.0, 157.0, 343.0, 187.0, 313.0, 85.0, 415.0, 46.0, 454.0, 85.0, 415.0, 250.0, 250.0, 57.0, 443.0, 129.0, 371.0, 164.0, 336.0, 250.0, 250.0, 250.0, 180.0, 320.0, 118.0, 382.0]
ys = [57.0, 57.0, 134.0, 134.0, 250.0, 250.0, 366.0, 366.0, 443.0, 443.0, 129.0, 129.0, 250.0, 250.0, 371.0, 371.0, 132.0, 368.0, 313.0, 313.0, 415.0, 415.0, 403.0, 403.0, 417.0, 454.0, 475.0, 466.0, 466.0, 433.0, 433.0] 
samples = [-19.995152, -46.38638, -28.329832, 81.12224, -67.002556, 21.798035, 32.538525, 38.022156, -14.682513, -56.849937, -20.680933, 95.41259, 22.357513, 36.404854, -37.628937, -3.4352758, -25.271269, 36.05853, 26.666668, -29.088951, -7.4670925, -47.769047, -29.178823, -43.958958, 29.638771, -10.448129, 5.71033, -25.612432, 37.268364, -22.657557, -2.0058606]

points = hcat(xs, ys)'

itp = interpolate(Multiquadratic(), points, samples)
interpolated = evaluate(itp, [186.0, 60.0]) # evaluate it for how ever many coordinates that you want
5 Likes

If it’s ocean stiff you’re interested in, you should also have a look at DIVAnd.jl, too. There’s even a cool little demo to play with if you want here (just chose a correlation length and click on 10 locations to see how ell you do).

1 Like

Hi @Mikael_Ohman,

Thanks for the great answer! I’ve marked it as the solution for what I was looking for. I have to chuckle as I came close to the proper implementation a few times, but struggled in part due to the docs.

I wish this post to be a somewhat good “catch-all” post for folks who have questions about making interpolations in Julia. Was there anything else I could improve upon with my post to make it more clear for people to understand?

Hey @briochemc - not looking at Ocean-related data but that is cool! Loved the demo. Thanks for the link! :smile:

No worries! Note that, as they say on the ReadMe,

The method was initially developped with ocean data in mind, but it can be applied to any field where localized observations have to be used to produce gridded fields which are “smooth”.

1 Like

It is unclear what structure your grid has (regular or scattered). If scattered, you need a really large basis to interpolate 250k points — unless you are sure you need that many degrees of freedom, perhaps you are better served with eg a least squares approximation using a relatively low-rank basis.

Let me add that these issues have been carefully addressed in GeoStats.jl, and that you can not only do smooth interpolations, but also simulate spatial processes in a grid without effort.

Notice that most interpolation packages assume that you can store the full grid of coordinates into memory, whereas in GeoStats.jl we never materialize the grid:

julia> using GeoStats
julia> @allocated RegularGrid(100000, 100000, 100000)
0

This is crucial in 3D interpolation for example where a simple 500x500x500 grid with x,y,z coordinates requires 1GB of memory. Because we are also modeling the subsurface with the package, you can always expect that our efforts will scale well.

Check the GeoStatsTutorials and let us know if a feature or method is missing.

7 Likes

As I mentioned in the first link you posted, ScatteredInterpolation.jl can work with a bunch of limitations. Comparing to what exists in Python and MATLAB, there is no simple linear or nearest-neighbor scattered interpolation, and because of the underlying function calls the Distance package, for large grid you will fall into out of memory error. It would be really nice to have such functionalities, and maybe me or someone interested should contribute to that package.

@henry2004y the out of memory error is not caused by the dependency on Distances.jl it is about how it is used internally in the package. We never construct full pairwise distance matrices in GeoStats.jl exactly because of that.

1 Like

@TheCedarPrince I highly recommend learning more about geostatistics if you are doing work related to geospatial data. All these interpolation, simulation, etc problems are addressed properly by the theory, and more contributors are always welcome.

5 Likes

Thanks for pointing out this package again! I think the reason I end up using the Python call original approach is simply that I was worried about the dependencies that come with the GeoStats.jl package. The name itself suggests that it may contain many things that is irrelevant to my purpose. It would be wonderful if these basic functionalities exist in a more lower level package like ScatteredInterpolation.jl.

@henry2004y you mean the submodules of the package like GeoEstimation.jl, KrigingEstimators.jl etc? The package is modular if you read the source code of GeoStats.jl you will see it just reloads submodules in other repos.

But I agree with you it is a big project, and that dependencies can be an issue when someone is only interested in simple interpolations without all the GIS complications.

The main advantage of sticking with GeoStats.jl is when you have actual maps with different coordinates, different spatial objects playing with each other, etc. And you want to replicate spatial processes with some given spatial structure.

Actually I just realize that it comes as submodules :sweat_smile: However, the package name itself is still not informative enough for me. As someone who is looking for a proper solution for plotting, it is really unintuitive to think at first that the desired function lies underneath the GeoStats.jl package.

I get it. And if the goal is just plotting, then as I said, GeoStats.jl too big of a hammer.

I attempted to get this example to work for my project (data in csv format and loaded into DataFrames). The working MWE from above example and sample plots are below.

#https://discourse.julialang.org/t/confusion-on-creating-gridded-interpolations/50205/15

using ScatteredInterpolation
using GLMakie

xs = [187.0, 313.0, 157.0, 343.0, 132.0, 368.0, 157.0, 343.0, 187.0, 313.0, 85.0, 415.0, 46.0, 454.0, 85.0, 415.0, 250.0, 250.0, 57.0, 443.0, 129.0, 371.0, 164.0, 336.0, 250.0, 250.0, 250.0, 180.0, 320.0, 118.0, 382.0]
ys = [57.0, 57.0, 134.0, 134.0, 250.0, 250.0, 366.0, 366.0, 443.0, 443.0, 129.0, 129.0, 250.0, 250.0, 371.0, 371.0, 132.0, 368.0, 313.0, 313.0, 415.0, 415.0, 403.0, 403.0, 417.0, 454.0, 475.0, 466.0, 466.0, 433.0, 433.0] 
samples = [-19.995152, -46.38638, -28.329832, 81.12224, -67.002556, 21.798035, 32.538525, 38.022156, -14.682513, -56.849937, -20.680933, 95.41259, 22.357513, 36.404854, -37.628937, -3.4352758, -25.271269, 36.05853, 26.666668, -29.088951, -7.4670925, -47.769047, -29.178823, -43.958958, 29.638771, -10.448129, 5.71033, -25.612432, 37.268364, -22.657557, -2.0058606]

points = hcat(xs, ys)'

len = 1001
x2 = range(0, 500, length=len)
y2 = range(0, 500, length=len)

points2 = hcat(x2, y2)'
#@show points2

x3 =[]
y3 = []
for x in x2
    for y in y2
        push!(x3, x)
        push!(y3, y)
    end
end
#x3 = vec(x3)
#y3 = vec(y3)
#@show x3
#@show y3
points2 = hcat(vec(x3), vec(y3))'
#@show points2

#z2 = [itp(x,y) for y in y2, x in x2]
itp = ScatteredInterpolation.interpolate(Multiquadratic(), points, samples)
#interpolated = ScatteredInterpolation.evaluate(itp, [186.0, 60.0]) # evaluate it for how ever many coordinates that you want
interpolated = ScatteredInterpolation.evaluate(itp, points2) # evaluate it for how ever many coordinates that you want

gridded = reshape(interpolated, len^2)

fig = Figure(resolution=(1500, 800), figure_padding = 100)
ax1 = Axis3(fig[1,1])
ax2 = Axis3(fig[1,2])
#@show interpolated
#@show gridded
x3 = collect(Iterators.flatten(x3))
y3 = collect(Iterators.flatten(y3))

meshscatter!(ax1, xs, ys, samples, markersize = 5)
surface!(ax2, x3, y3, interpolated, markersize=1)

fig

The MWE is kind of hacky at the moment - appreciate your feedback on code optimisation for the following sections or any codes that can be optimised.

len = 1001
x2 = range(0, 500, length=len)
y2 = range(0, 500, length=len)

points2 = hcat(x2, y2)'
#@show points2

x3 =[]
y3 = []
for x in x2
    for y in y2
        push!(x3, x)
        push!(y3, y)
    end
end
#x3 = vec(x3)
#y3 = vec(y3)
#@show x3
#@show y3
points2 = hcat(vec(x3), vec(y3))'

and

x3 = collect(Iterators.flatten(x3))
y3 = collect(Iterators.flatten(y3))

Thanks.

@Humphrey_Lee, you can try this:

using ScatteredInterpolation, GLMakie

# INPUT:
xs = [187., 313., 157., 343., 132., 368., 157., 343., 187., 313., 85., 415., 46., 454., 85., 415., 250., 250., 57., 443., 129., 371., 164., 336., 250., 250., 250., 180., 320., 118., 382.]
ys = [57., 57., 134., 134., 250., 250., 366., 366., 443., 443., 129., 129., 250., 250., 371., 371., 132., 368., 313., 313., 415., 415., 403., 403., 417., 454., 475., 466., 466., 433., 433.] 
samples = [-19.9, -46.3, -28.3, 81.1, -67.0, 21.7, 32.5, 38.2, -14.6, -56.8, -20.6, 95.4, 22.3, 36.4, -37.6, -3.4, -25.2, 36.5, 26.6, -29.8, -7.4, -47.7, -29.1, -43.9, 29.6, -10.4, 5.7, -25.6, 37.2, -22.6, -2.]

# DATA CONDITIONING:
points = hcat(xs, ys)'
nx = 1001
ny = 2000
x2 = LinRange(0, 500, nx)
y2 = LinRange(0, 1000, ny)
xy = Iterators.product(x2, y2)
xx = getindex.(xy,1)
yy = getindex.(xy,2)
points2 = hcat(vec(xx), vec(yy))'

# INTERPOLATION:
itp = ScatteredInterpolation.interpolate(Multiquadratic(), points, samples)
interpolated = ScatteredInterpolation.evaluate(itp, points2)

# PLOTTING:
fig = Figure(resolution=(1500, 800), figure_padding = 100)
ax1 = Axis3(fig[1,1])
ax2 = Axis3(fig[1,2])
meshscatter!(ax1, xs, ys, samples, markersize = 5)
surface!(ax2, x2, y2, reshape(interpolated,nx,ny))
3 Likes

Copying @joa-quim, because GMT has probably one of the most used algorithms in the world of geosciences for gridding scattered X, Y, Z(x,y) data points like in the above example.

The reference for the algorithm used by GMT’s surface function is:

Smith, W. H. F, and P. Wessel, 1990, Gridding with continuous curvature splines in tension, Geophysics , 55, 293-305.

1 Like

Awesome @rafael.guerra!. Like your codes. Just wondering if there are more elegant ways.
Noticed that you are a GP. I’m a PE (PP to be more specific).

How does GMT.jl compares against GeoStats.jl?