2D interpolation of large data to calculate depth profiles

I am working with a dataset which is essentially a depth sample map: x and y coordinates on an irregular grid and the corresponding depth values. I need to calculate tons of depth profiles for different lines defined by two points A and B.

I thought interpolating would be an easy task, as I have done it so many times but I realised that the dataset is too large (millions of points) and I get an OutOfMemoryError from Interpolations.jl.

Since this is a fairly common task to create these “depth profiles” (but not my expertise at all) I thought I quickly hack together a MWE and ask around if there is a sophisticated solution out there, before I naively hack together something :wink:

Below is my MWE. The issue is that n_datapoints grows (at least) with O(n^2) in memory. My dataset has millions of points and the example below already starts to hang with a few thousands of entry (which is no surprise, given how the interpolator works).

Is there any state of the art solution for this kind of problem which is maybe already implemented somewhere? Like “lazy interpolations” or so? :smiley: At least this is implemented in basically every map software which can calculate z profiles for a drawn line in real-time :slight_smile:

Another approach would be: based on A and B, only select those datapoints which are in the road width of the line between them, however this requires some sorting and can become a bit of a hassle when dealing with lots of A and B pairs.

Here is the Pluto.jl notebook in case you prefer that:
2D Interpolation of Depth Profiles.jl (49.3 KB)

using LinearAlgebra
using CairoMakie
using Interpolations

n_datapoints = 100  # this is the sensitive parameter
n_profile_samplepoints = 50
x = range(-2, 2; length=n_datapoints)
y = range(-2, 2; length=n_datapoints)
f(z) = (z^3 - 3) / z	
z = [angle(f(u + 1im * v)) for u in x, v in y]

# two points to define a z-cut
A = Point2f(1.0, 1.5)
B = Point2f(-1.5, -1.0)
# the distance so that we can scale the profile accordingly
distance = norm(A - B)
# these will be our sample points for the interpolator
coords = [(A[1] + t * (B[1] - A[1]), A[2] + t * (B[2] - A[1])) for t in range(0, 1; length=n_profile_samplepoints)]

itp = linear_interpolation((x, y), z)

fig = Figure(size=(900, 400), fontsize=20)
axs = [Axis(fig[1, j], aspect=1) for j in 1:2]
cmap = :roma
contour!(axs[1], x, y, z, levels=30, colormap=cmap)
linesegments!(axs[1], [A, B], color=:red, linewidth=5)

lines!(axs[2], range(0, distance; length=n_profile_samplepoints), [itp(p...) for p in coords])


Are your data point on a regular grid as in your example or scattered as you seem to indicate in the text?

They are scattered, unfortunately.

Then you could try this in your MWE

using DIVAnd

mask,pmn,xyi = DIVAnd_rectdom(x,y)
# Actually you need to pass a collection of scattered points here 10000 for the example

and plot your interpolation and DIVAnd interpolations (here with Plots)

plot(range(0, distance; length=n_profile_samplepoints), [itp(p...) for p in coords])
plot!(range(0, distance; length=n_profile_samplepoints), [myfun(p...) for p in coords])

1 Like

You can write your own interpolation:

# https://en.wikipedia.org/wiki/Bilinear_interpolation
# Weighted mean
res = zeros(length(coords))
for (i, p) in enumerate(coords)
    ix = searchsortedfirst(x, p[1])
    iy = searchsortedfirst(y, p[2])
    z11 = z[ix-1,iy-1]
    z12 = z[ix-1,iy]
    z21 = z[ix,iy-1]
    z22 = z[ix,iy]
    denom = (x[ix] - x[ix-1]) * (y[iy] - y[iy-1])
    w11 = (x[ix] - p[1]) * (y[iy] - p[2]) / denom
    w12 = (x[ix] - p[1]) * (p[2] - y[iy-1]) / denom
    w21 = (p[1] - x[ix-1]) * (y[iy] - p[2]) / denom
    w22 = (p[1] - x[ix-1]) * (p[2] - y[iy-1]) / denom
    res[i] = w11*z11 + w12*z12 + w21*z21 + w22*z22
@show isapprox(res, [itp(p...) for p in coords])   # true

It does not store any intermediate arrays.

1 Like

Unfortunately DIVAnd_rectdom gives an OutOfMemoryError for my data. I even tried to downsample by a factor of 10 but not luck.

I used that function DIVAnd_rectdom only for your example since you created a collection of scattered data from a regular grid definition.

For already scattered data, you just need to pass the x and y coordinates of all depth values z directly to DIVAndfun. In other words, x,y,z have the same length and you just pass them to DIVAndfun((x,y),z)

1 Like

Hmm, I am a bit confused, I only get NaNs after a couple of seconds when I pass my data (14347538 points in total), including the warning that basically all observations are NaN:

This assumes that I have z, a 2D matrix of my z-data, right? That’s exactly what’s not fitting into my memory, unfortunately.

What you need is only the values of z in the four points surrounding your point of interest.

Strange. What does df.elevation[1:10] looks like for example ?

Here the DataFrame:

That I understand the concept of course, but in your example above there is already a two-dimensional map of the z-data (z). I will try to figure that out, but I hoped I can get away with some working implementation instead of writing my own interpolator :wink:

Hmm, not sure why that turns up as NaN, but independently of that, it seems to me that your data are not scattered but already on a regular grid if I look at the coordinates. If that is the case, you should have no problems using classical interpolations.

If data are really scattered and you want to investigate the NaN problem, we could investigate further via pm.

It works with 5million entries, so I could downsample by a factor of 3 in the current case :wink:

There are many “holes” in the dataset and some parts are finer or more coarse, unfortunately, so it’s really irregular.

False alarm… downsampling is not working, somehow I will only get NaNs then, no matter how far I push it :confused:

That I don’t really understand. Why would it break the algorithm?

But in any case you will have to somehow introduce your depth map as a whole or in separate batches. Otherwise, how any interpolator will know it.

Yes, I think I have to use the searchsortedfirst to find the nearest x and then start again with the y value (latitude) at the index of the found x. Then do something similar to find the neighbours :wink:

Turned out that are still a couple of NaNs in the data :laughing:

I’ll report back…