2D interpolation of large data to calculate depth profiles

I am working with a dataset which is essentially a depth sample map: x and y coordinates on an irregular grid and the corresponding depth values. I need to calculate tons of depth profiles for different lines defined by two points A and B.

I thought interpolating would be an easy task, as I have done it many times before, but I realised that the dataset is too large (millions of points) and I get an OutOfMemoryError from Interpolations.jl.

Since creating these “depth profiles” is a fairly common task (but not my expertise at all), I thought I'd quickly hack together a MWE and ask around whether there is a sophisticated solution out there, before I naively hack something together myself :wink:

Below is my MWE. The issue is that memory grows (at least) as O(n^2) in n_datapoints. My dataset has millions of points, and the example below already starts to hang with a few thousand entries (which is no surprise, given how the interpolator works).

Is there any state-of-the-art solution for this kind of problem, maybe already implemented somewhere? Like “lazy interpolation” or so? :smiley: After all, basically every mapping application implements this when it calculates z profiles for a drawn line in real time :slight_smile:

Another approach would be: based on A and B, select only those datapoints which lie within a corridor of a given width around the line between them. However, this requires some sorting/filtering and can become a bit of a hassle when dealing with lots of A and B pairs.
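For what it's worth, that corridor selection can be done in a single O(n) pass per (A, B) pair using a point-to-segment distance; here is a minimal sketch (the function names are mine, not from any package):

```julia
using LinearAlgebra

# Distance from point P to the finite segment A–B.
function dist_to_segment(P, A, B)
    AB = B .- A
    # parameter of the closest point on the segment, clamped to [0, 1]
    t = clamp(dot(P .- A, AB) / dot(AB, AB), 0.0, 1.0)
    return norm(P .- (A .+ t .* AB))
end

# Indices of the scattered points lying within `halfwidth` of the segment A–B.
in_corridor(xs, ys, A, B, halfwidth) =
    [i for i in eachindex(xs) if dist_to_segment((xs[i], ys[i]), A, B) <= halfwidth]
```

No grid-sized intermediate arrays are needed, and the pass is embarrassingly parallel over (A, B) pairs.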

Here is the Pluto.jl notebook in case you prefer that:
2D Interpolation of Depth Profiles.jl (49.3 KB)

using LinearAlgebra
using CairoMakie
using Interpolations

n_datapoints = 100  # this is the sensitive parameter
n_profile_samplepoints = 50
	
x = range(-2, 2; length=n_datapoints)
y = range(-2, 2; length=n_datapoints)
f(z) = (z^3 - 3) / z
z = [angle(f(u + 1im * v)) for u in x, v in y]

# two points to define a z-cut
A = Point2f(1.0, 1.5)
B = Point2f(-1.5, -1.0)
# the distance so that we can scale the profile accordingly
distance = norm(A - B)
# these will be our sample points for the interpolator
coords = [(A[1] + t * (B[1] - A[1]), A[2] + t * (B[2] - A[2])) for t in range(0, 1; length=n_profile_samplepoints)]

itp = linear_interpolation((x, y), z)

fig = Figure(size=(900, 400), fontsize=20)
axs = [Axis(fig[1, j], aspect=1) for j in 1:2]
cmap = :roma
contour!(axs[1], x, y, z, levels=30, colormap=cmap)
linesegments!(axs[1], [A, B], color=:red, linewidth=5)

lines!(axs[2], range(0, distance; length=n_profile_samplepoints), [itp(p...) for p in coords])

fig

Are your data points on a regular grid, as in your example, or scattered, as you seem to indicate in the text?

They are scattered, unfortunately.

Then you could try this in your MWE

using DIVAnd

mask, pmn, xyi = DIVAnd_rectdom(x, y)
# Actually you need to pass a collection of scattered points here (10000 for the example)
myfun = DIVAndfun(([xyi[1]...], [xyi[2]...]), [z...])

and plot your interpolation and the DIVAnd interpolation (here with Plots):

plot(range(0, distance; length=n_profile_samplepoints), [itp(p...) for p in coords])
plot!(range(0, distance; length=n_profile_samplepoints), [myfun(p...) for p in coords])


You can write your own interpolation:

# https://en.wikipedia.org/wiki/Bilinear_interpolation
# Weighted mean
res = zeros(length(coords))
for (i, p) in enumerate(coords)
    # first grid index at or above p; clamp so ix-1 and iy-1 stay in bounds
    # (also when p falls exactly on the first grid node)
    ix = clamp(searchsortedfirst(x, p[1]), 2, length(x))
    iy = clamp(searchsortedfirst(y, p[2]), 2, length(y))
    z11 = z[ix-1,iy-1]
    z12 = z[ix-1,iy]
    z21 = z[ix,iy-1]
    z22 = z[ix,iy]
    denom = (x[ix] - x[ix-1]) * (y[iy] - y[iy-1])
    w11 = (x[ix] - p[1]) * (y[iy] - p[2]) / denom
    w12 = (x[ix] - p[1]) * (p[2] - y[iy-1]) / denom
    w21 = (p[1] - x[ix-1]) * (y[iy] - p[2]) / denom
    w22 = (p[1] - x[ix-1]) * (p[2] - y[iy-1]) / denom
    res[i] = w11*z11 + w12*z12 + w21*z21 + w22*z22
end
@show isapprox(res, [itp(p...) for p in coords])   # true

It does not store any intermediate arrays.


Unfortunately DIVAnd_rectdom gives an OutOfMemoryError for my data. I even tried to downsample by a factor of 10, but no luck.

I used that function DIVAnd_rectdom only for your example, since you created a collection of scattered data from a regular grid definition.

For already scattered data, you just need to pass the x and y coordinates of all depth values z directly to DIVAndfun. In other words, x, y, z have the same length and you just pass them as DIVAndfun((x, y), z)


Hmm, I am a bit confused: I only get NaNs after a couple of seconds when I pass my data (14347538 points in total), along with a warning that basically all observations are NaN.

This assumes that I have z, a 2D matrix of my z-data, right? That’s exactly what’s not fitting into my memory, unfortunately.

What you need is only the values of z in the four points surrounding your point of interest.
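The bilinear lookup posted earlier needs a grid for that, though. For truly scattered data, a simple constant-memory alternative in the same spirit is inverse-distance weighting over the k nearest points (note: that is a different technique than bilinear interpolation; a rough sketch, the function name is mine):

```julia
# Inverse-distance-weighted estimate from the k nearest scattered points.
# Linear scan per query; for millions of points a spatial index such as a
# KD-tree (e.g. NearestNeighbors.jl) would replace the scan.
function idw(xs, ys, zs, p; k = 4, ϵ = 1e-12)
    d2 = @. (xs - p[1])^2 + (ys - p[2])^2       # squared distances to p
    nearest = partialsortperm(d2, 1:k)          # indices of the k closest points
    w = @. 1 / (sqrt(d2[nearest]) + ϵ)          # inverse-distance weights
    return sum(w .* zs[nearest]) / sum(w)
end
```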

Strange. What does df.elevation[1:10] look like, for example?

Here is the DataFrame:

I understand the concept, of course, but in your example above there is already a two-dimensional map of the z-data (z). I will try to figure that out, but I had hoped I could get away with some working implementation instead of writing my own interpolator :wink:

Hmm, not sure why that turns up as NaN, but independently of that, looking at the coordinates it seems to me that your data are not scattered but already on a regular grid. If that is the case, you should have no problem using classical interpolation.

If the data are really scattered and you want to investigate the NaN problem, we could dig in further via PM.

It works with 5 million entries, so I could downsample by a factor of 3 in the current case :wink:

There are many “holes” in the dataset, and some parts are sampled more finely or more coarsely, so unfortunately it really is irregular.

False alarm… downsampling is not working; somehow I only get NaNs then, no matter how far I push it :confused:

That I don’t really understand. Why would it break the algorithm?

But in any case, you will have to feed in your depth map somehow, either as a whole or in separate batches. Otherwise, how would any interpolator know about it?

Yes, I think I have to use searchsortedfirst to find the nearest x, and then do the same with the y value (latitude) starting at the index of the found x. Then do something similar to find the neighbours :wink:
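In case a sketch helps, that plan can be phrased as indexing the data as a “grid with holes”: sorted unique coordinate vectors plus a Dict of occupied cells (all names below are mine, just to illustrate the idea):

```julia
# Build an index for a "grid with holes": sorted unique axes plus a Dict
# mapping each occupied (ix, iy) cell to its depth value.
function grid_index(xs, ys, zs)
    xu = sort!(unique(xs))
    yu = sort!(unique(ys))
    cell = Dict((searchsortedfirst(xu, xs[i]), searchsortedfirst(yu, ys[i])) => zs[i]
                for i in eachindex(zs))
    return xu, yu, cell
end

# The (up to four) occupied cells surrounding p; holes are simply skipped.
function neighbours(xu, yu, cell, p)
    ix = clamp(searchsortedfirst(xu, p[1]), 2, length(xu))
    iy = clamp(searchsortedfirst(yu, p[2]), 2, length(yu))
    return [((i, j), cell[(i, j)]) for i in (ix - 1, ix), j in (iy - 1, iy) if haskey(cell, (i, j))]
end
```

From the surviving neighbours you can then take a weighted mean, falling back to fewer points where cells are missing.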

Turns out there are still a couple of NaNs in the data :laughing:

I’ll report back…