2D interpolation of large data to calculate depth profiles

Im surprised that you didnt find GeoStats.jl in your searches. We easily interpolate more than 200Million points in (3D) industrial applications.

In general, interpolation only depends on the local values. So in any methods used, subsampling to the local area (chosen using some nearest neighbor method - see NearestNeighbors.jl) and then interpolating using the methods previously used will work.

For the example in the OP:

# provide local interpolation
function local_itp(p)
    pidx = searchsortedfirst.((x,y),p)
    xrng, yrng = (:).(clamp.(pidx .- 5, 1, 100), 
                      clamp.(pidx .+5, 1, 100))
    itp = linear_interpolation((x[xrng],y[yrng]),z[xrng,yrng])
    itp(p...)
end
# use the local interpolation
fig = Figure(size=(900, 400), fontsize=20)
axs = [Axis(fig[1, j], aspect=1) for j in 1:2]
cmap = :roma
contour!(axs[1], x, y, z, levels=30, colormap=cmap)
linesegments!(axs[1], [A, B], color=:red, linewidth=5)

lines!(axs[2], 
  range(0, distance; length=n_profile_samplepoints), 
  [local_itp(p) for p in coords])

fig

Oh OK, sounds like something Iā€™d need. I will have a look. I searched for 2D interpolations and thought Interpolations.jl would cut itā€¦

I got it working with DIVAnd.jl but it seems that the interpolation is too coarse. Somehow it caps at 100x100 in the interpolation backend?

That is because it uses default settings. If you want higher resolutions, you can specify the underlying output grid using something as xi=(collect(range(minimum(x),maximum(x),1000)),collect(range(minimum(y),maximum(y),1000))) as keyword and look at the general DIVAnd documentation

1 Like

Well I see my mistake that I forgot to emphasise that I am not able to create z (the 2D depth map) due to memory restrictions. This is the reason why things get tricky very quickly.

@tamasgal check the InterpolateNeighbors transform in the GeoStats.jl docs or in the book. It will do the job using the NearestNeighbors.jl approach.

2 Likes

From your responses your input consists of 14347538 scattered points represented by 3 arrays x, y, z of length 14347538 each. This does not look too big and no need for a matrix. Have I got it correct?

Yep, thatā€™s correct :slight_smile:

But if you have it interpolated into a grid, you may have a use for the goodies in GMTā€™s grdtrack (stacked profiles, crossprofiles profiles, etc)

EDIT: Is this multi-beam data?

Thanks, I will try that too!

No multi-beam, neutrinos through Earth from far outside our galaxy :wink:

FWIW, ScatteredInterpolation.jl takes about 1 minute to run on my Windows laptop for 14347538 points. However, the results interpolated along the cyan line donā€™t look great in the example below:

ScatteredInterpolation code
# 1 - Generate input data
peaks(x, y) = 3(1-x)^2*exp(-x^2-(y+1)^2) -10(x/5-x^3-y^5)*exp(-x^2-y^2) - exp(-(x+1)^2-y^2)/3
N = 14347538
x, y = 6*(rand(N) .- 0.5), 6*(rand(N) .- 0.5)
z = peaks.(x, y)


# 2 - Interpolate scattered data
using ScatteredInterpolation
# interpolate z function of x, y
itp_z = interpolate(Shepard(), permutedims([x y]), z)

coords = [[-3,-3], [3,3]]       # end points of line along which interpolate data
Ni = 200                        # number of equidistant points to interpolate
xi = LinRange(coords[1][1], coords[2][1], Ni)
yi = LinRange(coords[1][2], coords[2][2], Ni)
zi = evaluate(itp_z, permutedims([xi yi]))

# 3 - Plot results
using Plots; gr(dpi=600)
p1 = contourf(sort(x[1:5000:end]), sort(y[1:5000:end]), peaks, ratio=1, lims=(-3,3))
plot!(first.(coords), last.(coords), c=:cyan)
scatter!(x[1:5000:end], y[1:5000:end], c=:red, msw=0, ms=1)
p2 = plot(xi, peaks.(xi, yi), lw=5, c=:blues, label="Exact")
plot!(xi, zi, lw=2, ls=:dash, c=:black, label="Shepard")
plot(p1, p2, size=(1000,600))

PS:
there are 5000x more points than displayed, for clarity purposes.

1 Like

This thread lists a number of packages you could try (some of which have already been mentioned above): 2D Interpolation on an irregular grid

1 Like

This takes ~5 sec on my computer (x,y,z from Rafaelā€™s post above)

using GMT

@time G = gridit([x y z], limis=(-3,3,-3,3), method="nearneighbor", inc=0.1, search_radius=0.1)
  4.803249 seconds (453 allocations: 328.435 MiB, 0.79% gc time)

but, if I understood it, the OPā€™s problem was memory available.