Translating some Python code: Vector of interpolations?

In some Python code that I am trying to translate to Julia, I have a numpy array of interpolations, the data starts off as a structured grid (i.e., not rectangular), and there is one variable that points to the sample locations and another variable that points to the sample values, say “locations” and “variable”. I then replace the 3D variable array with a 2D array, containing linear interpolations along the rays in the 3rd dimension. I then use this to locate some planes perpendicular to the third dimension for extraction as images.

Next, I’m describing what I am attempting, not trying to convince you it is a good idea …

In Julia, I wasn’t sure that storing the interpolations in a Matrix with Any type was a good idea, so I tried to capture the type with a simple call:

using Interpolations
approximate(x, y) = extrapolate(interpolate((x,), y, Gridded(Linear())), Flat())
T = typeof(approximate([0., 1.], [0., 1.]))
s = Matrix{T}(undef, 10, 10)

What the matrix buys me is the ability to look up an interpolation by (row, col) … is there a better way to do this?

I didn’t fully understand your question yesterday so I didn’t respond. Unfortunately, I still don’t really understand it today. But since no one else has responded yet, here are a few pieces.

I don’t understand your problem setup and reduction from a 3d array to a 2d array But I’ll remark that if everything is type stable and allocations are a negligible part of the run time (or avoided entirely), most performance differences with different data structures should be related mostly to the efficiency of those structures themselves for the problem at-hand. Since Julia is compiled, you don’t suffer the automatic performance pitfalls of leaving MATLAB or NumPy arrays, for example. In general, you should use what’s most natural to the problem. It sounds like you want s[row,column] indexing, so go ahead and use a Matrix like you are already.

Your use of a dummy calculation to determine the element type T is not uncommon and such patterns appear numerous places throughout base Julia and packages. The only thing I’ll add is that, in some cases, one can use a comprehension [f(x,y) for x in X, y in Y] or map or broadcast, rather than type determination + pre-allocation + fill, to generate arrays. The comprehensions, maps, and broadcasts can usually work out the element type for you (f it’s stable). For arrays built from complicated functions (especially those you’re still developing), this can sometimes be more ergonomic than trying to determine the element type in advance.

I’ll remark that, while untyped containers such as Matrix{Any} do come with performance penalties, these penalties are not always significant compared to the surrounding code (especially with appropriate use of function barriers). Sometimes the simplicity and versatility of an untyped container can be useful, relative to its performance drawback. But it sounds like you are able to find a single, suitable element type for your example here so you certainly should.

Hey, thanks!

I guess I was mostly looking to see if running the dummy calculation to figure the type out was something frowned upon, and perhaps fishing to see if there was some other more appropriate datatype than Matrix.

I made another test function that used comprehensions (below, test2), and it compared pretty well with the first version, test1:

function test1(x, y)
    rows, cols, _ = size(x)
    approximate(x, y) = extrapolate(interpolate((x,), y, Gridded(Linear())), Flat())
    xx = eachslice(x, dims=(1, 2))
    yy = eachslice(y, dims=(1, 2))
    T = typeof(approximate(xx[1], yy[1]))
    w = Matrix{T}(undef, rows, cols)

    for (ind, u, v) in zip(eachindex(w), xx, yy)
        w[ind] = approximate(u, v)
    end

    w
end


function test2(x, y)
    rows, cols, _ = size(x)
    approximate(x, y) = extrapolate(interpolate((x,), y, Gridded(Linear())), Flat())
    w = [approximate(u, v) for (u, v) in  zip(eachslice(x, dims=(1, 2)), eachslice(y, dims=(1, 2)))]
    w
end

pathlength, v= get_data("giant_data_blob.nc")
test1(pathlength, v)
test2(pathlength, v)

@time test1(pathlength, v)
@time test2(pathlength, v)

Which gives:

0.065073 seconds (38.42 k allocations: 73.024 MiB)
0.074947 seconds (38.43 k allocations: 73.008 MiB, 10.42% gc time)

It looks like your evaluation of test2 hit garbage collection. Garbage collection (GC) occurs “at random” (rather, after so much memory has been allocated). If you benchmarked them again, you might see different combinations of them hit GC. Since both have virtually the same number and size of memory allocations, both should be equally likely to trigger GC. Accounting for the claimed GC time on the second benchmark, I would say the performance is indistinguishable.

Now, having seen how you fill this array, I might suggest broadcasting or map to be cleaner than a comprehension (which requires zip in this case). Since xx and yy should have the same size, I will specifically suggest map. There should be no noticeable performance difference between any of these or your original comprehension.

xx = eachslice(x, dims=(1,2))
yy = eachslice(y, dims=(1,2))

w = map(approximate, xx, yy) # map option
w = broadcast(approximate, xx, yy) # broadcast option
w = approximate.(xx, yy) # broadcast option using "dot" syntax
1 Like