Is it possible to index into a set of columns of a 3D array in a single line?

alex-s-gardner · February 3, 2022, 1:14am

If I have a 3D array:
foo = rand(10,10,100)

and I want to extract:

foo[1,2,:] and foo[3,4,:]

with a single indexing call, is there a way to do that? I don’t want to use a loop because foo in my case is chunked data that lives in the cloud, so multiple calls would result in unnecessary overhead.

I have searched far and wide but can’t seem to find what I’m looking for. Any insights would be greatly appreciated.

tomerarnon · February 3, 2022, 1:32am

If I understand what you mean, then the answer is no. You can do this in one “call” easily enough:

col1, col2 = getindex.(Ref(foo), (1,3), (2,4), :)

But the broadcast is still essentially looping over the x-y pairs (1,2) and (3,4).
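
To make the point about the hidden loop concrete, here is a minimal sketch on a plain in-memory Array (the names xs, ys, broadcasted and mapped are only for illustration). It compares the broadcast form with an explicit map over the same pairs; both visit the pairs one at a time.

```julia
foo = rand(10, 10, 100)
xs, ys = (1, 3), (2, 4)

# Broadcast form from the post above: getindex is called once per (x, y) pair.
broadcasted = getindex.(Ref(foo), xs, ys, :)

# Explicit form: map also visits the pairs one by one.
mapped = map((x, y) -> foo[x, y, :], xs, ys)

# Both results are 2-tuples of 100-element vectors with identical contents.
same = broadcasted == mapped
```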

gustaphe · February 3, 2022, 5:19am

In benchmarks I can’t see a big difference between this and the two-liner. It seems the compiler figures it out. That holds even if I make the column longer and swap the indices, so that there should be a big advantage to running along the long axis first.

fabiangans · February 3, 2022, 8:05am

Another option would be

foo[[CartesianIndex((1,2)),CartesianIndex((3,4))],:]
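
For reference, a small sketch of what this form returns on a plain in-memory Array (the variable names idx and cols are just for illustration): the vector of CartesianIndex values takes the place of the first two dimensions, so the result is a 2×100 matrix with one row per requested column.

```julia
foo = rand(10, 10, 100)

# Each CartesianIndex covers the first two dimensions at once.
idx = [CartesianIndex((1, 2)), CartesianIndex((3, 4))]

# A single getindex call; the result has one row per index in idx.
cols = foo[idx, :]

result_size = size(cols)
row1_matches = cols[1, :] == foo[1, 2, :]
row2_matches = cols[2, :] == foo[3, 4, :]
```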

fabiangans · February 3, 2022, 8:10am

OK, I just saw this comment. Which package did you use to open the array? Different packages like HDF5, NCDatasets, Zarr, etc. have their own getindex implementations, so the ideal way to read this depends on the package you are using.

jishnub · February 3, 2022, 8:14am

Perhaps more concisely as

julia> col1, col2 = [foo[i...,:] for i in ((1,2),(3,4))];

julia> col1 == foo[1,2,:]
true

julia> col2 == foo[3,4,:]
true

fabiangans · February 3, 2022, 8:16am

Again, this comprehension is doing multiple calls to getindex, which is what the OP wants to avoid.
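
One way to see the difference in call counts is a toy wrapper that counts getindex calls. This is only a sketch: CountingArray is a hypothetical type made up for this illustration, not part of Zarr.jl or DiskArrays.jl, and a single call at the Julia level does not by itself guarantee a single chunk read in a real backend.

```julia
# Hypothetical wrapper that counts getindex calls; it stands in for a
# remote, chunked array where every call costs a read request.
struct CountingArray{A<:AbstractArray}
    data::A
    calls::Base.RefValue{Int}
end

function Base.getindex(c::CountingArray, I...)
    c.calls[] += 1          # record one "request"
    return c.data[I...]
end

c = CountingArray(rand(10, 10, 100), Ref(0))

# Comprehension: one getindex call per (x, y) pair.
cols_loop = [c[i..., :] for i in ((1, 2), (3, 4))]
calls_loop = c.calls[]

# Vector of CartesianIndex: a single getindex call for both columns.
c.calls[] = 0
cols_batch = c[[CartesianIndex(1, 2), CartesianIndex(3, 4)], :]
calls_batch = c.calls[]
```

Whether that single call is actually served efficiently is up to the package's own getindex implementation.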

alex-s-gardner · February 3, 2022, 5:10pm

@fabiangans thanks a ton for the examples. I’m working with Zarr.jl and want to ensure that I’m not decompressing a chunk more than once when pulling discrete columns of data. I’ll do some more digging/testing to see which strategy works best.

fabiangans · February 3, 2022, 5:17pm

@alex-s-gardner I did some tests as well, and currently none of the suggested solutions work the way you intend. Array indexing in Zarr.jl is done through https://github.com/meggart/DiskArrays.jl , so it would be good if you opened an issue there. Fixing this would automatically help other packages based on DiskArrays, like NetCDF.jl and ArchGDAL.jl.

alex-s-gardner · February 3, 2022, 5:25pm

@fabiangans thanks for doing that… I’ll open an issue in DiskArrays.jl.

rafael.guerra · February 4, 2022, 12:09am

Could we make it more compact? Say:

foo[[CartesianIndex.((1,3),(2,4))...],:]

alex-s-gardner · February 4, 2022, 12:59am

I like it… thanks… though the only option I could figure out for DiskArrays is to create a Boolean mask index, but it’s amazingly inefficient.

In my specific case:

mask = falses(size(foo["var"]))
mask[1, 1, :] .= true 
foo["var"][mask]

takes 30 seconds to read, whereas:

foo["var"][1,1,:]

takes 0.5 seconds to read.

I’ve posted on DiskArrays
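
A side note on the mask approach when more than one column is selected, sketched on a plain in-memory Array (timings on a disk-backed array will differ, and the names flat and cols are only for illustration): logical indexing returns a flat vector in column-major order, so the values of the selected columns come back interleaved and need a reshape to separate them.

```julia
foo = rand(10, 10, 100)

# Boolean mask selecting the columns at (1, 2) and (3, 4).
mask = falses(size(foo))
mask[1, 2, :] .= true
mask[3, 4, :] .= true

# Logical indexing returns a flat vector in column-major (linear index) order:
# foo[1,2,1], foo[3,4,1], foo[1,2,2], foo[3,4,2], ...
flat = foo[mask]

# Reshape so that each row holds one column of the original array.
# Rows follow linear-index order, not the order the pairs were written in.
cols = reshape(flat, 2, :)
```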

sijo · February 4, 2022, 9:29am

Each version can be made a bit more compact:

# Using two scalars instead of a tuple in the CartesianIndex constructor:
foo[[CartesianIndex(1,2), CartesianIndex(3,4)], :]

# Broadcasting with array arguments to get result as array directly:
foo[CartesianIndex.([1,3], [2,4]), :]

Fliks · March 2, 2022, 11:27pm

You could use YAXArrays.jl for that, but I am not sure whether this is pushing a square peg through a round hole.

using YAXArrays

# Collect a copy of the input slice whenever the mask value at this position is true.
function innerapplymask(xout, xin, xmask, outvec)
    if only(xmask)
        push!(outvec, deepcopy(xin))
    end
    return nothing
end


function outerapplymask(cube, mask)

    maskaxs = caxes(mask)
    applyaxs = setdiff(caxes(cube), caxes(mask))
    indims = InDims(applyaxs..., window_oob_value=-99, artype=YAXArray)
    #inmask = InDims(MovingWindow.(maskaxs, 0,0)..., window_oob_value=0)
    outdims=OutDims()
    outvec = []
    mapCube(innerapplymask, (cube, mask), outvec; indims=(indims, InDims()), outdims)
    return outvec
end

foo=rand(10,10,100)
maskarr = falses(10,10)
maskarr[1,2] = true
maskarr[3,4] = true
outerapplymask(YAXArray(foo), YAXArray(maskarr))

This gives you a list of YAXArrays, but you could change the artype in the InDims constructor to Array to get a list of plain arrays.

For my example YAXArray with a Zarr backend of this size:

YAXArray with the following dimensions
Lon                 Axis with 465 Elements from 672639.15 to 686559.15
Lat                 Axis with 444 Elements from 9.45343066e6 to 9.44014066e6
Time                Axis with 100 Elements from 2016-10-03T10:12:28 to 2020-02-09T10:12:44
Polarisation        Axis with 2 elements: VH VV 
name: layer
Total size: 157.52 MB

this takes
2.401743 seconds (2.15 M allocations: 859.379 MiB, 60.18% gc time)
compared to the list comprehension:

julia> @time [freqcube[ind.I...,:,1] for ind in CartesianIndices(mask.data) if mask.data[ind]]
1361.023080 seconds (3.35 M allocations: 1.973 TiB, 17.06% gc time, 0.01% compilation time)

alex-s-gardner · March 4, 2022, 8:22pm

Hey Felix, thanks for the great example. I’ll test implementing it in my workflow.

fabiangans · June 13, 2022, 3:52pm

Just for reference: now that DiskArrays.jl pull request #59 ("Batch getindex", meggart/DiskArrays.jl) has been merged and tagged, this is optimized in all packages that use DiskArrays.jl for indexing, and it should be the recommended way to extract a set of columns as you describe for Zarr.jl.

alex-s-gardner · September 3, 2022, 6:11pm

This has now been solved by DiskArrays.jl pull request #59.
