Is it possible to index into a set of columns of a 3D array in a single line?

if I have a 3D matrix:
foo = rand(10,10,100)

and i want to extract:

foo[1,2,:] and foo[3,4,:]

with a single index, is there a way to do that? I don’t want to use a loop as foo in my case is chunked data that lives in the cloud so multiple calls would result in unnecessary overhead.

I have searched far and wide but can’t seem to find what I’m looking for. Any insights would be greatly appreciated.

If I understand what you mean, then the answer is no. You can do this in one “call” easily enough:

col1, col2 = getindex.(Ref(foo), (1,3), (2,4), :)

But the broadcast makes it so it’s essentially doing the loop over the x-y pairs (1,2) and (3,4).

4 Likes

In benchmarks I can’t see a big difference between this and the twoliner. Seems the compiler figures it out. That holds even if I make the column longer and swap the indices so there should be a big advantage to running along the long axis first.

Another option would be

foo[[CartesianIndex((1,2)),CartesianIndex((3,4))],:]
7 Likes

Ok, I just saw this comment, with which package did you open the array? Different packages like HDF5, NCDatasets, Zarr etc have their own getindex implementations and therefore the ideal way to read this would depend on the package you are using.

Perhaps more concisely as

julia> col1, col2 = [foo[i...,:] for i in ((1,2),(3,4))];

julia> col1 == foo[1,2,:]
true

julia> col2 == foo[3,4,:]
true

Again, this comprehension is doing multiple calls to getindex, which is what the OP wants to avoid.

1 Like

@fabiangans thanks a ton for the examples. I’m working with Zarr.jl and want to ensure that I’m not uncompromising a chunk more than once when pulling discrete columns of data. I’ll do some more digging/testing to see which stagey works best.

@alex-s-gardner I did some tests as well and currently none of the suggested solutions will work the way you intend to. The array indexing in Zarr.jl is done through https://github.com/meggart/DiskArrays.jl , it would be good if you opened an issue there, fixing this would automatically help other packages based on DiskArrays like NetCDF.jl and ArchGDAL.jl

2 Likes

@fabiangans thanks for doing that… I’ll open an issue in DiskArrays.JL

Could we make it more compact? Say:

foo[[CartesianIndex.((1,3),(2,4))...],:]
2 Likes

I like it… thanks… though the only option I could figure out for DiskArrays is to create a binary index but it’s amazingly inefficient.

In my case specific case:

mask = falses(size(foo["var"]))
mask[1, 1, :] .= true 
foo["var"][mask]

takes 30 seconds to read in and:

foo["var"][1,1,:]
takes 0.5 seconds to read

I’ve posted on DiskArrays

Each version can be made a bit more compact:

# Using two scalars instead of a tuple in the CartesianIndex constructor:
foo[[CartesianIndex(1,2), CartesianIndex(3,4)], :]

# Broadcasting with array arguments to get result as array directly:
foo[CartesianIndex.([1,3], [2,4]), :]
3 Likes

You could use YAXArrays.jl for that, but I am not sure, whether this is pushing a square peg through a round hole.

using YAXArrays

function innerapplymask(xout, xin, xmask, outvec)
    if only(xmask)
        push!(outvec, deepcopy(xin))
        return nothing
    else
        return nothing
    end
end


function outerapplymask(cube, mask)

    maskaxs = caxes(mask)
    applyaxs = setdiff(caxes(cube), caxes(mask))
    indims = InDims(applyaxs..., window_oob_value=-99, artype=YAXArray)
    #inmask = InDims(MovingWindow.(maskaxs, 0,0)..., window_oob_value=0)
    outdims=OutDims()
    outvec = []
    mapCube(innerapplymask, (cube, mask), outvec; indims=(indims, InDims()), outdims)
    return outvec
end

foo=rand(10,10,100)
maskarr = falses(10,10)
maskarr[1,2] = true
maskarr[3,4] = true
outerapplymask(YAXArray(foo), YAXArray(maskarr))

This gives you a list of YAXArray but you could change the artype in the InDims constructor to Array to get a list of plain arrays.

For my example YAXArray with a Zarr backend of this size:

YAXArray with the following dimensions
Lon                 Axis with 465 Elements from 672639.15 to 686559.15
Lat                 Axis with 444 Elements from 9.45343066e6 to 9.44014066e6
Time                Axis with 100 Elements from 2016-10-03T10:12:28 to 2020-02-09T10:12:44
Polarisation        Axis with 2 elements: VH VV 
name: layer
Total size: 157.52 MB

this takes
2.401743 seconds (2.15 M allocations: 859.379 MiB, 60.18% gc time)
compared to the list comprehension:

julia> @time [freqcube[ind.I...,:,1] for ind in CartesianIndices(mask.data) if mask.data[ind]]
1361.023080 seconds (3.35 M allocations: 1.973 TiB, 17.06% gc time, 0.01% compilation time)
1 Like

Hey Felix, thanks for the great example. I’ll test implementing it in my workflow.

1 Like

Just to reference, now that Batch getindex by meggart · Pull Request #59 · meggart/DiskArrays.jl · GitHub got merged and tagged, this is optimized in all packages that use DiskArrays.jl for indexing and should be the recommended way to extract a set of columns as you describe for Zarr.jl.

This has now been solved with DiskArrays.jl merge #59