Sample rows of a 3 dimensional array without replacement

Christopher_Fisher · November 7, 2021, 11:30pm

I would like to sample rows from a three-dimensional array without replacement. For example, if I have

x = rand(10, 5, 3)

a random set of indices might look like {(2,2), (9,3)}, where the first position is the index into the first dimension and the second position is the index into the third dimension. If I wanted individual elements, I could use linear indexing. However, this case is a bit trickier because I need to ignore the second dimension. Is there a good way to do this in Julia?

rafael.guerra · November 8, 2021, 12:12am

Maybe something like:

using StatsBase
x = rand(10, 5, 3)
@views sample(CartesianIndices(x[:,1,:]), 3; replace=false)

3-element Vector{CartesianIndex{2}}:
 CartesianIndex(1, 3)
 CartesianIndex(5, 2)
 CartesianIndex(4, 2)

Christopher_Fisher · November 8, 2021, 12:19am

Awesome. Thanks!

Christopher_Fisher · November 8, 2021, 12:35am

Do you happen to know of a more efficient solution for large arrays?

aplavin · November 8, 2021, 5:46am

A simple one-liner that copies:

using StatsBase, SplitApplyCombine

sample(splitdims(x, (1, 3)), 20; replace=false)

More efficient, doesn’t copy array content:

splitted = splitdimsview(x, (1, 3))
I = sample(CartesianIndices(splitted), 20; replace=false)
@view splitted[I]  # 20-element view(...)
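
For comparison, a similar non-copying selection can be sketched with Base alone. This is only a sketch under one loud assumption: it needs Julia 1.9 or later, where eachslice accepts a tuple of dimensions and returns an array-like Slices object (that did not exist when this thread was written). The names slices, I and picked are illustrative.

```julia
using StatsBase

x = rand(10, 5, 3)

# Assumption: Julia >= 1.9. eachslice over dims (1, 3) gives a 10×3 array-like
# object whose elements are views of x along dimension 2 (no data is copied).
slices = eachslice(x; dims=(1, 3))

# Draw 20 distinct (i, k) positions out of the 30 available.
I = sample(CartesianIndices(slices), 20; replace=false)

# Collect the selected views; each one aliases x[i, :, k].
picked = slices[I]
```

Each element of picked is a length-5 view into x, so writing through it modifies x.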

rafael.guerra · November 8, 2021, 7:27am

FYI, here are some benchmarks:

using StatsBase
f(x,n) = @views sample(CartesianIndices(x[:,1,:]), n; replace=false)

using SplitApplyCombine
g(x,n) = sample(splitdims(x, (1, 3)), n; replace=false)

function h(x,n)
    splitted = splitdimsview(x, (1, 3))
    I = sample(CartesianIndices(splitted), n; replace=false)
    @view splitted[I]
end

n = 1000
x = rand(100, 50, 30)

using BenchmarkTools
@btime f($x,$n)     # 8.9 μs (3 allocations: 39.23 KiB)
@btime g($x,$n)     # 205.7 μs (3005 allocations: 1.47 MiB)
@btime h($x,$n)     # 9.4 μs (3 allocations: 39.23 KiB)

aplavin · November 8, 2021, 8:35am

This benchmark comparison looks entirely reasonable, with the functions in the expected order.
Note that f returns indices, while g and h return array elements. Also, I don’t think these 2D indices from f() can be used to directly index into the 3D x…
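
To make the outputs comparable, the 2D indices returned by f can be expanded into rows of x by placing a colon in the second dimension. The following is a minimal sketch, not something posted in the thread; rows_from is a hypothetical helper name.

```julia
using StatsBase

x = rand(100, 50, 30)

# f as defined in the benchmark: returns 2D indices over dimensions 1 and 3.
f(x, n) = @views sample(CartesianIndices(x[:, 1, :]), n; replace=false)

# Hypothetical helper (not from the thread): turn each sampled (i, k) index
# into a non-copying view of the row x[i, :, k].
rows_from(x, idx) = [view(x, I[1], :, I[2]) for I in idx]

idx = f(x, 1000)
rows = rows_from(x, idx)
```

Here rows holds 1000 views of length 50, which matches the kind of result g and h produce, so a like-for-like benchmark would time f together with this extra step.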

Christopher_Fisher · November 8, 2021, 8:54am

Thank you both for the ideas!