I would like to sample rows from a three-dimensional array without replacement. For example, if I have
x = rand(10, 5, 3)
a random set of indices might look like {(2,2), (9,3)}, where the first position is the index into the first dimension and the second position is the index into the third dimension. If I wanted a single point, I could use linear indexing. However, this is a bit trickier because I need to ignore the second dimension. Is there a good way to do this in Julia?
Maybe something like:
using StatsBase
x = rand(10, 5, 3)
@views sample(CartesianIndices(x[:,1,:]), 3; replace=false)
3-element Vector{CartesianIndex{2}}:
CartesianIndex(1, 3)
CartesianIndex(5, 2)
CartesianIndex(4, 2)
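As far as I can tell, the slice x[:,1,:] is only there to supply the shape of the (first, third) index space, so the same index set can be built from the axes of x directly. A minimal sketch, assuming the same x as above (the names ik and idx are just for illustration):

```julia
using StatsBase

x = rand(10, 5, 3)
# Index space over dimensions 1 and 3 only; dimension 2 is left out entirely
ik = CartesianIndices((axes(x, 1), axes(x, 3)))
# Draw 3 distinct (i, k) pairs without replacement
idx = sample(ik, 3; replace=false)
```

This should return the same kind of result, a Vector{CartesianIndex{2}}, without referring to any particular position along the second dimension.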
Do you happen to know of a more efficient solution for large arrays?
A simple one-liner that copies:
using StatsBase, SplitApplyCombine
sample(splitdims(x, (1, 3)), 20; replace=false)
More efficient, doesn’t copy array content:
splitted = splitdimsview(x, (1, 3))
I = sample(CartesianIndices(splitted), 20; replace=false)
@view splitted[I] # 20-element view(...)
FYI, here are some benchmarks:
using StatsBase
f(x,n) = @views sample(CartesianIndices(x[:,1,:]), n; replace=false)
using SplitApplyCombine
g(x,n) = sample(splitdims(x, (1, 3)), n; replace=false)
function h(x,n)
splitted = splitdimsview(x, (1, 3))
I = sample(CartesianIndices(splitted), n; replace=false)
@view splitted[I]
end
n = 1000
x = rand(100, 50, 30)
using BenchmarkTools
@btime f($x,$n) # 8.9 μs (3 allocations: 39.23 KiB)
@btime g($x,$n) # 205.7 μs (3005 allocations: 1.47 MiB)
@btime h($x,$n) # 9.4 μs (3 allocations: 39.23 KiB)
This benchmark comparison looks totally reasonable, with the expected order between functions.
Note that f returns indices, while g and h return array elements. Also, I don't think these 2d indices from f() can be used to directly index into 3d x …
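One possible way to bridge that gap is to expand each sampled 2d index (i, k) into x[i, :, k]. This is only a sketch under the assumption that a "row" means the full fiber along the second dimension; f is the function from the benchmark above, and rows is a name made up for illustration:

```julia
using StatsBase

f(x, n) = @views sample(CartesianIndices(x[:, 1, :]), n; replace=false)

x = rand(10, 5, 3)
idx = f(x, 4)                      # 4 distinct CartesianIndex{2} values (i, k)
# Turn each (i, k) into a non-copying view of x[i, :, k]
rows = [view(x, I[1], :, I[2]) for I in idx]
```

Each element of rows is then a length-5 view into x, which makes the result of f comparable to what h returns.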
Thank you both for the ideas!