How large are your arrays typically? What kind of distribution do you have between falses and trues? Do you need to sample once from a given array or repeatedly? Is speed or memory more important?

The distribution of trues and falses is close enough to uniform.Typically the array is of the size 20x600x20. I then sample 1000 times per slice of this array, taking 10000 different slices of this array according to some symmetry. Sampling from slices also make it less efficient. A simplified code would be like this:

using MaskedArrays: mask
A = rand(Bool, 20, 600, 20)
function fun1(A)
for i in size(A,1), j in size(A,2)
input_clusters = A[i, j, :]
possible_nodes = mask(A[:,:,i], input_clusters, :) |> findall
for _ in 1:1000
node = rand(possible_nodes) # this is what I actually use
end
end
end
julia> @btime fun1($A)
585.338 μs (12011 allocations: 316.83 KiB)

In fact, using masked arrays, although seeming to me to be the appropriate thing to do, is not efficient

function fun2(A)
for i in size(A,1), j in size(A,2)
input_clusters = A[i, j, :] |> findall
possible_nodes = A[input_clusters, :, i] |> findall
for _ in 1:1000
row_idx, col = Tuple(rand(possible_nodes))
row = input_clusters[row_idx]
node = CartesianIndex(row, col) # this is what I actually use
end
end
end
julia> @btime fun2($A)
22.545 μs (5 allocations: 26.66 KiB)

I can’t say I really understand how A and input_clusters interact but I assume it has something to do with the symmetries you mentioned. Amortizing findall over 1000 samples doesn’t sound unreasonable but if you also have an outer loop you could consider writing your own findall loop which writes into a preexisting vector and reuses it throughout the loop. Actually writing your own loop would probably be a good idea anyway since you could more directly incorporate input_clusters without making temporary arrays.

If you have guarantees that your trues density doesn’t fall too low it may very well be worth trying the already proposed rejection sampling.

You’re correct of course, now the running time is much more realistic as well. Can’t edit anymore, so I’ll repost here:

using MaskedArrays: mask
A = rand(Bool, 20, 600, 20)
function fun1(A)
for i in axes(A,1), j in axes(A,2)
input_clusters = A[i, j, :]
possible_nodes = mask(A[:,:,i], input_clusters, :) |> findall
for _ in 1:1000
node = rand(possible_nodes) # this is what I actually use
end
end
end
julia> @btime fun1($A)
7.155 s (199689892 allocations: 4.93 GiB)
function fun2(A)
for i in axes(A,1), j in axes(A,2)
input_clusters = A[i, j, :] |> findall
possible_nodes = A[input_clusters, :, i] |> findall
for _ in 1:1000
row_idx, col = Tuple(rand(possible_nodes))
row = input_clusters[row_idx]
node = CartesianIndex(row, col) # this is what I actually use
end
end
end
julia> @btime fun2($A)
396.924 ms (49892 allocations: 431.63 MiB)