If I have an NxM matrix with decimal values (normalised so that the sum is 1):
prob_dens::Matrix{Float64.Float64}
is there a good way to randomly draw an index i,j
with probability corresponding to the probability distribution at that index?
If I have an NxM matrix with decimal values (normalised so that the sum is 1):
prob_dens::Matrix{Float64.Float64}
is there a good way to randomly draw an index i,j
with probability corresponding to the probability distribution at that index?
You can get an i,j
candidate at random (uniform distribution) and accept it with a probability of prob_dens[i,j].
function sample(prob_dens)
while true
I = rand(CartesianIndices(prob_dens))
if rand() < prob_dens[I]
return I
end
end
end
prob_dens = rand(4,4); prob_dens /= sum(prob_dens)
sample(prob_dens)
The following worked for me:
using StatsBase
sample(CartesianIndices(prob_dens), weights(vec(prob_dens)))
The output can be wrapped with Tuple
to get a tuple of the indices.
It’s probably best to stick to some established proper implementation, but here’s a ‘home-made’, which isn’t really tested, but shows an approach:
# Create normalized matrix
M = rand(5, 6);
M ./= sum(M)
inds = CartesianIndices(M)
C = cumsum(vec(M)) # C is now a sort of cumulative distribution
function mysample(C, inds)
i = searchsortedfirst(C, rand())
return inds[i]
end
It uses a single rand()
call and then looks up the corresponding element in the cumulative distribution. I’m not confident that it isn’t off-by-one or whether to use <
or <=
internally. But a single rand()
is fast, and searching sorted vectors is fast.
Thanks a lot everyone, that is very helpful!
This will be slow for a large array — the expected number of times it calls rand()
is proportional to the length of the array.
Note that this also takes time proportional to length(prob_dens)
(though is faster because it only generates a single random number and scans the array consecutively), because the algorithm works by scanning a cumsum through the weights array to do inverse transform sampling.
If you need to generate many samples for a fixed weights table, it is probably faster to use AliasTables.jl, which allows you to precompute an efficient mapping from random numbers to indices via the alias method:
using AliasTables
at = AliasTable(vec(prob_dens))
CartesianIndices(prob_dens)[rand(at)]
e.g. this is almost 200x faster for me than sample
for prob_dens = rand(100,200)
(not including the one-time cost to generate the AliasTable
).