Draw a random number through a probability distribution defined as an array

If I have an NxM matrix with decimal values (normalised so that the sum is 1):

prob_dens::Matrix{Float64.Float64}

is there a good way to randomly draw an index i,j with probability corresponding to the probability distribution at that index?

You can get an i,j candidate at random (uniform distribution) and accept it with a probability of prob_dens[i,j].

function sample(prob_dens)
while true
    I = rand(CartesianIndices(prob_dens))
    if rand() < prob_dens[I]
        return I
    end
end
end


prob_dens = rand(4,4); prob_dens /= sum(prob_dens)
sample(prob_dens)
2 Likes

The following worked for me:

using StatsBase

sample(CartesianIndices(prob_dens), weights(vec(prob_dens)))

The output can be wrapped with Tuple to get a tuple of the indices.

4 Likes

It’s probably best to stick to some established proper implementation, but here’s a ‘home-made’, which isn’t really tested, but shows an approach:

# Create normalized matrix
M = rand(5, 6);
M ./= sum(M)

inds = CartesianIndices(M) 
C = cumsum(vec(M))  # C is now a sort of cumulative distribution

function mysample(C, inds)
    i = searchsortedfirst(C, rand())
    return inds[i]
end

It uses a single rand() call and then looks up the corresponding element in the cumulative distribution. I’m not confident that it isn’t off-by-one or whether to use < or <= internally. But a single rand() is fast, and searching sorted vectors is fast.

1 Like

Thanks a lot everyone, that is very helpful!

This will be slow for a large array — the expected number of times it calls rand() is proportional to the length of the array.

Note that this also takes time proportional to length(prob_dens) (though is faster because it only generates a single random number and scans the array consecutively), because the algorithm works by scanning a cumsum through the weights array to do inverse transform sampling.

If you need to generate many samples for a fixed weights table, it is probably faster to use AliasTables.jl, which allows you to precompute an efficient mapping from random numbers to indices via the alias method:

using AliasTables
at = AliasTable(vec(prob_dens))
CartesianIndices(prob_dens)[rand(at)]

e.g. this is almost 200x faster for me than sample for prob_dens = rand(100,200) (not including the one-time cost to generate the AliasTable).

6 Likes