Draw a random number through a probability distribution defined as an array

Torkel · May 21, 2025, 9:36am

If I have an NxM matrix with decimal values (normalised so that the sum is 1):

prob_dens::Matrix{Float64.Float64}

is there a good way to randomly draw an index i,j with probability corresponding to the probability distribution at that index?

Alexander-Barth · May 21, 2025, 10:08am

You can get an i,j candidate at random (uniform distribution) and accept it with a probability of prob_dens[i,j].

function sample(prob_dens)
while true
    I = rand(CartesianIndices(prob_dens))
    if rand() < prob_dens[I]
        return I
    end
end
end


prob_dens = rand(4,4); prob_dens /= sum(prob_dens)
sample(prob_dens)

Dan · May 21, 2025, 10:08am

The following worked for me:

using StatsBase

sample(CartesianIndices(prob_dens), weights(vec(prob_dens)))

The output can be wrapped with Tuple to get a tuple of the indices.

DNF · May 21, 2025, 10:29am

It’s probably best to stick to some established proper implementation, but here’s a ‘home-made’, which isn’t really tested, but shows an approach:

# Create normalized matrix
M = rand(5, 6);
M ./= sum(M)

inds = CartesianIndices(M) 
C = cumsum(vec(M))  # C is now a sort of cumulative distribution

function mysample(C, inds)
    i = searchsortedfirst(C, rand())
    return inds[i]
end

It uses a single rand() call and then looks up the corresponding element in the cumulative distribution. I’m not confident that it isn’t off-by-one or whether to use < or <= internally. But a single rand() is fast, and searching sorted vectors is fast.

Torkel · May 21, 2025, 10:37am

Thanks a lot everyone, that is very helpful!

stevengj · May 21, 2025, 10:40am

This will be slow for a large array — the expected number of times it calls rand() is proportional to the length of the array.

Note that this also takes time proportional to length(prob_dens) (though is faster because it only generates a single random number and scans the array consecutively), because the algorithm works by scanning a cumsum through the weights array to do inverse transform sampling.

If you need to generate many samples for a fixed weights table, it is probably faster to use AliasTables.jl, which allows you to precompute an efficient mapping from random numbers to indices via the alias method:

using AliasTables
at = AliasTable(vec(prob_dens))
CartesianIndices(prob_dens)[rand(at)]

e.g. this is almost 200x faster for me than sample for prob_dens = rand(100,200) (not including the one-time cost to generate the AliasTable).

Topic		Replies	Views
What is the equivalent of numpy.random.choice for generating a non-uniform random sample based on a user supplied probability? General Usage question	7	9684	November 2, 2023
Generate random value from a given function, out of box Statistics question	17	4449	December 20, 2017
Turing.jl: UniformDraw distribution? Probabilistic programming turing , distributions	2	547	July 23, 2021
Inverse transform sampling (discrete distributions sampling)? Performance question	4	845	March 1, 2021
Sampling an index from a boolean array General Usage arrays , random , boolean	7	488	October 3, 2022

Draw a random number through a probability distribution defined as an array

Related topics