What is the equivalent of numpy.random.choice for generating a non-uniform random sample based on a user supplied probability?

Hi,

I’m on the lookout for the equivalent of [numpy.random.choice()](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html).

I want to generate random indices based on non-uniform random sampling.

For example, I can do this with Numpy by passing a list of the associated probability of each entry as:

rand_idx = numpy.random.choice(300, size=1, p=probability_list)

I would like to do this in Julia like:
rand_idx = rand(1:300, 1, #supply_probability_list# )

How can I do this in Julia?

You might be looking for StatsBase.sample

4 Likes

I have tried sample(1:300, probability_list) to no avail. It’s not accepting the the probability_list.

Finally got it. I have to convert the list into a probability weight abstract array like:

sample(1:300, ProbabilityWeights(probability_list))

2 Likes

You can also use Distributions and rand

julia> using Distributions

julia> a = Categorical([0.2, 0.2, 0.2, 0.2, 0.2])
DiscreteNonParametric{Int64,Float64,Base.OneTo{Int64},Array{Float64,1}}(
support: Base.OneTo(5)
p: [0.2, 0.2, 0.2, 0.2, 0.2]
)


julia> rand(a)
3

rand(a) returns the index of the array with that probability.

If it matters,

julia> @time sample(1:5, ProbabilityWeights([0.2, 0.2, 0.2, 0.2, 0.2]))
  0.000016 seconds (7 allocations: 352 bytes)

julia> @time rand(a)
  0.000005 seconds (4 allocations: 160 bytes)
4 Likes

For micro benchmarks like this you really should use BenchmarkTools. Also, you only included the construction of the weights in one of the benchmarks.

3 Likes

Results will probably differ depending on your real world use case (Do you need to construct the weights once or more often? Do you sample once or many times? How large is the array you want to sample from etc) but as @DNF says reasonable benchmarking will help.

FWIW one attempt (including constructin the Categorical/ProbabilityWeights objects) at what I think the problem in the OP was:

julia> using Distributions, StatsBase

julia> vals = collect(1:300);
julia> weights = vals ./ sum(vals);

julia> sample_cat(v, w) = v[rand(Categorical(w))];
julia> sample_sb(v, w) = sample(v, ProbabilityWeights(w));

julia> using BenchmarkTools

julia> @btime sample_cat($vals, $weights)
774.490 ns (2 allocations: 2.53 KiB)

julia> @btime sample_sb($vals, $weights)
250.393 ns (1 allocation: 32 bytes)
4 Likes