What is the equivalent of numpy.random.choice for generating a non-uniform random sample based on a user supplied probability?

PyDataBlog · February 11, 2020, 10:23am

Hi,

I’m on the lookout for the equivalent of [numpy.random.choice()](numpy.random.choice — NumPy v1.15 Manual).

I want to generate random indices based on non-uniform random sampling.

For example, I can do this with Numpy by passing a list of the associated probability of each entry as:

rand_idx = numpy.random.choice(300, size=1, p=probability_list)

I would like to do this in Julia like:
rand_idx = rand(1:300, 1, #supply_probability_list# )

How can I do this in Julia?

nilshg · February 11, 2020, 10:38am

You might be looking for StatsBase.sample

PyDataBlog · February 11, 2020, 2:58pm

I have tried sample(1:300, probability_list) to no avail. It’s not accepting the the probability_list.

PyDataBlog · February 11, 2020, 3:12pm

Finally got it. I have to convert the list into a probability weight abstract array like:

sample(1:300, ProbabilityWeights(probability_list))

affans · February 11, 2020, 4:17pm

You can also use Distributions and rand

julia> using Distributions

julia> a = Categorical([0.2, 0.2, 0.2, 0.2, 0.2])
DiscreteNonParametric{Int64,Float64,Base.OneTo{Int64},Array{Float64,1}}(
support: Base.OneTo(5)
p: [0.2, 0.2, 0.2, 0.2, 0.2]
)


julia> rand(a)
3

rand(a) returns the index of the array with that probability.

If it matters,

julia> @time sample(1:5, ProbabilityWeights([0.2, 0.2, 0.2, 0.2, 0.2]))
  0.000016 seconds (7 allocations: 352 bytes)

julia> @time rand(a)
  0.000005 seconds (4 allocations: 160 bytes)

DNF · February 11, 2020, 6:18pm

For micro benchmarks like this you really should use BenchmarkTools. Also, you only included the construction of the weights in one of the benchmarks.

nilshg · February 12, 2020, 8:32am

Results will probably differ depending on your real world use case (Do you need to construct the weights once or more often? Do you sample once or many times? How large is the array you want to sample from etc) but as @DNF says reasonable benchmarking will help.

FWIW one attempt (including constructin the Categorical/ProbabilityWeights objects) at what I think the problem in the OP was:

julia> using Distributions, StatsBase

julia> vals = collect(1:300);
julia> weights = vals ./ sum(vals);

julia> sample_cat(v, w) = v[rand(Categorical(w))];
julia> sample_sb(v, w) = sample(v, ProbabilityWeights(w));

julia> using BenchmarkTools

julia> @btime sample_cat($vals, $weights)
774.490 ns (2 allocations: 2.53 KiB)

julia> @btime sample_sb($vals, $weights)
250.393 ns (1 allocation: 32 bytes)

Marco_Antoniotti · November 2, 2023, 10:35pm

Hi

just a couple of nagging questions I have regarding this old topic. I guess that my ‘statistics-fu’ is very low.

Do the weights built by pweights need to sum up to 1?
Is the item returned by sample built from a cumulative sum of said pweights?

Just trying to understand without delving too much in the code.

Thanks

MA

Topic		Replies	Views
Sampling from a list of integers without repetition New to Julia	3	6671	June 15, 2020
Random sampling with rand(Uniform(0,1), n) New to Julia question , random , sampling	4	295	May 30, 2024
Basic question about Distributions, drawing random sample and related documentaion New to Julia question	2	1165	January 19, 2020
Sample without replacement.. and without StatsBase or shuffle General Usage	12	302	May 3, 2024
How do I create an array with random unique numbers in a specific range? New to Julia question , random	20	1003	June 24, 2024

What is the equivalent of numpy.random.choice for generating a non-uniform random sample based on a user supplied probability?

Related topics