Hi,
I’m on the lookout for the equivalent of [numpy.random.choice()
](numpy.random.choice — NumPy v1.15 Manual ).
I want to generate random indices based on non-uniform random sampling.
For example, I can do this with Numpy by passing a list of the associated probability of each entry as:
rand_idx = numpy.random.choice(300, size=1, p=probability_list)
I would like to do this in Julia like:
rand_idx = rand(1:300, 1, #supply_probability_list# )
How can I do this in Julia?
nilshg
February 11, 2020, 10:38am
2
You might be looking for StatsBase.sample
4 Likes
I have tried sample(1:300, probability_list)
to no avail. It’s not accepting the the probability_list.
Finally got it. I have to convert the list into a probability weight abstract array like:
sample(1:300, ProbabilityWeights(probability_list))
2 Likes
affans
February 11, 2020, 4:17pm
5
You can also use Distributions
and rand
julia> using Distributions
julia> a = Categorical([0.2, 0.2, 0.2, 0.2, 0.2])
DiscreteNonParametric{Int64,Float64,Base.OneTo{Int64},Array{Float64,1}}(
support: Base.OneTo(5)
p: [0.2, 0.2, 0.2, 0.2, 0.2]
)
julia> rand(a)
3
rand(a
) returns the index of the array with that probability.
If it matters,
julia> @time sample(1:5, ProbabilityWeights([0.2, 0.2, 0.2, 0.2, 0.2]))
0.000016 seconds (7 allocations: 352 bytes)
julia> @time rand(a)
0.000005 seconds (4 allocations: 160 bytes)
5 Likes
DNF
February 11, 2020, 6:18pm
6
For micro benchmarks like this you really should use BenchmarkTools. Also, you only included the construction of the weights in one of the benchmarks.
3 Likes
nilshg
February 12, 2020, 8:32am
7
Results will probably differ depending on your real world use case (Do you need to construct the weights once or more often? Do you sample once or many times? How large is the array you want to sample from etc) but as @DNF says reasonable benchmarking will help.
FWIW one attempt (including constructin the Categorical
/ProbabilityWeights
objects) at what I think the problem in the OP was:
julia> using Distributions, StatsBase
julia> vals = collect(1:300);
julia> weights = vals ./ sum(vals);
julia> sample_cat(v, w) = v[rand(Categorical(w))];
julia> sample_sb(v, w) = sample(v, ProbabilityWeights(w));
julia> using BenchmarkTools
julia> @btime sample_cat($vals, $weights)
774.490 ns (2 allocations: 2.53 KiB)
julia> @btime sample_sb($vals, $weights)
250.393 ns (1 allocation: 32 bytes)
4 Likes
Hi
just a couple of nagging questions I have regarding this old topic. I guess that my ‘statistics-fu’ is very low.
Do the weights built by pweights
need to sum up to 1?
Is the item returned by sample
built from a cumulative sum of said pweights
?
Just trying to understand without delving too much in the code.
Thanks
MA