Hi,

I’m on the lookout for the equivalent of `[numpy.random.choice()`

](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html ).

I want to generate random indices based on non-uniform random sampling.

For example, I can do this with Numpy by passing a list of the associated probability of each entry as:

`rand_idx = numpy.random.choice(300, size=1, p=probability_list)`

I would like to do this in Julia like:
`rand_idx = rand(1:300, 1, #supply_probability_list# )`

How can I do this in Julia?

nilshg
February 11, 2020, 10:38am
#2
You might be looking for `StatsBase.sample`

4 Likes

I have tried `sample(1:300, probability_list)`

to no avail. It’s not accepting the the probability_list.

Finally got it. I have to convert the list into a probability weight abstract array like:

`sample(1:300, ProbabilityWeights(probability_list))`

2 Likes

affans
February 11, 2020, 4:17pm
#5
You can also use `Distributions`

and `rand`

```
julia> using Distributions
julia> a = Categorical([0.2, 0.2, 0.2, 0.2, 0.2])
DiscreteNonParametric{Int64,Float64,Base.OneTo{Int64},Array{Float64,1}}(
support: Base.OneTo(5)
p: [0.2, 0.2, 0.2, 0.2, 0.2]
)
julia> rand(a)
3
```

`rand(a`

) returns the index of the array with that probability.

If it matters,

```
julia> @time sample(1:5, ProbabilityWeights([0.2, 0.2, 0.2, 0.2, 0.2]))
0.000016 seconds (7 allocations: 352 bytes)
julia> @time rand(a)
0.000005 seconds (4 allocations: 160 bytes)
```

4 Likes

DNF
February 11, 2020, 6:18pm
#6
For micro benchmarks like this you really should use BenchmarkTools. Also, you only included the construction of the weights in one of the benchmarks.

3 Likes

nilshg
February 12, 2020, 8:32am
#7
Results will probably differ depending on your real world use case (Do you need to construct the weights once or more often? Do you sample once or many times? How large is the array you want to sample from etc) but as @DNF says reasonable benchmarking will help.

FWIW one attempt (including constructin the `Categorical`

/`ProbabilityWeights`

objects) at what I think the problem in the OP was:

```
julia> using Distributions, StatsBase
julia> vals = collect(1:300);
julia> weights = vals ./ sum(vals);
julia> sample_cat(v, w) = v[rand(Categorical(w))];
julia> sample_sb(v, w) = sample(v, ProbabilityWeights(w));
julia> using BenchmarkTools
julia> @btime sample_cat($vals, $weights)
774.490 ns (2 allocations: 2.53 KiB)
julia> @btime sample_sb($vals, $weights)
250.393 ns (1 allocation: 32 bytes)
```

4 Likes