log(N) algo for sampling from vector

rveltz · January 31, 2018, 9:42am

Hi,

I am using the following piece of code

using StatsBase
pf = StatsBase.Weights(rand(10))  # 10 random relative weights
e = sample(pf)                    # draw one index in 1:10, weighted by pf

Upon using @which sample(pf), one gets
sample(wv::StatsBase.AbstractWeights) in StatsBase at .../StatsBase/src/sampling.jl:425

for which the algorithm is a straightforward linear scan. For example, if the threshold t in this algorithm is very close to sum(wv), I would expect it to be faster to go over wv starting from the end.

Hence my question: is it possible to use something like a binary search (e.g. searchsorted) to do the same thing as StatsBase.sample?

I tried but could not get it to work.
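For readers following along, here is a minimal sketch of the scan being discussed and of the "start from the end" variant. It is not the StatsBase source; the names scan_forward and scan_backward are hypothetical, and the threshold t is assumed to be drawn as rand() * sum(w).

```julia
# Hypothetical helpers for illustration only; not the StatsBase implementation.

# Forward scan: smallest index i whose cumulative weight reaches t.
function scan_forward(w::AbstractVector{<:Real}, t::Real)
    cw = zero(t)
    for i in eachindex(w)
        cw += w[i]
        cw >= t && return i
    end
    return lastindex(w)  # guard against floating-point round-off
end

# Backward scan: the same index, found by accumulating weight from the end.
# `total` is assumed to equal sum(w) and to be known in advance.
function scan_backward(w::AbstractVector{<:Real}, t::Real, total::Real = sum(w))
    above = total - t    # weight lying above the threshold
    acc = zero(above)
    for i in reverse(eachindex(w))
        acc += w[i]
        acc > above && return i
    end
    return firstindex(w)  # guard against floating-point round-off
end

w = [1, 2, 3, 4]
t = rand() * sum(w)
i = scan_forward(w, t)
```

Both versions are still O(n) in the worst case; the backward scan only finishes early when t happens to be close to the total weight.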

Thank you for your help,

Best regards

Tamas_Papp · January 31, 2018, 10:21am

Look further in that file for very efficient methods, e.g. the alias method.

rveltz · January 31, 2018, 1:57pm

I forgot to mention that I need to draw only one sample from the vector.

simonbyrne · January 31, 2018, 7:40pm

If all you have is a vector of relative weights, then you need to do at least some sort of O(n) preprocessing (since you have no idea where the weight is).

If there is some way that you could get the cumulative weights cheaply (i.e. cheaper than cpf = cumsum(pf), which is also O(n)), then you could do:

u = rand()*cpf[end]        # uniform draw in [0, total weight)
searchsortedfirst(cpf, u)  # first index i with cpf[i] >= u

which is O(log(n)).
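As a rough, self-contained sketch of that idea using only Base functions: the cumulative weights are built once at O(n) cost, and each draw afterwards is a binary search. The helper names pick_index and draw are hypothetical, and the weights are made up for illustration.

```julia
# Hypothetical helpers for illustration only.

# Binary search for the first index i with cpf[i] >= u, O(log n).
pick_index(cpf::AbstractVector{<:Real}, u::Real) = searchsortedfirst(cpf, u)

# One weighted draw, given precomputed cumulative weights.
draw(cpf::AbstractVector{<:Real}) = pick_index(cpf, rand() * cpf[end])

w = [1.0, 2.0, 3.0, 4.0]   # relative weights (example values)
cpf = cumsum(w)            # O(n) preprocessing, done once
samples = [draw(cpf) for _ in 1:1000]
```

This only pays off when cpf is reused across many draws or can be maintained cheaply; for a single draw from fresh weights the cumsum dominates and the total cost is still O(n).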

rveltz · February 1, 2018, 7:02am

I agree with most of your answer… I would improve the algorithm, though, by scanning from the first element and from the last element at the same time.

simonbyrne · February 7, 2018, 4:59am

I’m not sure what you mean: could you explain further?
