Hi all,
I’d like to share with you DynamicSampling.jl, a package I built which implements samplers that can be used to sample from a dynamic discrete distribution supporting removal, addition and sampling of elements in constant time. The discrete distribution can be constructed incrementally by adding pairs of indices and weights, e.g.
julia> using DynamicSampling, Random
julia> rng = Xoshiro(42);
julia> sampler = DynamicSampler(rng);
julia> # the sampler contains indices
for i in 1:10
push!(sampler, i, Float64(i)) # add elements one by one
end
julia> rand(sampler)
7
julia> delete!(sampler, 8) # deletes 8 from the sampler
It is already used in a project I’m working on and it results in huge speedups in respect of using static sampling techniques which require re-initialization if the distribution changes over time. Actually it is very fast also in “worst-case scenarios”: in a comparison between the static methods exported by StatsBase.jl and DynamicSampling.jl, the sampler is even better sometimes than the static methods when compared with them, e.g. consider these two implementation of sample with replacement and sample without replacement:
julia> using DynamicSampling
julia> function static_sample_with_replacement(rng, inds, weights, n)
sp = DynamicSampler(rng)
append!(sp, inds, weights)
return rand(sp, n)
end
julia> function static_sample_without_replacement(rng, inds, weights, n)
sp = DynamicSampler(rng)
append!(sp, inds, weights)
s = Vector{Int}(undef, n)
for i in eachindex(s)
idx = rand(sp)
delete!(sp, idx)
s[i] = idx
end
return s
end
I run a comparison between the equivalent methods in StatsBase.jl for extracting small and big samples and this is what I got
More info on the benchmark are provided in the ReadMe if interested. The comparison in the case of sampling with replacement hits the AliasTables.jl package, and it is interesting that DynamicSampling.jl is competittive with it in my opinion given that the alias method is generally considered the fastest in the static case (I tag @Lilith who could maybe be interested in this comparison given that they wrote that fantastic library)