I’m announcing a state of the art high performance implementation of the alias method for categorical non-uniform random sampling over 1:n with O(1) sampling time and O(n) setup time.

Source: GitHub - LilithHafner/AliasTables.jl: An efficient sampler for discrete random variables

Enjoy :slight_smile:

Happy to hear feedback and pointers to other implementations of this algorithm

Here’s a performance teaser

julia> using AliasTables, Chairmarks

julia> at = AliasTable(rand(17));

julia> @b rand($at)
2.878 ns

julia> @b rand(UInt64)
2.723 ns

julia> @b rand(1:17)
3.414 ns

Oh my gosh, new @Lilith package just dropped! :scream:

Out of curiosity, how does this compare to the sampling methods over in Statistics or StatsBase? I have no idea how they compare or how I would make a comparison so apologies for the vague question. I tend to just use the sample method from StatsBase so was curious.

Always excited to see whenever you have a new experimental package up!


~ tcp :deciduous_tree:


Its a very nice improvement!

Thanks @Lilith for the hard work!

1 Like

The PR @vancleve linked to has more details, but TL;DR is it’s almost always faster and so now StatsBase.jl does use it, and Distributions.jl will likely soon (Use a faster implementation of AliasTables by LilithHafner · Pull Request #1848 · JuliaStats/Distributions.jl · GitHub)

Some reasons one might use AliasTables.jl directly are load time/dependency reduction, or to implement very high performance samplers for downstream types that take advantage of implementation details of the underlying rng (e.g. xoshiro’s bulk generation)