# Is allocation inevitable when generating random numbers from a categorical distribution?

Hi,

I am trying to do in-place random number generation using `Distributions.jl`. However, for some reason, only the random number generation from the Categorical distribution takes extra computation time due to the extra allocations. Here is the code.

``````function loop()
w = rand(10^5)
normalize!(w,1)
tmpI = zeros(Int,10^5)
tmpF = zeros(Float64,10^5)

@time for i in 1:10^3
rand!(Categorical(w), tmpI)
end

@time for i in 1:10^3
rand!(Normal(0,1), tmpF)
end

@time for i in 1:10^3
rand!(Gamma(1,1), tmpF)
end

@time for i in 1:10^3
rand!(InverseGamma(1,1), tmpF)
end

@time for i in 1:10^3
rand!(Uniform(0,1), tmpF)
end
end

loop()
``````

And here is the result.

``````  4.153208 seconds (10.00 k allocations: 3.726 GiB, 1.42% gc time)
0.269260 seconds
0.444689 seconds
0.883855 seconds
0.063641 seconds
``````

Is this inevitable?

Looks like it, although I can save 20% allocations and 10% timing on my machine by pulling out the creation of the distribution object:

``````julia> catdist = Categorical(w);

julia> @btime rand!(Categorical(\$w), \$tmpI);
3.279 ms (10 allocations: 3.81 MiB)

julia> @btime rand!(\$catdist, \$tmpI);
2.936 ms (8 allocations: 3.05 MiB)
``````
1 Like

Try using the constructor without argument checks:

``````julia> catdist = Categorical(w; check_args=false)
``````

The reason is that `Categorical` is an alias for `DiscreteNonParametric`, and its constructor looks like this at the moment:

``````struct DiscreteNonParametric{T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} <: DiscreteUnivariateDistribution
support::Ts
p::Ps

function DiscreteNonParametric{T,P,Ts,Ps}(xs::Ts, ps::Ps; check_args::Bool=true) where {
T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}}
check_args || return new{T,P,Ts,Ps}(xs, ps)
@check_args(
DiscreteNonParametric,
(length(xs) == length(ps), "length of support and probability vector must be equal"),
(ps, isprobvec(ps), "vector is not a probability vector"),
(xs, allunique(xs), "support must contain only unique elements"),
)
sort_order = sortperm(xs)
new{T,P,Ts,Ps}(xs[sort_order], ps[sort_order])
end
end
``````

Note the presence of `sortperm(xs)` as well as reindexing with `sort_order`.
I think sorting allows for faster sampling (not sure, cause intuitively I would sort `ps`), but it also means we waste a lot of time during construction as a result.
Strictly speaking, I think the argument checking itself doesn’t allocate, so maybe there should be two keyword arguments: `check_args` and `sort_args`?

3 Likes

Right, in that case the call to

``````rand!(Categorical(w, check_args = false), tmpI)
``````

would have the same runtime and 8 allocations like

``````rand!(cat_dist, tmpI)
``````

above, but the 8 allocations from calling `rand!` one a categorical distribution remain.

2 Likes

Oh right, I had not paid enough attention, and I went straight for the problem I had already encountered. Time to dig some more

1 Like

Hope I am not operating on wrong assumptions here, but there are no allocations when calling `rand(cat_dist)`. At first glance, and maybe naive, I would then expect the `rand!` to perform similarly as the other distributions (e.g., no allocations for the in-place `rand!`):

``````function alloctest()
w = rand(10^5)
Distributions.normalize!(w,1)
c = Categorical(w, check_args=false)
@btime rand(\$c)
end

alloctest()
# 98.000 ns (0 allocations: 0 bytes)

``````

It is obvious that the output of `rand` will 0-allocate. But the `rand!` call seems to perform additional allocations before producing the output for `rand` call.

1 Like

Ok, so I did some more digging. To sample from `d isa Categorical`, the package first creates `s = sampler(d)` which is an `AliasTable`, and then calls `rand(rng, s)` as many times as necessary.

Lo and behold the `AliasTable` implementation:

``````struct AliasTable <: Sampleable{Univariate,Discrete}
accept::Vector{Float64}
alias::Vector{Int}
end
ncategories(s::AliasTable) = length(s.alias)

function AliasTable(probs::AbstractVector)
n = length(probs)
n > 0 || throw(ArgumentError("The input probability vector is empty."))
accp = Vector{Float64}(undef, n)
alias = Vector{Int}(undef, n)
StatsBase.make_alias_table!(probs, 1.0, accp, alias)
AliasTable(accp, alias)
end

function rand(rng::AbstractRNG, s::AliasTable)
i = rand(rng, 1:length(s.alias)) % Int
u = rand(rng)
@inbounds r = u < s.accept[i] ? i : s.alias[i]
r
end
``````

So the sampling itself is allocation-free, but the construction is not. I think what you need to do is amortize it by constructing `s = sampler(d)` at the beginning of a loop and then use the sampler directly as `x[i] = rand(rng, s)` for each component of the target `x`.

2 Likes

The solution:

``````julia> using Distributions, LinearAlgebra

julia> function loop()
w = rand(10^5)
normalize!(w,1)
tmpI = zeros(Int,10^5)
d = Categorical(w, check_args=false)  # this doesn't allocate
s = sampler(d)  # this allocates
@time for i in 1:10^3
for j in eachindex(tmpI)  # can't use rand! directly, sad emoji
tmpI[j] = rand(s)  # this doesn't allocate
end
end
end
loop (generic function with 1 method)

julia> loop()
2.891325 seconds
``````
6 Likes

Thank you all for your help! It was a big trap for me.

1 Like

I propose this version which actually supports the usage of `rand!` directly:

``````using Random, Distributions, LinearAlgebra

function loop()
w = rand(10^5)
normalize!(w, 1)
tmpI = zeros(Int, 10^5)
d = Categorical(w, check_args=false)  # this doesn't allocate
s = sampler(d)
@time for i in 1:10^3
rand!(s, tmpI)
end
end

loop()
# 1.841305 seconds
# this is not faster than the version proposed by @gdalle
# I also get faster times for his version on this machine
``````

This works because (interpret this in the context/scope of the `loop` function):

``````s = sampler(d)
@info d isa Sampleable # true
@info s isa Sampleable # true

#and
sampler(d) # allocates
sampler(s) # does not allocate

#and
@info s === sampler(s) # true
# just to add redundant stuff:
@info sampler(s) === identity(s) # true
``````

And there is a fallback method with the signature: `rand!(rng::AbstractRNG, s::Sampleable{Univariate}, A::AbstractArray{<:Real})` - but hey, `sample` returns a `Sampleable`. You can go into more details here.

So, the `rand!(::Categorical ,...)` method used initially by OP has a fallback on the above `rand!` method I used in the updated `loop` - which is also exported by the API - so safe to use.

Anyway - I hope my explanation is clear enough - if it is not, please run the `loop` function from above first, and we can talk later 2 Likes