Is allocation inevitable when generating random numbers from a categorical distribution?

Hi,

I am trying to do in-place random number generation using Distributions.jl. However, for some reason, only the random number generation from the Categorical distribution takes extra computation time due to the extra allocations. Here is the code.

function loop()
    w = rand(10^5)
    normalize!(w,1)
    tmpI = zeros(Int,10^5)
    tmpF = zeros(Float64,10^5)

    @time for i in 1:10^3
        rand!(Categorical(w), tmpI)
    end

    @time for i in 1:10^3
        rand!(Normal(0,1), tmpF)
    end

    @time for i in 1:10^3
        rand!(Gamma(1,1), tmpF)
    end

    @time for i in 1:10^3
        rand!(InverseGamma(1,1), tmpF)
    end

    @time for i in 1:10^3
        rand!(Uniform(0,1), tmpF)
    end
end

loop()

And here is the result.

  4.153208 seconds (10.00 k allocations: 3.726 GiB, 1.42% gc time)
  0.269260 seconds
  0.444689 seconds
  0.883855 seconds
  0.063641 seconds

Is this inevitable?

Looks like it, although I can save 20% allocations and 10% timing on my machine by pulling out the creation of the distribution object:

julia> catdist = Categorical(w);

julia> @btime rand!(Categorical($w), $tmpI);
  3.279 ms (10 allocations: 3.81 MiB)

julia> @btime rand!($catdist, $tmpI);
  2.936 ms (8 allocations: 3.05 MiB)
1 Like

Try using the constructor without argument checks:

julia> catdist = Categorical(w; check_args=false)

The reason is that Categorical is an alias for DiscreteNonParametric, and its constructor looks like this at the moment:

struct DiscreteNonParametric{T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} <: DiscreteUnivariateDistribution
    support::Ts
    p::Ps

    function DiscreteNonParametric{T,P,Ts,Ps}(xs::Ts, ps::Ps; check_args::Bool=true) where {
            T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}}
        check_args || return new{T,P,Ts,Ps}(xs, ps)
        @check_args(
            DiscreteNonParametric,
            (length(xs) == length(ps), "length of support and probability vector must be equal"),
            (ps, isprobvec(ps), "vector is not a probability vector"),
            (xs, allunique(xs), "support must contain only unique elements"),
        )
        sort_order = sortperm(xs)
        new{T,P,Ts,Ps}(xs[sort_order], ps[sort_order])
    end
end

Note the presence of sortperm(xs) as well as reindexing with sort_order.
I think sorting allows for faster sampling (not sure, cause intuitively I would sort ps), but it also means we waste a lot of time during construction as a result.
Strictly speaking, I think the argument checking itself doesn’t allocate, so maybe there should be two keyword arguments: check_args and sort_args?

3 Likes

Right, in that case the call to

rand!(Categorical(w, check_args = false), tmpI)

would have the same runtime and 8 allocations like

rand!(cat_dist, tmpI)

above, but the 8 allocations from calling rand! one a categorical distribution remain.

2 Likes

Oh right, I had not paid enough attention, and I went straight for the problem I had already encountered. Time to dig some more

1 Like

Hope I am not operating on wrong assumptions here, but there are no allocations when calling rand(cat_dist). At first glance, and maybe naive, I would then expect the rand! to perform similarly as the other distributions (e.g., no allocations for the in-place rand!):

function alloctest()
    w = rand(10^5)
    Distributions.normalize!(w,1)    
    c = Categorical(w, check_args=false)
    @btime rand($c)
end

alloctest()
# 98.000 ns (0 allocations: 0 bytes)

It is obvious that the output of rand will 0-allocate. But the rand! call seems to perform additional allocations before producing the output for rand call.

1 Like

Ok, so I did some more digging. To sample from d isa Categorical, the package first creates s = sampler(d) which is an AliasTable, and then calls rand(rng, s) as many times as necessary.

Lo and behold the AliasTable implementation:

struct AliasTable <: Sampleable{Univariate,Discrete}
    accept::Vector{Float64}
    alias::Vector{Int}
end
ncategories(s::AliasTable) = length(s.alias)

function AliasTable(probs::AbstractVector)
    n = length(probs)
    n > 0 || throw(ArgumentError("The input probability vector is empty."))
    accp = Vector{Float64}(undef, n)
    alias = Vector{Int}(undef, n)
    StatsBase.make_alias_table!(probs, 1.0, accp, alias)
    AliasTable(accp, alias)
end

function rand(rng::AbstractRNG, s::AliasTable)
    i = rand(rng, 1:length(s.alias)) % Int
    u = rand(rng)
    @inbounds r = u < s.accept[i] ? i : s.alias[i]
    r
end

So the sampling itself is allocation-free, but the construction is not. I think what you need to do is amortize it by constructing s = sampler(d) at the beginning of a loop and then use the sampler directly as x[i] = rand(rng, s) for each component of the target x.

2 Likes

The solution:

julia> using Distributions, LinearAlgebra

julia> function loop()
           w = rand(10^5)
           normalize!(w,1)
           tmpI = zeros(Int,10^5)
           d = Categorical(w, check_args=false)  # this doesn't allocate
           s = sampler(d)  # this allocates
           @time for i in 1:10^3
               for j in eachindex(tmpI)  # can't use rand! directly, sad emoji
                   tmpI[j] = rand(s)  # this doesn't allocate
               end
           end
       end
loop (generic function with 1 method)

julia> loop()
  2.891325 seconds
6 Likes

Thank you all for your help! It was a big trap for me.

1 Like

@gdalle, @taka255,

I propose this version which actually supports the usage of rand! directly:

using Random, Distributions, LinearAlgebra

function loop()
    w = rand(10^5)
    normalize!(w, 1)
    tmpI = zeros(Int, 10^5)
    d = Categorical(w, check_args=false)  # this doesn't allocate
    s = sampler(d)
    @time for i in 1:10^3
        rand!(s, tmpI)
    end
end

loop()
# 1.841305 seconds
# this is not faster than the version proposed by @gdalle
# I also get faster times for his version on this machine

This works because (interpret this in the context/scope of the loop function):

s = sampler(d)
@info d isa Sampleable # true
@info s isa Sampleable # true

#and
sampler(d) # allocates
sampler(s) # does not allocate 

#and
@info s === sampler(s) # true
# just to add redundant stuff:
@info sampler(s) === identity(s) # true

And there is a fallback method with the signature: rand!(rng::AbstractRNG, s::Sampleable{Univariate}, A::AbstractArray{<:Real}) - but hey, sample returns a Sampleable. You can go into more details here.

So, the rand!(::Categorical ,...) method used initially by OP has a fallback on the above rand! method I used in the updated loop - which is also exported by the API - so safe to use.

Anyway - I hope my explanation is clear enough - if it is not, please run the loop function from above first, and we can talk later :slight_smile:

2 Likes