Is allocation inevitable when generating random numbers from a categorical distribution?

taka255 · July 18, 2023, 8:39am

Hi,

I am trying to do in-place random number generation using Distributions.jl. However, for some reason, only the random number generation from the Categorical distribution takes extra computation time due to the extra allocations. Here is the code.

function loop()
    w = rand(10^5)
    normalize!(w,1)
    tmpI = zeros(Int,10^5)
    tmpF = zeros(Float64,10^5)

    @time for i in 1:10^3
        rand!(Categorical(w), tmpI)
    end

    @time for i in 1:10^3
        rand!(Normal(0,1), tmpF)
    end

    @time for i in 1:10^3
        rand!(Gamma(1,1), tmpF)
    end

    @time for i in 1:10^3
        rand!(InverseGamma(1,1), tmpF)
    end

    @time for i in 1:10^3
        rand!(Uniform(0,1), tmpF)
    end
end

loop()

And here is the result.

  4.153208 seconds (10.00 k allocations: 3.726 GiB, 1.42% gc time)
  0.269260 seconds
  0.444689 seconds
  0.883855 seconds
  0.063641 seconds

Is this inevitable?

nilshg · July 18, 2023, 9:27am

Looks like it, although I can save 20% allocations and 10% timing on my machine by pulling out the creation of the distribution object:

julia> catdist = Categorical(w);

julia> @btime rand!(Categorical($w), $tmpI);
  3.279 ms (10 allocations: 3.81 MiB)

julia> @btime rand!($catdist, $tmpI);
  2.936 ms (8 allocations: 3.05 MiB)

gdalle · July 18, 2023, 10:02am

Try using the constructor without argument checks:

julia> catdist = Categorical(w; check_args=false)

The reason is that Categorical is an alias for DiscreteNonParametric, and its constructor looks like this at the moment:

struct DiscreteNonParametric{T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}} <: DiscreteUnivariateDistribution
    support::Ts
    p::Ps

    function DiscreteNonParametric{T,P,Ts,Ps}(xs::Ts, ps::Ps; check_args::Bool=true) where {
            T<:Real,P<:Real,Ts<:AbstractVector{T},Ps<:AbstractVector{P}}
        check_args || return new{T,P,Ts,Ps}(xs, ps)
        @check_args(
            DiscreteNonParametric,
            (length(xs) == length(ps), "length of support and probability vector must be equal"),
            (ps, isprobvec(ps), "vector is not a probability vector"),
            (xs, allunique(xs), "support must contain only unique elements"),
        )
        sort_order = sortperm(xs)
        new{T,P,Ts,Ps}(xs[sort_order], ps[sort_order])
    end
end

Note the presence of sortperm(xs) as well as reindexing with sort_order.
I think sorting allows for faster sampling (not sure, cause intuitively I would sort ps), but it also means we waste a lot of time during construction as a result.
Strictly speaking, I think the argument checking itself doesn’t allocate, so maybe there should be two keyword arguments: check_args and sort_args?

nilshg · July 18, 2023, 10:19am

Right, in that case the call to

rand!(Categorical(w, check_args = false), tmpI)

would have the same runtime and 8 allocations like

rand!(cat_dist, tmpI)

above, but the 8 allocations from calling rand! one a categorical distribution remain.

gdalle · July 18, 2023, 10:25am

Oh right, I had not paid enough attention, and I went straight for the problem I had already encountered. Time to dig some more

algunion · July 18, 2023, 10:31am

Hope I am not operating on wrong assumptions here, but there are no allocations when calling rand(cat_dist). At first glance, and maybe naive, I would then expect the rand! to perform similarly as the other distributions (e.g., no allocations for the in-place rand!):

function alloctest()
    w = rand(10^5)
    Distributions.normalize!(w,1)    
    c = Categorical(w, check_args=false)
    @btime rand($c)
end

alloctest()
# 98.000 ns (0 allocations: 0 bytes)

It is obvious that the output of rand will 0-allocate. But the rand! call seems to perform additional allocations before producing the output for rand call.

gdalle · July 18, 2023, 10:36am

Ok, so I did some more digging. To sample from d isa Categorical, the package first creates s = sampler(d) which is an AliasTable, and then calls rand(rng, s) as many times as necessary.

Lo and behold the AliasTable implementation:

struct AliasTable <: Sampleable{Univariate,Discrete}
    accept::Vector{Float64}
    alias::Vector{Int}
end
ncategories(s::AliasTable) = length(s.alias)

function AliasTable(probs::AbstractVector)
    n = length(probs)
    n > 0 || throw(ArgumentError("The input probability vector is empty."))
    accp = Vector{Float64}(undef, n)
    alias = Vector{Int}(undef, n)
    StatsBase.make_alias_table!(probs, 1.0, accp, alias)
    AliasTable(accp, alias)
end

function rand(rng::AbstractRNG, s::AliasTable)
    i = rand(rng, 1:length(s.alias)) % Int
    u = rand(rng)
    @inbounds r = u < s.accept[i] ? i : s.alias[i]
    r
end

So the sampling itself is allocation-free, but the construction is not. I think what you need to do is amortize it by constructing s = sampler(d) at the beginning of a loop and then use the sampler directly as x[i] = rand(rng, s) for each component of the target x.

gdalle · July 18, 2023, 10:42am

The solution:

julia> using Distributions, LinearAlgebra

julia> function loop()
           w = rand(10^5)
           normalize!(w,1)
           tmpI = zeros(Int,10^5)
           d = Categorical(w, check_args=false)  # this doesn't allocate
           s = sampler(d)  # this allocates
           @time for i in 1:10^3
               for j in eachindex(tmpI)  # can't use rand! directly, sad emoji
                   tmpI[j] = rand(s)  # this doesn't allocate
               end
           end
       end
loop (generic function with 1 method)

julia> loop()
  2.891325 seconds

taka255 · July 18, 2023, 10:54am

Thank you all for your help! It was a big trap for me.

algunion · July 18, 2023, 12:14pm

gdalle:

@time for i in 1:10^3
               for j in eachindex(tmpI)  # can't use rand! directly, sad emoji
                   tmpI[j] = rand(s)  # this doesn't allocate
               end
           end

@gdalle, @taka255,

I propose this version which actually supports the usage of rand! directly:

using Random, Distributions, LinearAlgebra

function loop()
    w = rand(10^5)
    normalize!(w, 1)
    tmpI = zeros(Int, 10^5)
    d = Categorical(w, check_args=false)  # this doesn't allocate
    s = sampler(d)
    @time for i in 1:10^3
        rand!(s, tmpI)
    end
end

loop()
# 1.841305 seconds
# this is not faster than the version proposed by @gdalle
# I also get faster times for his version on this machine

This works because (interpret this in the context/scope of the loop function):

s = sampler(d)
@info d isa Sampleable # true
@info s isa Sampleable # true

#and
sampler(d) # allocates
sampler(s) # does not allocate 

#and
@info s === sampler(s) # true
# just to add redundant stuff:
@info sampler(s) === identity(s) # true

And there is a fallback method with the signature: rand!(rng::AbstractRNG, s::Sampleable{Univariate}, A::AbstractArray{<:Real}) - but hey, sample returns a Sampleable. You can go into more details here.

So, the rand!(::Categorical ,...) method used initially by OP has a fallback on the above rand! method I used in the updated loop - which is also exported by the API - so safe to use.

Anyway - I hope my explanation is clear enough - if it is not, please run the loop function from above first, and we can talk later

Topic		Replies	Views
Sampler for Categorical Rational Distributions General Usage question , package , distributions	3	529	February 11, 2021
Gradient through categorical distribution slow and has many allocations Performance	0	486	July 26, 2020
Product distribution allocates (a lot) New to Julia memory-allocation , distributions , random	9	169	March 11, 2025
Sampling from a probability distribution on GPU GPU statistics , gpu , distributions	13	752	June 13, 2023
Construct a generic discrete distribution in Distributions.jl? Statistics	11	3251	February 23, 2017

Is allocation inevitable when generating random numbers from a categorical distribution?

Related topics