# How to sample from a truncated distribution without allocating too much memory

Hello everyone!

I’m writing a code that samples millions of times from a truncated distribution.
Everything works well and is also enough fast, but I am not satisfied with the memory allocations and the gc time, in my opinion too high.

In particular, I’d like to make sure that memory allocations doesn’t depend on the number of iterations - is that possible?

I attach below the piece of code that samples from the truncated distributions. This piece of code by itself makes no sense, but it’s the piece of algorithm that allocates too much memory, so I simplify it in this way so it is easy to understand:

``````using Distributions

function funza(d, sigma, N)

Distr_sample = [truncated(Normal(μ,sigma),0,d) for μ in collect(0:1:d)]

#=
Example with d=3, sigma=1
4-element Vector{Truncated{Normal{Float64}, Continuous, Float64}}:
Truncated(Normal{Float64}(μ=0.0, σ=1.0), range=(0.0, 3.0))
Truncated(Normal{Float64}(μ=1.0, σ=1.0), range=(0.0, 3.0))
Truncated(Normal{Float64}(μ=2.0, σ=1.0), range=(0.0, 3.0))
Truncated(Normal{Float64}(μ=3.0, σ=1.0), range=(0.0, 3.0))
=#

#Initial draw, conventional starting from 0
draw=zeros(Int8, 20)
prop_draw=zeros(Int8, 20)

for n=1:1:N
for i=1:1:20

prop_draw[i]=round(rand(Distr_sample[draw[i]+1],1)[1] ,digits=0)

end # i cycle
end # n cycle

end
``````

Running the original code with the --track-allocation = user option I see that there are way too many allocations when it comes to sampling.

Using the @time macro I get:

``````@time funza(4,1,10^3)
0.001423 seconds (20.00 k allocations: 1.832 MiB)

@time funza(4,1,10^4)
0.019120 seconds (200.00 k allocations: 18.311 MiB, 32.47% gc time)

@time funza(4,1,10^5)
0.144889 seconds (2.00 M allocations: 183.106 MiB, 10.65% gc time)
``````

I’m pretty sure there’s a better and smarter way to do this.

Thanks a lot to everyone who will answer to this!

P.S. I also specify (this cannot be deduced from the piece of code I posted) that it is a Markov chain, i.e. at each iteration we must sample from a Gaussian that is centered on a value that depends on the previous iteration. So I can’t call the rand () function once unfortunately (this would be very efficient)

Some small and one big thing:

Don’t use `collect`, and use a UnitRange instead of a StepRange, so `0:d` instead of `collect(0:1:d)`.

Prefer `1:N` and `1:20`. These don’t matter for allocations, and probably very little or nothing for speed.

This one is big:

Any call like `rand(distr, 1)[1]` allocates and populates an array and then reads out its first element. That’s bad. Instead, create a random scalar:

``````rand(distr)
``````

Also, your rounding is not good. It creates an integer valued `float` which is then converted when assigned to `prop_draw`. Instead, round directly to the correct type:

``````round(Int8, x)
``````

So try

``````prop_draw[i] = round(Int8, rand(Distr_sample[draw[i]+1]))
``````
3 Likes