How to sample from a truncated distribution without allocating too much memory

Frisus95 · August 11, 2021, 9:40pm

Hello everyone!

I’m writing a code that samples millions of times from a truncated distribution.
Everything works well and is also enough fast, but I am not satisfied with the memory allocations and the gc time, in my opinion too high.

In particular, I’d like to make sure that memory allocations doesn’t depend on the number of iterations - is that possible?

I attach below the piece of code that samples from the truncated distributions. This piece of code by itself makes no sense, but it’s the piece of algorithm that allocates too much memory, so I simplify it in this way so it is easy to understand:

using Distributions

function funza(d, sigma, N)
   
Distr_sample = [truncated(Normal(μ,sigma),0,d) for μ in collect(0:1:d)] 
    
#=
Example with d=3, sigma=1
4-element Vector{Truncated{Normal{Float64}, Continuous, Float64}}:
 Truncated(Normal{Float64}(μ=0.0, σ=1.0), range=(0.0, 3.0))
 Truncated(Normal{Float64}(μ=1.0, σ=1.0), range=(0.0, 3.0))
 Truncated(Normal{Float64}(μ=2.0, σ=1.0), range=(0.0, 3.0))
 Truncated(Normal{Float64}(μ=3.0, σ=1.0), range=(0.0, 3.0))
=#   
    
#Initial draw, conventional starting from 0
draw=zeros(Int8, 20)    
prop_draw=zeros(Int8, 20) 
 
for n=1:1:N    
 for i=1:1:20    
            
 prop_draw[i]=round(rand(Distr_sample[draw[i]+1],1)[1] ,digits=0) 
            
 end # i cycle  
end # n cycle        
    
end

Running the original code with the --track-allocation = user option I see that there are way too many allocations when it comes to sampling.

Using the @time macro I get:

@time funza(4,1,10^3)
0.001423 seconds (20.00 k allocations: 1.832 MiB)

@time funza(4,1,10^4)
  0.019120 seconds (200.00 k allocations: 18.311 MiB, 32.47% gc time)

@time funza(4,1,10^5)
  0.144889 seconds (2.00 M allocations: 183.106 MiB, 10.65% gc time)

I’m pretty sure there’s a better and smarter way to do this.

Thanks a lot to everyone who will answer to this!

P.S. I also specify (this cannot be deduced from the piece of code I posted) that it is a Markov chain, i.e. at each iteration we must sample from a Gaussian that is centered on a value that depends on the previous iteration. So I can’t call the rand () function once unfortunately (this would be very efficient)

DNF · August 11, 2021, 10:12pm

Some small and one big thing:

Don’t use collect, and use a UnitRange instead of a StepRange, so 0:d instead of collect(0:1:d).

Prefer 1:N and 1:20. These don’t matter for allocations, and probably very little or nothing for speed.

This one is big:

Any call like rand(distr, 1)[1] allocates and populates an array and then reads out its first element. That’s bad. Instead, create a random scalar:

rand(distr)

Also, your rounding is not good. It creates an integer valued float which is then converted when assigned to prop_draw. Instead, round directly to the correct type:

round(Int8, x)

So try

prop_draw[i] = round(Int8, rand(Distr_sample[draw[i]+1]))

Topic		Replies	Views
Truncated distributions in Gen Probabilistic Programming question	5	936	June 23, 2023
Sampling from a normal distribution Statistics	4	17010	April 1, 2020
How to sample from a truncated distribution given a certain known variance? Statistics	2	1464	June 9, 2021
Product distribution allocates (a lot) New to Julia memory-allocation , distributions , random	9	162	March 11, 2025
Is allocation inevitable when generating random numbers from a categorical distribution? General Usage question	9	499	July 18, 2023

How to sample from a truncated distribution without allocating too much memory

Related topics