Hello everyone!
I’m writing a code that samples millions of times from a truncated distribution.
Everything works well and is also enough fast, but I am not satisfied with the memory allocations and the gc time, in my opinion too high.
In particular, I’d like to make sure that memory allocations doesn’t depend on the number of iterations - is that possible?
I attach below the piece of code that samples from the truncated distributions. This piece of code by itself makes no sense, but it’s the piece of algorithm that allocates too much memory, so I simplify it in this way so it is easy to understand:
using Distributions
function funza(d, sigma, N)
Distr_sample = [truncated(Normal(μ,sigma),0,d) for μ in collect(0:1:d)]
#=
Example with d=3, sigma=1
4-element Vector{Truncated{Normal{Float64}, Continuous, Float64}}:
Truncated(Normal{Float64}(μ=0.0, σ=1.0), range=(0.0, 3.0))
Truncated(Normal{Float64}(μ=1.0, σ=1.0), range=(0.0, 3.0))
Truncated(Normal{Float64}(μ=2.0, σ=1.0), range=(0.0, 3.0))
Truncated(Normal{Float64}(μ=3.0, σ=1.0), range=(0.0, 3.0))
=#
#Initial draw, conventional starting from 0
draw=zeros(Int8, 20)
prop_draw=zeros(Int8, 20)
for n=1:1:N
for i=1:1:20
prop_draw[i]=round(rand(Distr_sample[draw[i]+1],1)[1] ,digits=0)
end # i cycle
end # n cycle
end
Running the original code with the --track-allocation = user option I see that there are way too many allocations when it comes to sampling.
Using the @time macro I get:
@time funza(4,1,10^3)
0.001423 seconds (20.00 k allocations: 1.832 MiB)
@time funza(4,1,10^4)
0.019120 seconds (200.00 k allocations: 18.311 MiB, 32.47% gc time)
@time funza(4,1,10^5)
0.144889 seconds (2.00 M allocations: 183.106 MiB, 10.65% gc time)
I’m pretty sure there’s a better and smarter way to do this.
Thanks a lot to everyone who will answer to this!
P.S. I also specify (this cannot be deduced from the piece of code I posted) that it is a Markov chain, i.e. at each iteration we must sample from a Gaussian that is centered on a value that depends on the previous iteration. So I can’t call the rand () function once unfortunately (this would be very efficient)