Allocations when creating array

Hi! I have the following code trying to simulate 1000 dimensional brownian paths

const type=Float32
function simulate_path(path_length)
    xis=randn(type,(100,path_length-1))
    X=zeros(type,(100,path_length))
    sqdt=sqrt(0.001)
    sigma=sqrt(2)
    for i in 2:path_length
        X[:,i]=@views X[:,i-1]+sqdt*sigma*xis[:,i-1]
    end
    return X,xis
end

function simulate_N_paths(n_samples)
    samples=Array{Tuple{Matrix{type},Matrix{type}}}(undef,n_samples)
    Threads.@threads for i in 1:n_samples
        samples[i]=simulate_path(1000)
    end
    return samples
end

@time resp=simulate_N_paths(1000);

It works, but it is allocating much memory. When I run the last line many times, my memory starts leaking and I don’t know why, because I am overwriting my paths in the resp list. Do someone sees why it does allocate many times when running the function many times? How can improve in general my code? How can I track my allocations?

change the = to .= and you should see a big improvement. Currently an allocation is happening every time this is run.

With @btime from BenchmarkTools you can get a better idea, but for the ultimate way of seeing where they come from run:

using Profile
using PProf

Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 simulate_N_paths(1000)
PProf.Allocs.pprof(from_c = false)

It seems changing = by .= didn’t help much. And I think it is weird because the memory is allocating space every time I run the function and not saving the result in the same space

You’re allocating many temporary arrays in your operations

X[:,i] .= @views X[:,i-1] .+ sqdt .* sigma .* xis[:,i-1]

Please read More Dots: Syntactic Loop Fusion in Julia

1 Like

Oh, I mean if you’re allocating xis and X inside the function simulate_path then they will be re-allocated every time you call the function. To reuse this memory you need to allocate them outside and pass them to the function as an argument. Something like this (but I’m not sure if randn will cooperate)

(code currently broken)

function simulate_path!(X, xis, path_length)
    xis .= randn(type, (100,path_length-1))
    X .= zero(type)
    sqdt=sqrt(0.001)
    sigma=sqrt(2)
    for i in 2:path_length
        X[:,i] .= @views X[:,i-1] .+ sqdt .* sigma .* xis[:,i-1]
    end
    return nothing
end

function simulate_N_paths(n_samples)
    samples=Array{Tuple{Matrix{type},Matrix{type}}}(undef,n_samples)
    Xs = [zeros(type, (100, path_length-1)) for i in 1:n_samples]
    xiss = [zeros(type, (100, path_length)) for i in 1:n_samples]
    Threads.@threads for i in 1:n_samples
        simulate_path!(Xs[i], xiss[i], 1000)
        samples[i] .= Xs[i], xiss[i] 
    end
    return samples
end
1 Like

This can be just

xis .= randn.(type)

And it will not allocate the intermediate array.

1 Like

I figured out why my example doesn’t work. You’re allocating an undefined Array of undefined size, this means when I try to run

simulate_path!(Xs[i], xiss[i], 1000)
samples[i] .= Xs[i], xiss[i] 

it doesn’t know how to do samples[i] .= ... because it doesn’t know the size to write to.

To fix this, and to lower the number of allocations more, I would preallocate samples to the size that is necessary given the output.

I tried it, but It didn’t work and I don’t know why

function simulate_path!(X, xis, path_length)
    xis .= randn(type,(100,path_length-1))
    X .= zeros(type,(100,path_length))
    sqdt=sqrt(0.001)
    sigma=sqrt(2)
    for i in 2:path_length
        X[:,i] .= @views X[:,i-1] .+ sqdt .* sigma .* xis[:,i-1]
    end
    return nothing
end

function simulate_N_paths(n_samples,path_length)
    samples=Array{Tuple{Matrix{type},Matrix{type}}}(undef,n_samples)
    Xs = [zeros(type, (100, path_length)) for i in 1:n_samples]
    xiss = [zeros(type, (100, path_length-1)) for i in 1:n_samples]
    Threads.@threads for i in 1:n_samples
        simulate_path!(Xs[i], xiss[i], 1000)
        samples[i] .= Xs[i], xiss[i] 
    end
    return samples
end

@time resp=simulate_N_paths(1000,1001);

The error is a DimensionMissmatch in simulate_path!(Xs[i], xiss[i], 1000) before samples[i].=. All arrays seems to have the desired dimension.

It may work in this case, but in the future I need different path lengths, so I will not know a priori array sizes

@jacobusmmsmit @lmiq @giordano Look a this suggested by @jacobusmmsmit.

function simulate_path!(X, xis, path_length)
    #xis .=randn.(type)
    sqdt=sqrt(0.001)
    sigma=sqrt(2)
    for i in 2:path_length
        X[:,i] .= @views X[:,i-1] .+ sqdt .* sigma .* xis[:,i-1]
    end
    return nothing
end

function simulate_N_paths(n_samples,path_length)
    samples=[(zeros(type, (100, path_length)),randn(type, (100, path_length-1))) for i in 1:n_samples]
    Threads.@threads for i in 1:n_samples
        simulate_path!(samples[i][1], samples[i][2], 1000)
    end
    return samples
end

for _ in 1:10
    @time resp=simulate_N_paths(1000,1001);
end

It works, but allocations are increasing in every step in the for loop, and I have to trigger manually the garbage collector at the end of the four loop. I want it to allocate outputs in the same space as reps every time. Is it possible?

I don’t see memory allocations in simulate_path!:

julia> using BenchmarkTools

julia> function simulate_path!(X, xis, path_length)
           sqdt = sqrt(0.001)
           sigma = sqrt(2)
           for i in 2:path_length
               X[:,i] .= @views X[:,i-1] .+ sqdt .* sigma .* xis[:,i-1]
           end
           return nothing
       end
simulate_path! (generic function with 1 method)

julia> @benchmark simulate_path!(X, xis, 1000) setup=(type=Float32; path_length=1001; X=zeros(type, (100, path_length)); xis=randn(type, (100, path_length-1))) evals=1
BenchmarkTools.Trial: 7034 samples with 1 evaluation.
 Range (min … max):  49.256 μs … 99.387 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     50.018 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   51.084 μs ±  3.624 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▇▇▅▄▃▂      ▂▂                                            ▁
  █████████▇▇▇▇████▇▇▅▆▄▆▅▅▅▅▅▆▅▆▅▆▆▆▄▆▆▆▆▅▆▇██▇▇▇▆▆▆▅▅▅▆▄▄▄▅ █
  49.3 μs      Histogram: log(frequency) by time      65.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Your simulate_N_paths does allocate memory (it literally creates a 1000-element vector of two 400 kiB matrices), but I’m not sure what’s surprising in there.

I think you can avoid the xis matrix completely halving memory allocations with

using Random

function simulate_path!(X::AbstractMatrix{T}, path_length::Integer, rng::AbstractRNG) where {T}
    sqdt = sqrt(0.001)
    sigma = sqrt(2)
    for i in 2:path_length
        @views X[:, i] .= X[:, i-1] .+ sqdt .* sigma .* randn.(rng, T)
    end
    return nothing
end

function simulate_N_paths(n_samples::Integer, path_length::Integer, type=Float32)
    samples = zeros(type, 100, path_length, n_samples)
    rng = Random.default_rng()
    for i in 1:n_samples
        simulate_path!(@view(samples[:, :, i]), 1000, rng)
    end
    return samples
end

but I don’t know how you can avoid creating the samples array, if that’s what you need (I don’t know what you need)

I mean, I know I’m allocating the samples vector, but I need to generate a bunch of such vectors one at a time, but when I am doing it in the for loop memory starts collapsion because the garbage collector do not removes it fast enough, and when I do it manually it slows down my code

I don’t see that, with your code I get:

julia> for _ in 1:10
           @time resp=simulate_N_paths(1000,1001);
       end
  0.686341 seconds (7.04 k allocations: 763.568 MiB)
  0.689205 seconds (7.04 k allocations: 763.568 MiB, 0.38% gc time)
  0.721921 seconds (7.04 k allocations: 763.568 MiB, 2.66% gc time)
  0.881423 seconds (7.04 k allocations: 763.568 MiB)
  0.985052 seconds (7.04 k allocations: 763.568 MiB, 9.72% gc time)
  0.649257 seconds (7.04 k allocations: 763.568 MiB)
  0.687810 seconds (7.04 k allocations: 763.568 MiB, 3.81% gc time)
  0.662495 seconds (7.04 k allocations: 763.568 MiB)
  0.654960 seconds (7.04 k allocations: 763.568 MiB, 0.29% gc time)
  0.663761 seconds (7.04 k allocations: 763.568 MiB)

Now iterate it 100 times and you’ll see it occupies all memory when it shouldn’t

julia> for _ in 1:100
           @time resp=simulate_N_paths(1000,1001);
       end
  0.846531 seconds (144.91 k allocations: 773.162 MiB, 0.81% gc time, 61.57% compilation time)
  0.668829 seconds (7.04 k allocations: 763.568 MiB)
  0.672262 seconds (7.04 k allocations: 763.568 MiB, 1.03% gc time)
  0.683861 seconds (7.04 k allocations: 763.568 MiB, 0.90% gc time)
  0.716899 seconds (7.04 k allocations: 763.568 MiB, 0.56% gc time)
  0.689040 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.675120 seconds (7.04 k allocations: 763.568 MiB)
  0.697855 seconds (7.04 k allocations: 763.568 MiB, 0.32% gc time)
  0.677286 seconds (7.04 k allocations: 763.568 MiB, 0.32% gc time)
  0.669528 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.676805 seconds (7.04 k allocations: 763.568 MiB)
  0.670247 seconds (7.04 k allocations: 763.568 MiB, 0.32% gc time)
  0.664467 seconds (7.04 k allocations: 763.568 MiB, 0.24% gc time)
  0.680086 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.664974 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.662659 seconds (7.04 k allocations: 763.568 MiB)
  0.752201 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.666129 seconds (7.04 k allocations: 763.568 MiB, 0.24% gc time)
  0.669314 seconds (7.04 k allocations: 763.568 MiB, 0.31% gc time)
  0.670403 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.664434 seconds (7.04 k allocations: 763.568 MiB)
  0.676214 seconds (7.04 k allocations: 763.568 MiB, 0.33% gc time)
  0.669163 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.668347 seconds (7.04 k allocations: 763.568 MiB, 0.29% gc time)
  0.664817 seconds (7.04 k allocations: 763.568 MiB)
  0.665767 seconds (7.04 k allocations: 763.568 MiB, 0.31% gc time)
  0.663841 seconds (7.04 k allocations: 763.568 MiB, 0.26% gc time)
  0.664572 seconds (7.04 k allocations: 763.568 MiB, 0.26% gc time)
  0.666120 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.663645 seconds (7.04 k allocations: 763.568 MiB)
  0.674315 seconds (7.04 k allocations: 763.568 MiB, 0.32% gc time)
  0.670455 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.675228 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.673772 seconds (7.04 k allocations: 763.568 MiB)
  0.680937 seconds (7.04 k allocations: 763.568 MiB, 0.32% gc time)
  0.666342 seconds (7.04 k allocations: 763.568 MiB, 0.23% gc time)
  0.665388 seconds (7.04 k allocations: 763.568 MiB, 0.31% gc time)
  0.665310 seconds (7.04 k allocations: 763.568 MiB, 0.29% gc time)
  0.664449 seconds (7.04 k allocations: 763.568 MiB)
  0.665434 seconds (7.04 k allocations: 763.568 MiB, 0.36% gc time)
  0.666515 seconds (7.04 k allocations: 763.568 MiB, 0.31% gc time)
  0.667780 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.688706 seconds (7.04 k allocations: 763.568 MiB, 0.29% gc time)
  0.687049 seconds (7.04 k allocations: 763.568 MiB)
  0.670405 seconds (7.04 k allocations: 763.568 MiB, 0.32% gc time)
  0.669151 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.666605 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.663498 seconds (7.04 k allocations: 763.568 MiB)
  0.668863 seconds (7.04 k allocations: 763.568 MiB, 0.34% gc time)
  0.662439 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.668099 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.664490 seconds (7.04 k allocations: 763.568 MiB, 0.31% gc time)
  0.672282 seconds (7.04 k allocations: 763.568 MiB)
  0.665388 seconds (7.04 k allocations: 763.568 MiB, 0.33% gc time)
  0.668240 seconds (7.04 k allocations: 763.568 MiB, 0.26% gc time)
  0.667909 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.661000 seconds (7.04 k allocations: 763.568 MiB)
  0.666812 seconds (7.04 k allocations: 763.568 MiB, 0.34% gc time)
  0.673761 seconds (7.04 k allocations: 763.568 MiB, 0.23% gc time)
  0.664866 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.663911 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.661808 seconds (7.04 k allocations: 763.568 MiB)
  0.668902 seconds (7.04 k allocations: 763.568 MiB, 0.33% gc time)
  0.666597 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.669327 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.665637 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.676440 seconds (7.04 k allocations: 763.568 MiB)
  0.683051 seconds (7.04 k allocations: 763.568 MiB, 0.34% gc time)
  0.668212 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.672266 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.673962 seconds (7.04 k allocations: 763.568 MiB)
  0.672494 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.676360 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.671793 seconds (7.04 k allocations: 763.568 MiB, 0.27% gc time)
  0.665337 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.663632 seconds (7.04 k allocations: 763.568 MiB)
  0.669501 seconds (7.04 k allocations: 763.568 MiB, 0.37% gc time)
  0.677144 seconds (7.04 k allocations: 763.568 MiB, 0.26% gc time)
  0.667798 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.662609 seconds (7.04 k allocations: 763.568 MiB)
  0.671711 seconds (7.04 k allocations: 763.568 MiB, 0.35% gc time)
  0.666769 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.671648 seconds (7.04 k allocations: 763.568 MiB, 0.29% gc time)
  0.670277 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.667317 seconds (7.04 k allocations: 763.568 MiB)
  0.664070 seconds (7.04 k allocations: 763.568 MiB, 0.36% gc time)
  0.666664 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.676183 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.668325 seconds (7.04 k allocations: 763.568 MiB, 0.28% gc time)
  0.666195 seconds (7.04 k allocations: 763.568 MiB)
  0.671963 seconds (7.04 k allocations: 763.568 MiB, 0.36% gc time)
  0.665126 seconds (7.04 k allocations: 763.568 MiB, 0.25% gc time)
  0.662345 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.665396 seconds (7.04 k allocations: 763.568 MiB)
  0.668739 seconds (7.04 k allocations: 763.568 MiB, 0.35% gc time)
  0.667000 seconds (7.04 k allocations: 763.568 MiB, 0.26% gc time)
  0.665092 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.663765 seconds (7.04 k allocations: 763.568 MiB, 0.30% gc time)
  0.664604 seconds (7.04 k allocations: 763.568 MiB)
  0.664850 seconds (7.04 k allocations: 763.568 MiB, 0.33% gc time)

Looking at a system monitor, memory usage seems to be pretty stable throughout the run.

Please say which OS both of you are running on.
Was there a thread recently on Windows garbage collection?