How best to preallocate memory for repeated use

Hello,

I’m trying to reduce memory allocations. The pattern is simple: I need to redo a similar problem that requires a fair bit of memory many times. So I’m preallocating sufficient memory for the largest problem and then reshaping that.

A minimal working example is below. Raw is the computation while reallocating, the others are attempts at making this run more efficiently. All improve things by about a factor of ten.

My question is: which of these methods is preferable (+why) and how can I improve on what’s here? Single-threaded benchmarks are at the bottom.

using Bumper, ManualMemory, BenchmarkTools

# Haven't tried ManualMemory yet, because there's no docs yet


function Compute💩( X )
    X .= 3.4
    return sum( X )
end




function Raw( R, C, N )
    Σ = 0.0
    for i ∈ 1:N
        X = zeros( R + i, C + i )
        Σ += Compute💩( X ) 
    end
    return Σ
end


function Reinterpret( R, C, N )
    M = zeros( (R+N ) * ( C + N ) )
    Σ = 0.0
    for i ∈ 1:N
@views        X = reshape( M[1:(R+i)*(C+i)], R+i, C+i )
        Σ += Compute💩( X ) 
    end
    return Σ
end



function Bump( R, C, N )
    Σ = @no_escape begin
        M = @alloc( Float64,  (R+N ) * ( C + N )  )
        Σ = 0.0
        for i ∈ 1:N
@views       X = reshape( M[1:(R+i)*(C+i)], R+i, C+i )
            Σ += Compute💩( X ) 
        end
        Σ
    end
    return Σ
end



function Wrap( R, C, N )
    M = pointer( zeros( (R+N) * (C+N) ) )
    Σ = 0.0
    GC.@preserve M begin
        for i ∈ 1:N
            X = Base.unsafe_wrap(Matrix{Float64}, M, (R+i, C+i))
            Σ += Compute💩( X )
        end
    end
    return Σ
end
julia> @benchmark Raw( 1000, 1000, 100 )
BenchmarkTools.Trial: 18 samples with 1 evaluation per sample.
 Range (min … max):  238.456 ms … 321.295 ms  ┊ GC (min … max): 12.18% … 18.40%
 Time  (median):     289.997 ms               ┊ GC (median):    17.93%
 Time  (mean ± σ):   287.377 ms ±  22.403 ms  ┊ GC (mean ± σ):  17.74% ±  1.46%

                                       █                      ▃  
  ▇▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▇▇▁▁▁▁▁▁▁▇▇▁█▁▇▁▁▇▇▇▁▁▁▁▇▁▇▁▁▁▁▁▁▁▁█ ▁
  238 ms           Histogram: frequency by time          321 ms <

 Memory estimate: 842.59 MiB, allocs estimate: 300.

julia> @benchmark Reinterpret( 1000, 1000, 100 )
BenchmarkTools.Trial: 185 samples with 1 evaluation per sample.
 Range (min … max):  22.954 ms … 31.429 ms  ┊ GC (min … max): 0.00% … 6.65%
 Time  (median):     27.149 ms              ┊ GC (median):    1.38%
 Time  (mean ± σ):   27.077 ms ±  1.685 ms  ┊ GC (mean ± σ):  2.97% ± 3.52%

                         ▄▂▂ ▁▂▃▃▂      ▃█                     
  ▄▃▄▁▁█▆▄▃▅▅▄▁▁▁▁▃▁▃▁▁▁▁██████████▄▄▆▆▇██▆▆▅▃▅▄▄▁▃▃▃▃▃▁▁▃▁▁▃ ▃
  23 ms           Histogram: frequency by time        31.1 ms <

 Memory estimate: 9.23 MiB, allocs estimate: 3.

julia> @benchmark Bump( 1000, 1000, 100 )
BenchmarkTools.Trial: 193 samples with 1 evaluation per sample.
 Range (min … max):  25.215 ms …  31.155 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.598 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.938 ms ± 746.378 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▄█▄▃▂                                                       
  ▃▃█████▆▅▅▃▃▃▃▁▃▃▄▅▃▄▄▄▃▅▆▆▄▄▄▃▃▃▃▁▁▁▃▁▁▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃ ▃
  25.2 ms         Histogram: frequency by time         28.4 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Wrap( 1000, 1000, 100 )
BenchmarkTools.Trial: 190 samples with 1 evaluation per sample.
 Range (min … max):  21.764 ms … 35.432 ms  ┊ GC (min … max): 0.00% … 0.95%
 Time  (median):     26.827 ms              ┊ GC (median):    1.46%
 Time  (mean ± σ):   26.329 ms ±  2.484 ms  ┊ GC (mean ± σ):  3.08% ± 3.80%

               ▁█           ▃▆▄▃                               
  ▄▆▄▃▄▃▆▄▄▁▃▅███▄▄▆▄▄▄▄▄▃▄▄█████▄▄▁▄▃▄▁█▄▄▄▆▃▄▁▃▄▃▁▁▃▁▁▁▁▁▁▃ ▃
  21.8 ms         Histogram: frequency by time        33.1 ms <

 Memory estimate: 9.24 MiB, allocs estimate: 302.
2 Likes