Undesired memory allocation during interpolation

Dear all,

I am writing a script that will require many interpolations in a row (10000s).
I am trying to make sure that no allocation occurs during the successive calls to Itp1D(), so I've allocated all required arrays for interpolation up front. However, it still allocates a significant amount of memory. Would anyone have hints on how to reduce the allocations?

Many thanks in advance!

using Plots#, MAT

@views function Itp1D( Xc, Pc, iW, wW, Xdata, Pdata, dP, Pmin )
    # Interpolate in 1D:              X_W -----------o---x_E
    iW .= Int64.(round.(trunc.( (Pc .-  Pmin)./dP ) .+ 1))   # index of the west node in the database
    wW .= 1.0 .- (Pc - Pdata[iW])./dP
    Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]
end

@views function main()
    # 1. Generate data in 1D
    Pf_data  = LinRange(-0.5, 0.5, 1000)    # data axis
    X_data   = exp.(-Pf_data.^2 ./ 0.05^2)  # data
    dP       = Pf_data[2] - Pf_data[1]      # data spacing info
    Pf_min   = Pf_data[1]                   # data info
    # 2. Generate a 2D space
    ncx, ncy = 1000, 1000
    xc       = LinRange(-1, 1, ncx)
    yc       = LinRange(-1, 1, ncy)
    xc2d     = repeat(xc, 1, length(yc))
    yc2d     = repeat(yc, 1, length(xc))'
    # 3. Some pre-allocated arrays
    wW       = zeros(ncx,ncy)
    Xc       = zeros(ncx,ncy)
    iW       = zeros(Int64,ncx,ncy) # Achtung: has to be Int since it's an index
    Pfc      = yc2d./4.0
    # 4. Interpolate many times
    @time for i=1:40
        @views Itp1D( Xc, Pfc, iW, wW, X_data, Pf_data, dP, Pf_min )
    end
    # 5. Viz
    # p = plot( Pf_data, X_data )
    # p = plot!(Pfc[:], Xc[:], seriestype = :scatter)
    # display(p)
end

main()
main()

Are you aware of the @. macro? It might simplify your code. Also have a look at trunc(Int, x) to get rid of the round and the extra Int64 conversion.
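For example, the index computation could collapse to something like this (a sketch with made-up values for Pc, Pmin and dP):

```julia
# trunc(Int, x) truncates toward zero and converts in one step, so the
# round(...) and Int64.(...) wrappers become unnecessary; @. dots every
# call on the line. The values below are made up for illustration.
Pc   = [0.1234, 0.41]
Pmin = 0.0
dP   = 0.1
iW   = similar(Pc, Int)
@. iW = trunc(Int, (Pc - Pmin) / dP) + 1   # iW == [2, 5]
```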

If these are arrays, you have missed a dot here.


Oh thanks! Initially:
1.006806 seconds (160 allocations: 610.358 MiB, 6.79% gc time)
with the dot:
0.867322 seconds (80 allocations: 305.179 MiB, 3.14% gc time)
Half of the allocated memory is now gone :smiley:

The remaining allocation is caused by this line:

Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]

where the broadcast indexing Xdata[iW] seems to require an additional allocation.
Replacing this line by a loop version:

for i=1:length(Xc)
     Xc[i] = wW[i] * Xdata[iW[i]] +  (1.0 - wW[i]) * Xdata[iW[i] + 1]
end

removes all additional memory allocations and yields:
0.542172 seconds

Yeah nice spot, this expression creates a temporary array.

If the performance of this function is super critical, you may try your luck with LoopVectorization.jl (https://github.com/JuliaSIMD/LoopVectorization.jl); it can often speed things up a bit.
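For reference, a sketch of what that could look like on top of the loop version (itp1d_loop! is a made-up name; the annotation shown is plain base Julia, and with LoopVectorization loaded the loop annotation could become @turbo instead):

```julia
# Allocation-free interpolation kernel, annotated with base Julia's
# @inbounds/@simd; with LoopVectorization.jl, `@inbounds @simd for`
# would become `@turbo for` (not benchmarked here).
function itp1d_loop!(Xc, wW, Xdata, iW)
    @inbounds @simd for i in eachindex(Xc, wW, iW)
        Xc[i] = wW[i] * Xdata[iW[i]] + (1.0 - wW[i]) * Xdata[iW[i] + 1]
    end
    return Xc
end

# Tiny usage example with made-up data:
Xdata = [0.0, 1.0, 2.0, 3.0]
iW    = [1, 2]
wW    = [0.25, 0.5]
Xc    = zeros(2)
itp1d_loop!(Xc, wW, Xdata, iW)   # Xc is now [0.75, 1.5]
```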

Thanks a lot for the hints. Yes, I'll very likely use LoopVectorization. First, I wanted to nail down the issues with memory allocation. It already makes a big difference for the main code since this function is called ~10000 times in a row :smiley:

Why not

Xc .= @views wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]

?

On the other hand, the @views macro does nothing here, since there are no indexing expressions on that line.
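To illustrate: @views only rewrites indexing expressions such as x[2:4] into view calls; a line without any indexing is passed through unchanged:

```julia
# @views turns x[2:4] into view(x, 2:4): a SubArray that shares
# memory with x instead of copying it.
x = collect(1.0:5.0)
s = @views x[2:4]
s[1] = 99.0   # writes through to the parent array x
```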


To get rid of the loop, I would have loved to write Xc .= @views wW .* Xdata[iW] .+ (1.0 .- wW) .* Xdata[iW.+1], but this still allocates memory somehow.
And yes, I have to remove @views from this place. At some point, I’ve just introduced @views everywhere since I could not figure out what vectorised expression was allocating memory… :smiley:

Are you sure that’s not a benchmarking artefact? Are you using BenchmarkTools.jl?

It does not matter if you create a view here, the array iW.+1 is still being allocated.

Ah, yes, missed that one. Broadcasting getindex, then?

Yeah I guess that would actually work :+1:

You can benchmark Itp1D like this

# change in main
@btime Itp1D($Xc, $Pfc, $iW, $wW, $X_data, $Pf_data, $dP, $Pf_min)

# julia> main()
#  9.839 ms (2 allocations: 7.63 MiB)

Those 7.63 MiB are the only allocation (a (1000, 1000) array of Float64 is exactly that size), and they come from

Xdata[iW .+ 1]

Instead of adding .+ 1 to iW, which creates an additional array, we can shift the underlying array allocation-free with ShiftedArrays.circshift.

Version with ShiftedArrays.jl (should be correct?)
using BenchmarkTools
using ShiftedArrays

@views function Itp1D( Xc, Pc, iW, wW, Xdata, Pdata, dP, Pmin )
    # Interpolate in 1D:              X_W -----------o---x_E
    iW .= Int64.(round.(trunc.( (Pc .-  Pmin)./dP ) .+ 1))   # index of the west node in the database
    wW .= 1.0 .- (Pc .- Pdata[iW])./dP
    # Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW .+ 1]
    Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* ShiftedArrays.circshift(Xdata, -1)[iW]
end

@views function main()
    # 1. Generate data in 1D
    Pf_data  = LinRange(-0.5, 0.5, 1000)    # data axis
    X_data   = exp.(-Pf_data.^2 ./ 0.05^2)  # data
    dP       = Pf_data[2] - Pf_data[1]      # data spacing info
    Pf_min   = Pf_data[1]                   # data info
    # 2. Generate a 2D space
    ncx, ncy = 1000, 1000
    xc       = LinRange(-1, 1, ncx)
    yc       = LinRange(-1, 1, ncy)
    xc2d     = repeat(xc, 1, length(yc))
    yc2d     = repeat(yc, 1, length(xc))'
    # 3. Some pre-allocated arrays
    wW       = zeros(ncx,ncy)
    Xc       = zeros(ncx,ncy)
    iW       = zeros(Int64,ncx,ncy) # Achtung: has to be Int since it's an index
    Pfc      = yc2d./4.0
    # 4. Interpolate many times
 

    @btime Itp1D($Xc, $Pfc, $iW, $wW, $X_data, $Pf_data, $dP, $Pf_min)
    # @time for i=1:40
    #     @views 
    # end
    # 5. Viz
    # p = plot( Pf_data, X_data )
    # p = plot!(Pfc[:], Xc[:], seriestype = :scatter)
    # display(p)
    # return nothing
end

Original:

julia> main()
  9.835 ms (2 allocations: 7.63 MiB)
1000×1000 Matrix{Float64}:
 1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11

With ShiftedArrays.circshift:

julia> main()
  8.972 ms (0 allocations: 0 bytes)
1000×1000 Matrix{Float64}:
 1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11

Oh, I missed the suggestion with getindex.

Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* getindex.(Ref(Xdata), iW .+ 1)

results in:

julia> main()
  8.431 ms (1 allocation: 16 bytes)
1000×1000 Matrix{Float64}:
 1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11

I wonder if it would be useful to have a macro like @., but which also changes X[v] into getindex.(Ref(X), v). Would that work? This seems like a common pattern to run into.
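A minimal proof-of-concept of that idea might look like this (dotref/@dotref are made-up names; it only handles the X[v] pattern and is not a full @. replacement):

```julia
# Walk the expression tree and rewrite every indexing expression A[i]
# into the broadcastable form getindex.(Ref(A), i), so it can fuse with
# a surrounding dotted expression. Hypothetical sketch, not production-ready.
function dotref(ex)
    ex isa Expr || return ex
    if ex.head == :ref
        A    = dotref(ex.args[1])
        idxs = map(dotref, ex.args[2:end])
        return :(getindex.(Ref($A), $(idxs...)))
    end
    return Expr(ex.head, map(dotref, ex.args)...)
end

macro dotref(ex)
    esc(dotref(ex))
end

# Usage: Xdata[iW] becomes getindex.(Ref(Xdata), iW), which fuses
# with the rest of the broadcast instead of allocating a temporary.
Xdata = [10.0, 20.0, 30.0]
iW    = [1, 3]
out   = zeros(2)
@dotref out .= 2.0 .* Xdata[iW]   # out == [20.0, 60.0]
```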


If you broadcast the first indexing expression too, the last allocation should go away.

Not quite sure that helps? We are only talking about 16 bytes here.

In fact, when I apply your suggestion

Xc .= wW .* getindex.(Ref(Xdata), iW) .+  (1.0 .- wW) .* getindex.(Ref(Xdata), iW .+ 1)

It gets even worse:

julia> main() # two times Ref
  8.802 ms (2 allocations: 32 bytes)

julia> main() # only Ref inserted where needed
  8.316 ms (1 allocation: 16 bytes)

I guess the reason is that Ref itself causes an allocation? That's why we see a 16-byte allocation in my version with getindex., where we use one Ref call.

See also:

julia> x = [1,2];

julia> f(x) = Ref(x)
f (generic function with 2 methods)

julia> @btime f($x)
  7.632 ns (1 allocation: 16 bytes)
Base.RefValue{Vector{Int64}}([1, 2])

julia> f(x) = x
f (generic function with 2 methods)

julia> @btime f($x)
  1.305 ns (0 allocations: 0 bytes)
2-element Vector{Int64}:
 1
 2

That doesn’t add up. If Ref allocates (which is surprising), then there should be two allocations in both cases, since Xdata[iW] definitely allocates.

No, Xdata[iW] does not allocate (that allocation would definitely be bigger than 16 bytes!) because of the @views macro in front of the function.

Ref allocates because it essentially creates a mutable struct which is very similar to an array, but without some of the extra information:

From julia-1.7.0/share/julia/base/refvalue.jl

mutable struct RefValue{T} <: Ref{T}
    x::T
    RefValue{T}() where {T} = new()
    RefValue{T}(x) where {T} = new(x)
end
RefValue(x::T) where {T} = RefValue{T}(x)

See also:

julia> y = rand(1:10, 1024,);

julia> x = randn((1024, )); z = copy(x);

julia> @views function f(x,y,z)
           z .= x[y]
       end
f (generic function with 3 methods)

julia> @btime f($x, $y, $z);
  949.625 ns (0 allocations: 0 bytes)

# whereas
julia> function g(x,y,z)
           z .= x[y]
       end
g (generic function with 2 methods)

julia> @btime g($x, $y, $z);
  1.156 μs (1 allocation: 8.12 KiB)

julia> varinfo(r"x")
  name      size summary                     
  –––– ––––––––– ––––––––––––––––––––––––––––
  x    8.039 KiB 1024-element Vector{Float64}