Undesired memory allocation during interpolation

Dear all,

I am writing a script that will require many interpolations in a row (10000s).
I am trying to make sure that no allocation occurs during the successive calls to Itp1D(), so I've allocated all required arrays for interpolation up front. However, it still allocates a significant amount of memory. Would anyone have hints on how to reduce the allocations?

Many thanks in advance!

using Plots#, MAT

@views function Itp1D( Xc, Pc, iW, wW, Xdata, Pdata, dP, Pmin )
    # Interpolate in 1D:              X_W -----------o---x_E
    iW .= Int64.(round.(trunc.( (Pc .-  Pmin)./dP ) .+ 1))   # index of the west node in the database
    wW .= 1.0 .- (Pc - Pdata[iW])./dP
    Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]
end

@views function main()
    # 1. Generate data in 1D
    Pf_data  = LinRange(-0.5, 0.5, 1000)    # data axis
    X_data   = exp.(-Pf_data.^2 ./ 0.05^2)  # data
    dP       = Pf_data[2] - Pf_data[1]      # data spacing info
    Pf_min   = Pf_data[1]                   # data info
    # 2. Generate a 2D space
    ncx, ncy = 1000, 1000
    xc       = LinRange(-1, 1, ncx)
    yc       = LinRange(-1, 1, ncy)
    xc2d     = repeat(xc, 1, length(yc))
    yc2d     = repeat(yc, 1, length(xc))'
    # 3. Some pre-allocated arrays
    wW       = zeros(ncx,ncy)
    Xc       = zeros(ncx,ncy)
    iW       = zeros(Int64,ncx,ncy) # Achtung: has to be Int since it's an index
    Pfc      = yc2d./4.0
    # 4. Interpolate many times
    @time for i=1:40
        @views Itp1D( Xc, Pfc, iW, wW, X_data, Pf_data, dP, Pf_min )
    end
    # 5. Viz
    # p = plot( Pf_data, X_data )
    # p = plot!(Pfc[:], Xc[:], seriestype = :scatter)
    # display(p)
end

main()
main()

Are you aware of the @. macro? It might simplify your code. Also have a look at trunc(Int, x) to get rid of the round and the extra Int64 conversion.
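For example, the index computation could collapse to something like this (a sketch with made-up values for Pc, Pmin and dP):

```julia
# trunc(Int, x) truncates toward zero and converts in one step, so the
# round(...) and Int64.(...) wrappers become unnecessary; @. dots every
# call on the line. The values below are made up for illustration.
Pc   = [0.1234, 0.41]
Pmin = 0.0
dP   = 0.1
iW   = similar(Pc, Int)
@. iW = trunc(Int, (Pc - Pmin) / dP) + 1   # iW == [2, 5]
```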

If these are arrays, you have missed a dot here.


Oh thanks! Initially:
1.006806 seconds (160 allocations: 610.358 MiB, 6.79% gc time)
with the dot:
0.867322 seconds (80 allocations: 305.179 MiB, 3.14% gc time)
Half of the allocated memory is now gone :smiley:

The remaining allocation is caused by this line:

Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]

where the broadcast indexing Xdata[iW] seems to require an additional allocation.
Replacing this line by a loop version:

for i=1:length(Xc)
     Xc[i] = wW[i] * Xdata[iW[i]] +  (1.0 - wW[i]) * Xdata[iW[i] + 1]
end

removes all additional memory allocations and yields:
0.542172 seconds

Yeah nice spot, this expression creates a temporary array.

If the performance of this function is super critical, you may try your luck with LoopVectorization.jl (https://github.com/JuliaSIMD/LoopVectorization.jl); it can often speed things up a bit.
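For reference, a sketch of what that could look like on top of the loop version (itp1d_loop! is a made-up name; the annotation shown is plain base Julia, and with LoopVectorization loaded the loop annotation could become @turbo instead):

```julia
# Allocation-free interpolation kernel, annotated with base Julia's
# @inbounds/@simd; with LoopVectorization.jl, `@inbounds @simd for`
# would become `@turbo for` (not benchmarked here).
function itp1d_loop!(Xc, wW, Xdata, iW)
    @inbounds @simd for i in eachindex(Xc, wW, iW)
        Xc[i] = wW[i] * Xdata[iW[i]] + (1.0 - wW[i]) * Xdata[iW[i] + 1]
    end
    return Xc
end

# Tiny usage example with made-up data:
Xdata = [0.0, 1.0, 2.0, 3.0]
iW    = [1, 2]
wW    = [0.25, 0.5]
Xc    = zeros(2)
itp1d_loop!(Xc, wW, Xdata, iW)   # Xc is now [0.75, 1.5]
```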

Thanks a lot for the hints. Yes, I'll very likely use LoopVectorization. First, I wanted to nail down the issues with memory allocation. It already makes a big difference for the main code since this function is called ~10000 times in a row :smiley:

Why not

Xc .= @views wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]

?

On the other hand, the @views macro does nothing here, since there are no indexing expressions on that line.
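To illustrate: @views only rewrites indexing expressions such as x[2:4] into view calls; a line without any indexing is passed through unchanged:

```julia
# @views turns x[2:4] into view(x, 2:4): a SubArray that shares
# memory with x instead of copying it.
x = collect(1.0:5.0)
s = @views x[2:4]
s[1] = 99.0   # writes through to the parent array x
```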


To get rid of the loop, I would have loved to write Xc .= @views wW .* Xdata[iW] .+ (1.0 .- wW) .* Xdata[iW.+1], but this still allocates memory somehow.
And yes, I have to remove @views from this place. At some point, I’ve just introduced @views everywhere since I could not figure out what vectorised expression was allocating memory… :smiley:

Are you sure that’s not a benchmarking artefact? Are you using BenchmarkTools.jl?

It does not matter if you create a view here, the array iW.+1 is still being allocated.

Ah, yes, missed that one. Broadcasting getindex, then?

Yeah I guess that would actually work :+1:

You can benchmark Itp1D like this

# change in main
@btime Itp1D($Xc, $Pfc, $iW, $wW, $X_data, $Pf_data, $dP, $Pf_min)

# julia> main()
#  9.839 ms (2 allocations: 7.63 MiB)

Those 7.63 MiB are the only allocation (a (1000, 1000) array of Float64 is exactly that size), and they come from

Xdata[iW .+ 1]

Instead of adding .+ 1 to iW, which creates an additional array, we can shift the underlying array allocation-free with ShiftedArrays.circshift.

Version with ShiftedArrays.jl (should be correct?)
using BenchmarkTools
using ShiftedArrays

@views function Itp1D( Xc, Pc, iW, wW, Xdata, Pdata, dP, Pmin )
    # Interpolate in 1D:              X_W -----------o---x_E
    iW .= Int64.(round.(trunc.( (Pc .-  Pmin)./dP ) .+ 1))   # index of the west node in the database
    wW .= 1.0 .- (Pc .- Pdata[iW])./dP
    # Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW .+ 1]
    Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* ShiftedArrays.circshift(Xdata, -1)[iW]
end

@views function main()
    # 1. Generate data in 1D
    Pf_data  = LinRange(-0.5, 0.5, 1000)    # data axis
    X_data   = exp.(-Pf_data.^2 ./ 0.05^2)  # data
    dP       = Pf_data[2] - Pf_data[1]      # data spacing info
    Pf_min   = Pf_data[1]                   # data info
    # 2. Generate a 2D space
    ncx, ncy = 1000, 1000
    xc       = LinRange(-1, 1, ncx)
    yc       = LinRange(-1, 1, ncy)
    xc2d     = repeat(xc, 1, length(yc))
    yc2d     = repeat(yc, 1, length(xc))'
    # 3. Some pre-allocated arrays
    wW       = zeros(ncx,ncy)
    Xc       = zeros(ncx,ncy)
    iW       = zeros(Int64,ncx,ncy) # Achtung: has to be Int since it's an index
    Pfc      = yc2d./4.0
    # 4. Interpolate many times
 

    @btime Itp1D($Xc, $Pfc, $iW, $wW, $X_data, $Pf_data, $dP, $Pf_min)
    # @time for i=1:40
    #     @views 
    # end
    # 5. Viz
    # p = plot( Pf_data, X_data )
    # p = plot!(Pfc[:], Xc[:], seriestype = :scatter)
    # display(p)
    # return nothing
end

Original:

julia> main()
  9.835 ms (2 allocations: 7.63 MiB)
1000×1000 Matrix{Float64}:
 1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11

With ShiftedArrays.circshift:

julia> main()
  8.972 ms (0 allocations: 0 bytes)
1000×1000 Matrix{Float64}:
 1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11

Oh, I missed the suggestion with getindex.

Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* getindex.(Ref(Xdata), iW .+ 1)

results in:

julia> main()
  8.431 ms (1 allocation: 16 bytes)
1000×1000 Matrix{Float64}:
 1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11

I wonder if it would be useful to have a macro like @., but which also changes X[v] into getindex.(Ref(X), v). Would that work? This seems like a common pattern to run into.
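A minimal proof-of-concept of that idea might look like this (dotref/@dotref are made-up names; it only handles the X[v] pattern and is not a full @. replacement):

```julia
# Walk the expression tree and rewrite every indexing expression A[i]
# into the broadcastable form getindex.(Ref(A), i), so it can fuse with
# a surrounding dotted expression. Hypothetical sketch, not production-ready.
function dotref(ex)
    ex isa Expr || return ex
    if ex.head == :ref
        A    = dotref(ex.args[1])
        idxs = map(dotref, ex.args[2:end])
        return :(getindex.(Ref($A), $(idxs...)))
    end
    return Expr(ex.head, map(dotref, ex.args)...)
end

macro dotref(ex)
    esc(dotref(ex))
end

# Usage: Xdata[iW] becomes getindex.(Ref(Xdata), iW), which fuses
# with the rest of the broadcast instead of allocating a temporary.
Xdata = [10.0, 20.0, 30.0]
iW    = [1, 3]
out   = zeros(2)
@dotref out .= 2.0 .* Xdata[iW]   # out == [20.0, 60.0]
```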


If you broadcast the first indexing expression too, the last allocation should go away.

Not quite sure that helps? We are only talking about 16 bytes here.

In fact, when I apply your suggestion

Xc .= wW .* getindex.(Ref(Xdata), iW) .+  (1.0 .- wW) .* getindex.(Ref(Xdata), iW .+ 1)

It gets even worse:

julia> main() # two times Ref
  8.802 ms (2 allocations: 32 bytes)

julia> main() # only Ref inserted where needed
  8.316 ms (1 allocation: 16 bytes)

I guess the reason is that Ref itself causes an allocation? That's why we see a 16-byte allocation in my version with getindex., where we use one Ref call.

See also:

julia> x = [1,2];

julia> f(x) = Ref(x)
f (generic function with 2 methods)

julia> @btime f($x)
  7.632 ns (1 allocation: 16 bytes)
Base.RefValue{Vector{Int64}}([1, 2])

julia> f(x) = x
f (generic function with 2 methods)

julia> @btime f($x)
  1.305 ns (0 allocations: 0 bytes)
2-element Vector{Int64}:
 1
 2

That doesn’t add up. If Ref allocates (which is surprising), then there should be two allocations in both cases, since Xdata[iW] definitely allocates.

No, Xdata[iW] does not allocate (that allocation would definitely be bigger than 16 bytes!) because of the @views macro in front of the function.

Ref allocates because it essentially creates a mutable struct which is very similar to an array, but without some of the extra information:

From julia-1.7.0/share/julia/base/refvalue.jl

mutable struct RefValue{T} <: Ref{T}
    x::T
    RefValue{T}() where {T} = new()
    RefValue{T}(x) where {T} = new(x)
end
RefValue(x::T) where {T} = RefValue{T}(x)

See also:

julia> y = rand(1:10, 1024,);

julia> x = randn((1024, )); z = copy(x);

julia> @views function f(x,y,z)
           z .= x[y]
       end
f (generic function with 3 methods)

julia> @btime f($x, $y, $z);
  949.625 ns (0 allocations: 0 bytes)

# whereas
julia> function g(x,y,z)
           z .= x[y]
       end
g (generic function with 2 methods)

julia> @btime g($x, $y, $z);
  1.156 μs (1 allocation: 8.12 KiB)

julia> varinfo(r"x")
  name      size summary                     
  –––– ––––––––– ––––––––––––––––––––––––––––
  x    8.039 KiB 1024-element Vector{Float64}