# Undesired memory allocation during interpolation

Dear all,

I am writing a script that will require many interpolations in a row (10000s).
I am trying to make it such that no allocation is made during the successive calls to `Itp1D ()`. Therefore I’ve allocated all required arrays for interpolation initially. However, it stills allocates significant amounts of memory. Would anyone have hints on how to reduce the amount of allocations?

``````using Plots#, MAT

@views function Itp1D( Xc, Pc, iW, wW, Xdata, Pdata, dP, Pmin )
# Interpolate in 1D:              X_W -----------o---x_E
iW .= Int64.(round.(trunc.( (Pc .-  Pmin)./dP ) .+ 1))   # index of the west node is database
wW .= 1.0 .- (Pc - Pdata[iW])./dP
Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]
end

@views function main()
# 1. Generate data in 1D
Pf_data  = LinRange(-0.5, 0.5, 1000)    # data axis
X_data   = exp.(-Pf_data.^2. / 0.05.^2) # data
dP       = Pf_data - Pf_data      # data spacing info
Pf_min   = Pf_data                   # data info
# 2. Generate a 2D space
ncx, ncy = 1000, 1000
xc       = LinRange(-1, 1, ncx)
yc       = LinRange(-1, 1, ncy)
xc2d     = repeat(xc, 1, length(yc))
yc2d     = repeat(yc, 1, length(xc))'
# 3. Some pre-allocated arrays
wW       = zeros(ncx,ncy)
Xc       = zeros(ncx,ncy)
iW       = zeros(Int64,ncx,ncy) # Achtung: has to be Int since it's an index
Pfc      = yc2d./4.0
# 4. Interpolate many times
@time for i=1:40
@views Itp1D( Xc, Pfc, iW, wW, X_data, Pf_data, dP, Pf_min )
end
# 5. Viz
# p = plot( Pf_data, X_data )
# p = plot!(Pfc[:], Xc[:], seriestype = :scatter)
# display(p)
end

main()
main()
``````

Are you aware of the `@.` macro? It might simplify your code. Also have a look at `trunc(Int, x) ` to get rid of round and the call to the extra int conversion.

If these are arrays, you have missed a dot here.

1 Like

Oh thanks, initially
` 1.006806 seconds (160 allocations: 610.358 MiB, 6.79% gc time)`
with the dot:
` 0.867322 seconds (80 allocations: 305.179 MiB, 3.14% gc time)`
half is now gone The remainder allocation is caused by the line:

``````Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]
``````

where the underlying broadcasting in `Xdata[iW]` seems to require additional allocations.
Replacing this line by a loop version:

``````for i=1:length(Xc)
Xc[i] = wW[i] * Xdata[iW[i]] +  (1.0 - wW[i]) * Xdata[iW[i] + 1]
end
``````

removes all additional memory allocations and yields:
` 0.542172 seconds`

Yeah nice spot, this expression creates a temporary array.

If the performance of this function is super critical, you may try your luck with GitHub - JuliaSIMD/LoopVectorization.jl: Macro(s) for vectorizing loops.
it can often speed things up a bit

Thanks a lot for the hints. Yes, I’ll very like use LoopVectorization. First, I wanted to nail down the issues with memory allocation. It already makes a big difference for he main code since this function is called ~10000 times in a row Why not

``````Xc .= @views wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW.+1]
``````

?

On the other hand, the `@views` macro does nothing here

Since there are no indexing expressions on that line.

1 Like

To get rid of the loop, I would have loved to: `Xc .= @views wW .* Xdata[iW] .+ (1.0 .- wW) .* Xdata[iW.+1]` but this still allocates memory somehow.
And yes, I have to remove `@views` from this place. At some point, I’ve just introduced `@views` everywhere since I could not figure out what vectorised expression was allocating memory… Are you sure that’s not a benchmarking artefact? Are you using BenchmarkTools.jl?

It does not matter if you create a view here, the array `iW.+1` is still being allocated.

Ah, yes, missed that one. Broadcasting `getindex`, then?

Yeah I guess that would actually work You can benchmark `Itp1D` like this

``````# change in main
@btime Itp1D(\$Xc, \$Pfc, \$iW, \$wW, \$X_data, \$Pf_data, \$dP, \$Pf_min)

# julia> main()
#  9.839 ms (2 allocations: 7.63 MiB)
``````

Those `7.63MiB` (array of size `(1000, 1000)` with Float64 has exactly that size) are the only allocation which come from

``````Xdata[iW .+ 1]
``````

Instead of adding `.+ 1` to `iW` which creates an additional array, we can shift the underlying array allocation free with `ShiftedArrays.circshift`.

Version with ShiftedArrays.jl (should be correct?)
``````using BenchmarkTools
using ShiftedArrays

@views function Itp1D( Xc, Pc, iW, wW, Xdata, Pdata, dP, Pmin )
# Interpolate in 1D:              X_W -----------o---x_E
iW .= Int64.(round.(trunc.( (Pc .-  Pmin)./dP ) .+ 1))   # index of the west node is database
wW .= 1.0 .- (Pc .- Pdata[iW])./dP
# Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* Xdata[iW .+ 1]
Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* ShiftedArrays.circshift(Xdata, -1)[iW]
end

@views function main()
# 1. Generate data in 1D
Pf_data  = LinRange(-0.5, 0.5, 1000)    # data axis
X_data   = exp.(-Pf_data.^2. / 0.05.^2) # data
dP       = Pf_data - Pf_data      # data spacing info
Pf_min   = Pf_data                   # data info
# 2. Generate a 2D space
ncx, ncy = 1000, 1000
xc       = LinRange(-1, 1, ncx)
yc       = LinRange(-1, 1, ncy)
xc2d     = repeat(xc, 1, length(yc))
yc2d     = repeat(yc, 1, length(xc))'
# 3. Some pre-allocated arrays
wW       = zeros(ncx,ncy)
Xc       = zeros(ncx,ncy)
iW       = zeros(Int64,ncx,ncy) # Achtung: has to be Int since it's an index
Pfc      = yc2d./4.0
# 4. Interpolate many times

@btime Itp1D(\$Xc, \$Pfc, \$iW, \$wW, \$X_data, \$Pf_data, \$dP, \$Pf_min)
# @time for i=1:40
#     @views
# end
# 5. Viz
# p = plot( Pf_data, X_data )
# p = plot!(Pfc[:], Xc[:], seriestype = :scatter)
# display(p)
# return nothing
end
``````

Original:

``````julia> main()
9.835 ms (2 allocations: 7.63 MiB)
1000×1000 Matrix{Float64}:
1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11
``````

With `ShiftedArrays.circshift`:

``````julia> main()
8.972 ms (0 allocations: 0 bytes)
1000×1000 Matrix{Float64}:
1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11
``````
2 Likes

Oh, I missed the suggestion with `getindex`.

``````Xc .= wW .* Xdata[iW] .+  (1.0 .- wW) .* getindex.(Ref(Xdata), iW .+ 1)
``````

results in:

``````julia> main()
8.431 ms (1 allocation: 16 bytes)
1000×1000 Matrix{Float64}:
1.39375e-11  1.54067e-11  1.70194e-11  1.88059e-11  2.07661e-11  2.29367e-11  …  2.07661e-11  1.88059e-11  1.70194e-11  1.54067e-11  1.39375e-11
``````

I wonder if it would be useful to have a macro like `@.`, but which also changes `X[v]` into `getindex.(Ref(X), v)`. Would that work? This seems like a common pattern to run into.

1 Like

If you broadcast the first indexing expression too, the last allocation should go away.

Not quite sure? We are talking about 16 Byte?

In fact, when I apply your suggestion

``````Xc .= wW .* getindex.(Ref(Xdata), iW) .+  (1.0 .- wW) .* getindex.(Ref(Xdata), iW .+ 1)
``````

It gets even worse:

``````julia> main() # two times Ref
8.802 ms (2 allocations: 32 bytes)

julia> main() # only Ref inserted where needed
8.316 ms (1 allocation: 16 bytes)
``````

I guess the reason is, that `Ref` causes an allocation? That’s why we see a 16 Byte allocation in my version with `getindex.` where we use 1 `Ref` statement.

``````julia> x = [1,2];

julia> f(x) = Ref(x)
f (generic function with 2 methods)

julia> @btime f(\$x)
7.632 ns (1 allocation: 16 bytes)
Base.RefValue{Vector{Int64}}([1, 2])

julia> f(x) = x
f (generic function with 2 methods)

julia> @btime f(\$x)
1.305 ns (0 allocations: 0 bytes)
2-element Vector{Int64}:
1
2
``````

That doesn’t add up. If `Ref` allocates (which is surprising), then there should be two allocations in both cases, since `Xdata[iW]` definitely allocates.

No `Xdata[iW]` does not allocate (it would be definitely bigger than 16 Byte!) because of the `@views` macro in front of the function.

`Ref` allocates because it essentially creates a struct which is very similar to array but without a few extra information

From `julia-1.7.0/share/julia/base/refvalue.jl`

``````mutable struct RefValue{T} <: Ref{T}
x::T
RefValue{T}() where {T} = new()
RefValue{T}(x) where {T} = new(x)
end
RefValue(x::T) where {T} = RefValue{T}(x)
``````

``````julia> y = rand(1:10, 1024,);

julia> x = randn((1024, )); z = copy(x);

julia> @views function f(x,y,z)
z .= x[y]
end
f (generic function with 3 methods)

julia> @btime f(\$x, \$y, \$z);
949.625 ns (0 allocations: 0 bytes)

# whereas
julia> function g(x,y,z)
z .= x[y]
end
g (generic function with 2 methods)

julia> @btime g(\$x, \$y, \$z);
1.156 μs (1 allocation: 8.12 KiB)

julia> varinfo(r"x")
name      size summary
–––– ––––––––– ––––––––––––––––––––––––––––
x    8.039 KiB 1024-element Vector{Float64}

``````