# Avoiding memory allocation when repeatedly using small arrays

I often find myself needing fairly simple, small autodiff vector operations applied repeatedly. I don’t know how to do this without allocation each time. Is it possible?

Here is an example where I’m integrating an array `p` over a surface implicitly defined by `f(x)=0`.

``````using BenchmarkTools
using ForwardDiff
using StaticArrays

n = 2^7
p = rand(n,n)

f(x) where N = √sum(abs2,x .- 2^6) - 2^5
function n_direct(x)
v = x .- 2^6
return v/√sum(abs2,v)
end

function oint(f,n,p::Array{T,N}) where {T,N}
s = zeros(T,N)
for I ∈ CartesianIndices(p)
x = SA_F64[I.I...]
d = f(x)::Float64
abs(d) ≤ 1 && (s .+= n(x).*p[I]*(1+cos(π*d))/2)
end
return s
end
``````

I typically want to use `ForwardDiff.gradient` to define the normal `n`, so the user doesn’t need to, but this causes an allocation.

``````julia> @benchmark oint(f,n_direct,p)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max):  127.100 μs … 499.900 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     127.400 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   130.284 μs ±  13.055 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

█▂ ▃                    ▂                                     ▁
█████▇▇▄▄▄▃▇▇▇▆▅▄▃▃▃▄▂▅▄█▆▄▄▄▄▄▅▄▅▃▂▄▂▃▂▃▂▄▄▂▂▄▂▅▃▃▂▃▄▄▅▃▂▃▄▃ █
127 μs        Histogram: log(frequency) by time        200 μs <

Memory estimate: 96 bytes, allocs estimate: 1.

julia> @benchmark oint(f,n_auto,p)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max):  142.400 μs …  4.375 ms  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     143.000 μs              ┊ GC (median):    0.00%
Time  (mean ± σ):   150.388 μs ± 96.927 μs  ┊ GC (mean ± σ):  1.60% ± 2.69%

█▂ ▁  ▂▂▂ ▁▁        ▁                                        ▁
████▆████████▇▆▆▄▅▃▅██▆▄▅▃▃▄▅▅▄▄▄▃▃▃▄▃▄▄▄▄▄▄▃▄▃▅▄▄▅▃▄▃▃▄▄▄▅▄ █
142 μs        Histogram: log(frequency) by time       228 μs <

Memory estimate: 38.72 KiB, allocs estimate: 825.
``````

I really enjoyed the Workshop yesterday on tools to improve performance, but I’m not sure what to do to fix this…

``````julia> using JETTest
julia> @report_dispatch oint(f,n_auto,p)
═════ 2 possible errors found ═════
│││││ runtime dispatch detected: f::typeof(f)(%8::SVector{2, ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 2}})
││││└───────────────────────────────────────────────────────────────────────
││ runtime dispatch detected: %1::typeof(f)(%9::SVector{2, ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 2}})
│└───────────────────────────────────────────────────────────────────────
(Vector{Float64}, 2)
``````

`center` and `radius` are global variables.

3 Likes

You’re right bad example. I’ll correct it.

For some reason the `where N` in `f(x)` is causing allocations:

``````julia> f(x) where N = √sum(abs2,x .- 2^6) - 2^5
f (generic function with 1 method)

julia> @btime f(\$x)
22.725 ns (2 allocations: 48 bytes)
78.32565875726226

julia> f(x) = √sum(abs2,x .- 2^6) - 2^5
f (generic function with 1 method)

julia> @btime f(\$x)
5.862 ns (0 allocations: 0 bytes)
78.32565875726226

``````

I had to change that to get what you report:

``````julia> @benchmark oint(\$f,\$n_auto,\$p)
BechmarkTools.Trial: 10000 samples with 1 evaluations.
Range (min … max):  114.826 μs … 284.301 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     115.015 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   117.534 μs ±   9.510 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

█ ▁     ▁   ▁                                             ▁   ▁
█▆█▆▇▇▆▆█▃▄▃█▄▁▄▁▇▆▇▇▇▇▅▅▆▅▇▅▅▆▅▇▆▃▅▅▅▇▆▆▆▅▅▆▆▅▆▅▅▅█▄▅▅▅▅▇█▇▆ █
115 μs        Histogram: log(frequency) by time        160 μs <

Memory estimate: 96 bytes, allocs estimate: 1.

``````

Then, with these changes, you get 0 allocations:

``````julia> function oint(f,n,p::Array{T,N}) where {T,N}
s = zeros(SVector{N,T}) # static vector here
for I ∈ CartesianIndices(p)
x = SVector{N,T}(ntuple(i->I[i],N)) # initialize svector with size defined at compile time
d = f(x)::Float64
abs(d) ≤ 1 && (s += n(x).*p[I]*(1+cos(π*d))/2) # remove a dot in s .+=
end
return s
end
oint (generic function with 1 method)

julia> @benchmark oint(\$f,\$n_auto,\$p)
BechmarkTools.Trial: 10000 samples with 1 evaluations.
Range (min … max):  105.190 μs … 221.240 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     105.372 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   107.170 μs ±   7.259 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

█      ▃▂    ▁                                                ▁
█▆▅▆▇████▆▆▄▅█▄▁▄▆▆▄▅▄▅▇▅▅▅▅▇▁▅▃▃▄▇▄▃▄▅▄▇▅▅▄▅▃▇▄▃▅▃▄▄▇▄▅▃▃▄▅█ █
105 μs        Histogram: log(frequency) by time        145 μs <

Memory estimate: 0 bytes, allocs estimate: 0.

``````
3 Likes

Thanks. That was a typo left in from previous attempts!

Yeah. I recomend always using `Test.detect_unbound_args(YourPackage)` or Aqua to check for this.
Of course, these only work if your code is organized into repositories.

The one annoying thing is that `f(x::NTuple{N,T}) where {N,T}` will result in `T` being an unbound arg, as it will be unbound if you call it with `f(())` (i.e., with an empty `Tuple{}`).

5 Likes