Here is a short code snippet that confused me. I’d appreciate any explanation of this phenomenon.

```
using BenchmarkTools
f(A, B) = A, B
g(A, B) = A
A = rand(5)
B = rand(5)
@btime f($A, $B); # 10.919 ns (1 allocation: 32 bytes)
@btime f(1, 2); # 0.041 ns (0 allocations: 0 bytes)
@btime g($A, $B); # 1.967 ns (0 allocations: 0 bytes)
```

It seems that creating a tuple of two vectors requires an allocation. Where does this allocation come from, and how can I avoid it?

```
julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202* (2019-01-21 21:24 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.2.0)
CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 1
JULIA_EDITOR = atom
```

Objects wrapping other heap-allocated objects (like a tuple wrapping arrays) need to be heap-allocated themselves, unless the compiler can figure out a way to elide the allocation (which it often can).
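A minimal sketch of the distinction (hypothetical names `make_pair` and `consume`, not from the original post): the tuple only has to be heap-allocated when it escapes the function that creates it.

```julia
# A tuple wrapping two heap-allocated arrays:
make_pair(A, B) = (A, B)

# When the tuple is returned to the caller (as in `f` above), it escapes and
# must be allocated. When it is created and immediately destructured inside
# one function, the compiler can usually prove it never escapes and elide
# the allocation:
function consume(A, B)
    x, y = make_pair(A, B)
    return sum(x) + sum(y)
end
```

Benchmarking `consume` with `@btime` should then typically report zero allocations, even though `make_pair` on its own allocates.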

It would be good to see code that is more closely related to your application, to give further comments on how to avoid it.


Thank you for the explanation. My application is computing a Jacobi polynomial basis: lowish order, say up to 20 or 30 at most, but millions of times. The workaround is easy - I just have to return `nothing` instead of `P, dP` and not rely on the tuple being returned. It is just a little inconvenient and a little ugly.
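The workaround can be sketched like this (hypothetical names `fill_tuple!` and `fill_nothing!`, not the actual Jacobi code): the same in-place kernel, once returning its outputs as a tuple and once returning `nothing`.

```julia
# Returns its output arrays as a tuple; the tuple of two heap-allocated
# arrays escapes to the caller, so benchmarking this shows 1 allocation.
function fill_tuple!(P, dP, x)
    for i in eachindex(P)
        P[i]  = x * i
        dP[i] = float(i)
    end
    return P, dP
end

# The workaround: same mutation, but return `nothing` so no tuple is built.
function fill_nothing!(P, dP, x)
    for i in eachindex(P)
        P[i]  = x * i
        dP[i] = float(i)
    end
    return nothing
end
```

The caller then reads the results directly from `P` and `dP` after the call, rather than from a returned value.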

```
struct Jacobi{T}
    α::T
    β::T
    A::Vector{T}
    B::Vector{T}
    C::Vector{T}
end

function Jacobi(α, β, N, T = Float64)
    # precompute the recursion coefficients
    A = zeros(T, N)
    B = zeros(T, N)
    C = zeros(T, N)
    for n = 2:N
        c1 = big(2*n*(n+α+β)*(2*n+α+β-2))
        c2 = big(2*n+α+β-1)
        A[n] = T( big(2*n+α+β)*big(2*n+α+β-2)*c2 / c1 )
        B[n] = T( big(α^2 - β^2) * c2 / c1 )
        C[n] = T( big(-2*(n+α-1)*(n+β-1)*(2n+α+β)) / c1 )
    end
    # convert α, β so the implicit constructor sees a consistent element type
    # (otherwise e.g. Jacobi(0, 0, N) fails with Int α, β and Float64 arrays)
    return Jacobi(T(α), T(β), A, B, C)
end

function eval_grad!(P::AbstractVector, dP::AbstractVector,
                    J::Jacobi, x::Number, N::Integer = length(P)-1)
    α, β = J.α, J.β
    P[1] = 1
    dP[1] = 0
    if N >= 1
        P[2] = (α+1) + 0.5 * (α+β+2) * (x-1)
        dP[2] = 0.5 * (α+β+2)
    end
    for n = 2:N
        c1 = J.A[n] * x + J.B[n]
        c2 = J.C[n]
        P[n+1] = c1 * P[n] + c2 * P[n-1]
        dP[n+1] = J.A[n] * P[n] + c1 * dP[n] + J.C[n] * dP[n-1]
    end
    return P, dP
end
```
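As a quick sanity check on the recursion (a sketch, independent of the struct above): for α = β = 0 the coefficients reduce to the Legendre recurrence A[n] = (2n-1)/n, B[n] = 0, C[n] = -(n-1)/n, so known Legendre values can be reproduced directly.

```julia
# Legendre three-term recurrence (the α = β = 0 special case of the above):
#   P_n(x) = ((2n-1)/n) * x * P_{n-1}(x) - ((n-1)/n) * P_{n-2}(x)
function legendre(x, N)
    P = zeros(N + 1)
    P[1] = 1.0            # P_0(x) = 1
    N >= 1 && (P[2] = x)  # P_1(x) = x
    for n = 2:N
        P[n+1] = ((2n - 1) / n) * x * P[n] - ((n - 1) / n) * P[n-1]
    end
    return P
end
```

For example, `legendre(0.5, 2)[3]` gives P_2(0.5) = (3·0.25 − 1)/2 = −0.125, which `eval_grad!` with a `Jacobi(0, 0, N)` basis should match.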

Does it affect the performance of the code?

The 1 allocation? I haven’t actually written the main part of the code yet so it’s too early to tell. I just test and profile the individual components before moving on. I was just struck by this and wanted to understand what is going on.

I think I see what you meant now: if I call the offending function inside another function multiple times, like so - not using the returned tuple, only the mutated input arrays -

```
function eval_many!(P, dP, J, N)
    for n = 1:1000
        eval_grad!(P, dP, J, rand(), N)
        # do something with P, dP
    end
end
```

then the compiler can avoid that one allocation.


It also sounds like you would benefit from using `MVector`s from `StaticArrays.jl` in place of regular `Vector`s, since you have a lot of small vectors with known size.
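A minimal sketch of that suggestion, assuming `StaticArrays.jl` is installed (the size `N + 1 = 21` is hypothetical, chosen for order N = 20):

```julia
using StaticArrays

# MVector encodes the length in the type, so the compiler knows the size at
# compile time and can often keep these small buffers off the heap.
N  = 20
P  = MVector{N + 1, Float64}(undef)
dP = MVector{N + 1, Float64}(undef)
P[1]  = 1.0
dP[1] = 0.0
```

Since `eval_grad!` accepts any `AbstractVector`, the `MVector`s can be passed in unchanged.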


I heavily use MVectors and SVectors wherever possible.

What you are seeing here is just a prototype.