Unexplained Allocation

Here is a short code-snippet that confused me. I’d appreciate any explanation of this phenomenon?

using BenchmarkTools
f(A, B) = A, B
g(A, B) = A
A = rand(5)
B = rand(5)
@btime f($A, $B);    #  10.919 ns (1 allocation: 32 bytes)
@btime f(1, 2);      #  0.041 ns (0 allocations: 0 bytes)
@btime g($A, $B);    #  1.967 ns (0 allocations: 0 bytes)

It seems creating a tuple of two vectors requires an allocation? Where does this allocation come from? How can I avoid it?

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202* (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.2.0)
  CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 1
  JULIA_EDITOR = atom

Objects wrapping other heap allocated objects (like a tuple wrapping arrays) need to be heap allocated itself unless the compiler can figure out a way to elide it (which it often can).

It would be good to see code that is more closely related to the application to give further comments how to avoid it etc.

1 Like

Thank you for the explanation. My application is that I’m computing a Jacobi Polynomial basis. Low-is order, say up to 20 or 30 at most. But millions of times. The workaround it easy - I just have to return nothing instead of P, dP and not rely on the being returned. It is just a little inconvenient and a little ugly.

struct Jacobi{T}
   α::T
   β::T
   A::Vector{T}
   B::Vector{T}
   C::Vector{T}
end

function Jacobi(α, β, N, T=Float64)
   # precompute the recursion coefficients
   A = zeros(T, N)
   B = zeros(T, N)
   C = zeros(T, N)
   for n = 2:N
      c1 = big(2*n*(n+α+β)*(2*n+α+β-2))
      c2 = big(2*n+α+β-1)
      A[n] = T( big(2*n+α+β)*big(2*n+α+β-2)*c2 / c1 )
      B[n] = T( big(α^2 - β^2) * c2 / c1 )
      C[n] = T( big(-2*(n+α-1)*(n+β-1)*(2n+α+β)) / c1 )
   end
   return Jacobi(α, β, A, B, C)
end

function eval_grad!(P::AbstractVector, dP::AbstractVector,
                    J::Jacobi, x::Number, N::Integer = length(P)-1)
   α, β = J.α, J.β
   P[1] = 1
   dP[1] = 0
   if N >= 1
      P[2] = (α+1) + 0.5 * (α+β+2) * (x-1)
      dP[2] = 0.5 * (α+β+2)
   end
   for n = 2:N
      c1 = J.A[n] * x + J.B[n]
      c2 = J.C[n]
      P[n+1] = c1 * P[n] + c2 * P[n-1]
      dP[n+1] = J.A[n] * P[n] + c1 * dP[n] + J.C[n] * dP[n-1]
   end
   return P, dP
end

Does it affect the performance of the code?

The 1 allocation? I haven’t actually written the main part of the code yet so it’s too early to tell. I just test and profile the individual components before moving on. I was just struck by this and wanted to understand what is going on.

I guess I know what you meant now, if I call the offending function inside another function multiple times, like so - not using the output, but only the input

function eval_many!(P, dP, J, N)
   for n = 1:1000 
      eval_grad!(P, dP, J, rand(), N)
      # do something with P, dP
   end
end

then the compiler can avoid that one allocation.

1 Like

It also sounds like you would benefit from using MVectors from StaticArrays.jl in place of regular Vectors, since you have a lot of small vectors with known size.

1 Like

I heavily use MVectors and SVectors wherever possible.

What you are seeing here is just a prototype.