Allocations when moving loop into a function

Why does the first variant (the loop calling foo!) allocate a large amount of memory while the second variant (bar!) does not? The only difference is whether the loop is outside or inside the function. Is there a way to keep the loop outside the function without so many allocations?

using BenchmarkTools

function foo!(F, A)
  F[1] = A[2]
  F[2] = - A[1]
  return nothing
end

function bar!(F, A)
  for i = 1:size(A,1)
    F[i,1] = A[i,2]
    F[i,2] = -A[i,1]
  end
  return nothing
end

n = 15_000
A = rand(n,2)
F = similar(A)

@btime begin
  for i = 1:n
    @views foo!((F[i, :]), (A[i, :]))
  end
end

@btime begin
  bar!(F, A)
end

Results:
6.409 ms (161935 allocations: 3.62 MiB)
4.062 μs (0 allocations: 0 bytes)

Hi, and welcome to the Julia community!

The problem is that in

@btime begin
  for i = 1:n
    @views foo!((F[i, :]), (A[i, :]))
  end
end

n is a non-const, untyped global variable, so the compiler knows neither its type nor that of i. The types involved therefore have to be checked at run time in every iteration, which is slow and allocates. The same is true for A and F. If you declare all three const (note that this still allows in-place mutation of the arrays), I get

julia> @btime begin
  for i = 1:n
    @views foo!((F[i, :]), (A[i, :]))
  end
end
  13.000 μs (0 allocations: 0 bytes)
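
For reference, here is a minimal sketch of what the const version could look like as a complete script. It assumes a fresh Julia session, since a name that already holds a non-const global value generally cannot be redeclared const afterwards. The timing you see will depend on your machine.

```julia
using BenchmarkTools

function foo!(F, A)
    F[1] = A[2]
    F[2] = -A[1]
    return nothing
end

# `const` fixes the binding (and therefore the type) of each global.
# The contents of the arrays can still be mutated in place.
const n = 15_000
const A = rand(n, 2)
const F = similar(A)

@btime begin
    for i = 1:n
        @views foo!(F[i, :], A[i, :])
    end
end
```

On Julia 1.8 and later, typed globals such as `n::Int = 15_000` are another way to give the compiler the type information it is missing here.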

Alternatively, BenchmarkTools.jl also allows for interpolating global variables using $:

julia> ... # non-const n, A, F

julia> @btime begin
           for i = 1:$n
               @views foo!(($F[i, :]), ($A[i, :]))
             end
         end
  11.000 μs (0 allocations: 0 bytes)

I’m not sure why we don’t need such interpolation for bar!, though. But even if we did, the types of F and A would only have to be determined once, instead of once per iteration, so the difference in timing and allocations between the interpolated and non-interpolated versions would be much less pronounced.
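
A third option, if you want to keep foo! working on one row at a time, is to move the loop into a small wrapper function, so that F and A arrive as typed arguments rather than globals. The sketch below is only an illustration: the name foo_rows! is made up for this example and does not come from the original code.

```julia
using BenchmarkTools

function foo!(F, A)
    F[1] = A[2]
    F[2] = -A[1]
    return nothing
end

# Hypothetical wrapper (the name is an assumption, not from the question):
# the loop lives inside a function, so the compiler knows the types of
# F, A and i when it compiles the loop body.
function foo_rows!(F, A)
    for i in axes(A, 1)
        @views foo!(F[i, :], A[i, :])
    end
    return nothing
end

n = 15_000
A = rand(n, 2)
F = similar(A)

# Interpolating with $ keeps the global lookup out of the measurement.
@btime foo_rows!($F, $A)
```

This keeps the per-row logic in foo! while the loop itself gets compiled with concrete types, much like bar!.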

Thanks a lot!

Keep in mind that you probably want everything where performance matters to be inside a function, because code at global scope is not optimized nearly as well. See the Performance tips for more such suggestions:

Actually, maybe even take a look at the in-development version of the Performance tips, as they have a better structure: