Understanding the allocation behavior of arrays vs. scalars

Use StaticArrays for small fixed-size arrays; operations on them will be efficiently unrolled by the compiler, as if you had used individual variables.

When you use a general Array type to hold a small vector like x1, the size of the array is a runtime value: the compiler has to allocate the array on the heap, and every vector operation like x1 .+= 1 gets turned into a loop like for j = 1:length(x1); x1[j] += 1; end. For such a small operation (adding two numbers), the overhead of checking/updating the loop counter and of dereferencing x1[j] to access a memory location on the heap is significant.
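To make that concrete, here is a sketch of the kind of Vector-based accumulation being described (not the code from the original post; it assumes δ has two columns):

```julia
# Hypothetical Vector-based accumulation: x1 is a small heap-allocated array,
# so every broadcasted update pays for a runtime-length loop plus loads/stores
# through a pointer to the heap.
function accumulate_vector(δ::AbstractMatrix)
    x1 = zeros(2)             # heap-allocated; length(x1) is a runtime value
    for i in 1:size(δ, 1)
        x1 .+= δ[i, :]        # slice copy, then an inner loop over j = 1:length(x1)
    end
    return x1
end
```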

Also, δ[i, :] allocates a copy (a 2-element array on the heap). @views δ[i, :] gets rid of this copy, but it currently still allocates a view object (a SubArray) on the heap, so you still pay the overhead of an allocation. Compared to the cost of adding two numbers, allocating any object on the heap, no matter how small, is expensive.
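One quick way to see these allocations is the @allocated macro from Base. The exact byte counts depend on your Julia version (and on measuring at global scope), so treat the numbers as illustrative:

```julia
δ = rand(10, 2)

copy_bytes = @allocated δ[1, :]        # slicing copies: heap-allocates a 2-element Vector
view_bytes = @allocated view(δ, 1, :)  # avoids the copy, but may still allocate a SubArray wrapper
@show copy_bytes view_bytes
```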

In contrast, when x1 and x2 are scalars, the compiler probably stores them directly in registers. There is no for j = 1:2 loop counter that has to be incremented and checked, and no memory load/store to read/write x1[j]. You get the raw cost of two additions plus the cost of reading δ[i,1] and δ[i,2] from memory.
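Here is the corresponding scalar-accumulator sketch (again an illustration, not the original find_split code):

```julia
# Hypothetical scalar version: x1 and x2 can live in registers, so each
# iteration is just two additions plus two reads from δ.
function accumulate_scalars(δ::AbstractMatrix)
    x1 = 0.0
    x2 = 0.0
    @inbounds for i in 1:size(δ, 1)
        x1 += δ[i, 1]
        x2 += δ[i, 2]
    end
    return x1, x2
end
```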

A StaticArray is an array whose length is part of its type, so the length is known to the compiler: the compiler can completely unroll any loops over a StaticArray (no loop counters) and can store the array elements on the stack or in registers (no heap access), just as if you had declared the elements of the array as scalar variables.
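For example, something like this (a sketch assuming δ has two columns; it requires the StaticArrays.jl package):

```julia
using StaticArrays

typeof(SVector(1.0, 2.0))        # SVector{2, Float64} — the length 2 is in the type
isbitstype(SVector{2, Float64})  # true: can live on the stack or in registers

# Hypothetical SVector version of the accumulation: the `+` is fully unrolled
# into two scalar additions, with no loop counter and no heap access.
function accumulate_svector(δ::AbstractMatrix)
    acc = SVector(0.0, 0.0)
    @inbounds for i in 1:size(δ, 1)
        acc += SVector(δ[i, 1], δ[i, 2])
    end
    return acc
end
```

(If you need in-place mutation, MVector from the same package is the mutable counterpart.)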

PS. Benchmarking with @btime find_split_1($x, $δ), using the @btime macro from the BenchmarkTools package, will generally give more accurate timing results.
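For example (with a stand-in find_split_1 and made-up data, just to show the $ interpolation):

```julia
using BenchmarkTools

# Placeholder definition so the snippet runs; substitute the real function and data.
find_split_1(x, δ) = sum(x) + sum(δ)
x = rand(100)
δ = rand(100, 2)

@btime find_split_1($x, $δ)   # `$` interpolates the globals, so the timing
                              # excludes non-const global lookup and dispatch overhead
```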
