Assigning in place to avoid allocations

I’m trying to understand how I can assign in-place to avoid allocations. Here is a simple example:

nParticles = 10
L = 4
particles = zeros(nParticles,2)
norms = zeros(nParticles)#[0.0 for i=1:nParticles]
newParticle = rand(2) * L
norms .= [norm(newParticle - x) for x in eachrow(particles)]

The last line requires 69 k allocations. Is there a way to reduce that? I’m more interested in learning over just optimizing this particular code.

The right hand side is allocating an array for the comprehension [... for x in ...]. It also allocates an array for newParticle - x.

The @. here doesn’t do what you want because it adds . to everything: even calls like eachrow(particles) become eachrow.(particles).

You can just write a loop

for i = 1:nParticles
    norms[i] = norm(newParticle - @view(particles[i,:]))
end

but this still allocates a 2-element vector for each i for the result of the vector subtraction.

However, there is a much better way to do this. For small fixed-size vectors like your 2-component coordinate vectors here, use StaticArrays:

particles = zeros(SVector{2,Float64}, nParticles)  # an array of 2-component SVectors
newParticle = (@SVector rand(2)) * L
norms = zeros(nParticles)
for i = 1:nParticles
    norms[i] = norm(newParticle - particles[i])
end

The norm(newParticle - particles[i]) calculation will be completely unrolled and inlined by the compiler, and will allocate no arrays on the heap.

(As usual, to see the lack of allocations you should put the code in a function, with no global variables, for benchmarking.)

3 Likes

Interesting. I’ve used Static Arrays before but I guess I don’t understand them fully. So because newParticle and particles[i] are SVectors, newParticle - particles[I] does not require any allocations??

The key point is that, unlike a normal array, the compiler knows the length of an SVector (because the length is a part of the type). This allows the compiler to stick them on the stack or in registers, unroll loops, etcetera. It makes them very efficient for things like 2d and 3d spatial coordinate vectors, which never change size dynamically.

Very cool. take home message is if you know the shape of your array is not going to change, use Static Arrays for speed.

Only for small arrays (< 100 elements).

For very large arrays, StaticArrays don’t buy you much and can actually be slower — e.g. if you have 10000 elements, unrolling loops is a very bad idea because it will overflow the instruction cache.