Writing code that works allocation-free on both vectors and SVectors

I’d like to write generic code that operates on abstract vectors, and is non-allocating. A specific example is a function that subtracts the mean and normalizes a vector. This can be done in-place. However, the problem with writing code that specifically does operations in-place is that it will not work on immutable types, such as SVectors (static vectors from the StaticArrays package).

So I’m having difficulty understanding how I can write such methods that is both efficient and generic.

I have tried the following, hoping the compiler would be smart enough to in-place the operations:

using StaticArrays

mean0(vec) = vec .- (sum(vec) / length(vec))
norm1(vec) = vec .* (1 / sqrt(vec' * vec))
mean0norm1(vec) = norm1(mean0(vec))

function test()
    A = SVector{16, Float64}(randn(16))
    println(@allocated A = mean0norm1(A))

    B = MVector{16, Float64}(randn(16))
    println(@allocated B = mean0norm1(B))

    C = randn(16)
    println(@allocated C = mean0norm1(C))
end

test()

the output of which is:

0
288
384

This shows the compiler won’t get rid of the allocations.

Is there any way around this? It feels like I have to either give up generality, with two code paths, one for mutable and one for immutable objects, or give up on avoiding allocations and having performant code. Is that really the case?

1 Like

I have used

before. It handles some cases, but unfortunately not all. Sometimes I insert a simple branch in the code to handle the remaining cases.

See particularly Home · BangBang.jl

1 Like

Thanks! Good to know I’m not being stupid, and that this is an actual problem!

I just tried BangBang, as follows, but it doesn’t completely remove the allocations:

using StaticArrays, BangBang

mean0(vec) = add!!(vec, -(sum(vec) / length(vec)))
norm1(vec) = lmul!!(vec, 1 / sqrt(vec' * vec))
mean0norm1(vec) = norm1(mean0(vec))

function test()
    A = SVector{16, Float64}(randn(16))
    println(@allocated A = mean0norm1(A))

    B = MVector{16, Float64}(randn(16))
    println(@allocated B = mean0norm1(B))

    C = randn(16)
    println(@allocated C = mean0norm1(C))
end

test()

giving:

0
288
192

Using the ismutabletype function, I can do a relatively simple dual code path solution in my particular case:

using StaticArrays

mean0!(vec) = vec .-= sum(vec) / length(vec)
mean0(vec) = vec .- sum(vec) / length(vec)
norm1!(vec) = vec .*= 1 / sqrt(vec' * vec)
norm1(vec) = vec .* 1 / sqrt(vec' * vec)
mean0norm1!!(vec::T) where T = ismutabletype(T) ? norm1!(mean0!(vec)) : norm1(mean0(vec))

function test()
    A = SVector{16, Float64}(randn(16))
    println(@allocated A = mean0norm1!!(A))

    B = MVector{16, Float64}(randn(16))
    println(@allocated B = mean0norm1!!(B))

    C = randn(16)
    println(@allocated C = mean0norm1!!(C))
end

test()

which does achieve 0 allocations in all 3 cases. Still, it’s complexity I’d hope to avoid.

It turns out that I linked the wrong package, sorry for that. I didn’t use BangBang.jl, I used the macro @bangbang from MaybeInplace.jl

Here’s working code that is allocation free:

using StaticArrays, MaybeInplace

mean0(vec) = @bangbang vec .-= (sum(vec) / length(vec))
norm1(vec) = @bangbang vec .*= (1 / sqrt(vec' * vec))
mean0norm1(vec) = norm1(mean0(vec))

function test()
    A = SVector{16, Float64}(randn(16))
    println(@allocated A = mean0norm1(A))

    B = MVector{16, Float64}(randn(16))
    println(@allocated B = mean0norm1(B))

    C = randn(16)
    println(@allocated C = mean0norm1(C))
end

test()
1 Like

SciML must be using this functionality or another one, no?

Yes
image

1 Like