Feedforward NN using StaticArrays with no allocation

memory-allocation

#1

I am writing a performance-sensitive application that cannot have any allocations and that also runs inference using a multilayer perceptron. The output dimension of each layer in the neural network is fairly small, and benchmarking the forward pass of individual layers showed that using StaticArrays was faster.

My question is: how do I implement the inference step in an elegant way using StaticArrays? My current implementation uses a Matrix and a Vector for the weights and bias of each layer, and A_mul_B! to avoid any allocations.


#2

Just use the out-of-place operations. StaticArrays are stack-allocated structures, so they won’t allocate when you create them. They’re more like a high-dimensional number, like a complex number or a Float64: they don’t heap-allocate memory. So the algorithm is just tanh.(W2*sigma.(W1*x)) etc.
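
As a rough sketch of what that could look like in Julia 1.x syntax (the names `relu` and `forward` and the layer sizes are just assumptions for illustration, not from any particular library):

```julia
using StaticArrays

# Hypothetical activation; any scalar function works with broadcasting.
relu(x) = max(zero(x), x)

# Out-of-place forward pass for a two-layer network. Each intermediate
# result is a new SVector, which is an immutable value rather than a
# heap-allocated array.
forward(W1, b1, W2, b2, x) = tanh.(W2 * relu.(W1 * x + b1) + b2)

# Illustrative sizes: 5 inputs, 10 hidden units, 1 output.
W1 = @SMatrix rand(10, 5); b1 = @SVector rand(10)
W2 = @SMatrix rand(1, 10); b2 = @SVector rand(1)
x  = @SVector rand(5)

y = forward(W1, b1, W2, b2, x)   # a 1-element SVector{1,Float64}
```

Because the sizes are part of the types, the compiler can unroll the small matrix-vector products, so there should be no need for preallocated output buffers.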


#3

Thanks for your response. I am still getting allocations when I run this simple foobar example in Julia 0.6. I think I might be missing something.

using BenchmarkTools
using StaticArrays

# Julia 0.6 syntax; on Julia 1.0+ this is written relu(x::T) where {T <: AbstractFloat}
@inline relu{T <: AbstractFloat}(x::T) = max(zero(T), x)

function profile()
    input = @SVector rand(5);
    W1 = @SMatrix rand(10,5); b1 = @SVector rand(10);
    W2 = @SMatrix rand(1,10); b2 = @SVector rand(1);
    
    @btime W2*relu.(W1*input+b1)+b2
end

profile()
# prints the following
# 274.584 ns (5 allocations: 320 bytes)
# 1-element StaticArrays.SArray{Tuple{1},Float64,1,1}:
# 11.5083

#4

@ChrisRackauckas Actually, I take that back. If I use @allocated instead of @btime, I get zero allocations. I am not sure what’s happening there, but I am assuming it means there are no allocations in practice.


#5

Is this coded on Julia 0.6? 1.0 has a lot of performance fixes.

Also, interpolate the variables into the @btime macro, or make them const.
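
For reference, interpolation might look roughly like this in Julia 1.x syntax (the variable names and sizes are carried over from the example above in the thread; `relu` is redefined here only so the snippet is self-contained):

```julia
using BenchmarkTools
using StaticArrays

relu(x) = max(zero(x), x)

input = @SVector rand(5)
W1 = @SMatrix rand(10, 5); b1 = @SVector rand(10)
W2 = @SMatrix rand(1, 10); b2 = @SVector rand(1)

# `$` splices each value into the benchmarked expression, so the timing
# does not include the cost of looking up non-constant global variables.
@btime $W2 * relu.($W1 * $input + $b1) + $b2
```

The alternative is to declare the inputs `const` at top level, which also lets the compiler know their types.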


#6

Yes, it is written in Julia 0.6 (I edited my post to state that explicitly). I guess the few allocations came from the macro creating a closure. Following your suggestion brought it down to zero allocations, as shown below:

using BenchmarkTools
using StaticArrays

@inline relu{T <: AbstractFloat}(x::T) = max(zero(T), x)

function profile()
    @btime W2*relu.(W1*input+b1)+b2 setup=(input = @SVector rand(5); W1 = @SMatrix rand(10,5); b1 = @SVector rand(10); W2 = @SMatrix rand(1,10); b2 = @SVector rand(1);)
end

profile()
#  22.266 ns (0 allocations: 0 bytes)
# 1-element StaticArrays.SArray{Tuple{1},Float64,1,1}:
# 10.2185