I am writing a performance-sensitive application that must not allocate, and it also runs inference with a multilayer perceptron. The output dimension of each layer in the network is fairly small, and benchmarking the forward pass of individual layers showed that StaticArrays is faster than ordinary arrays.
My question is: how do I implement the inference step elegantly using StaticArrays? My current implementation stores the weights and bias of each layer as a Matrix and a Vector, and uses A_mul_B! to avoid allocations.
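
For context, here is a minimal sketch of the kind of Array-based setup I mean. It is illustrative rather than my actual code: the names DenseLayer, forward!, and infer! are hypothetical, the tanh activation is an assumption, and it uses mul! from LinearAlgebra, which replaced A_mul_B! in Julia 0.7 and later.

```julia
using LinearAlgebra: mul!

# Hypothetical layer type: weights, bias, and a preallocated output buffer.
struct DenseLayer
    W::Matrix{Float64}
    b::Vector{Float64}
    out::Vector{Float64}
end

# Convenience constructor that allocates the output buffer once, up front.
DenseLayer(W::Matrix{Float64}, b::Vector{Float64}) =
    DenseLayer(W, b, zeros(size(W, 1)))

# Forward pass of one layer, writing into the layer's own buffer.
function forward!(layer::DenseLayer, x::Vector{Float64})
    mul!(layer.out, layer.W, x)                # out = W * x, in place
    layer.out .= tanh.(layer.out .+ layer.b)   # fused in-place broadcast (tanh is an assumed activation)
    return layer.out
end

# Inference: feed each layer's buffer into the next layer.
function infer!(layers::Vector{DenseLayer}, x::Vector{Float64})
    for layer in layers
        x = forward!(layer, x)
    end
    return x
end

# Example usage with made-up sizes (3 inputs, 4 hidden units, 2 outputs).
net = [DenseLayer(randn(4, 3), randn(4)), DenseLayer(randn(2, 4), randn(2))]
y = infer!(net, randn(3))
```

Note that the returned vector is the last layer's buffer, so it is overwritten by the next call to infer!. This works, but carrying a mutable buffer per layer is the part I would like to replace with something cleaner based on StaticArrays.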