Have you considered using StaticArrays.jl, perhaps in conjunction with the custom BLAS methods mentioned here?