High-performance vector/matrix/tensor linear algebra operations

Just coming back to update the thread with the solution: I ended up using LoopVectorization.jl.

Example:
https://docs.juliahub.com/LoopVectorization/4TogI/0.12.147/examples/matrix_multiplication/
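For anyone who doesn't want to click through, here is a minimal sketch in the spirit of the linked example, not a copy of it. It assumes LoopVectorization 0.12.x, where the `@turbo` macro is available. The function name `matmul_turbo!` and the matrix sizes are made up for illustration. Note that `@turbo` does not bounds-check, so the dimensions of `C`, `A`, and `B` have to agree.

```julia
using LoopVectorization

# Hypothetical helper (name is illustrative): computes C = A * B in place.
# Assumes size(C) == (size(A, 1), size(B, 2)) and size(A, 2) == size(B, 1);
# @turbo skips bounds checks, so mismatched sizes are not caught here.
function matmul_turbo!(C, A, B)
    @turbo for m in axes(A, 1), n in axes(B, 2)
        acc = zero(eltype(C))          # accumulator for one output entry
        for k in axes(A, 2)
            acc += A[m, k] * B[k, n]   # inner product of row m and column n
        end
        C[m, n] = acc
    end
    return C
end

# Example sizes chosen arbitrarily for illustration.
A = rand(64, 32)
B = rand(32, 48)
C = Matrix{Float64}(undef, 64, 48)
matmul_turbo!(C, A, B)
```

A quick sanity check is to compare against the built-in product with `C ≈ A * B`. The same loop-nest pattern should carry over to other vector/matrix/tensor contractions, though how much it helps will depend on the array sizes and element types.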