Performance of custom `Vec` type versus `SVector{3, Float64}`

Odd. FWIW, this fixes it for me:

julia> @btime foo($ps, $y);
  8.176 μs (2 allocations: 78.17 KiB)

julia> LinearAlgebra.dot(x::Vec,y::Vec) = muladd(x.z,y.z,muladd(x.y,y.y,x.x*y.x))

julia> @btime foo($ps, $y);
  3.958 μs (2 allocations: 78.17 KiB)

julia> @btime foo($vs, $x);
  3.932 μs (2 allocations: 78.17 KiB)

Julia and LLVM are generally reluctant to vectorize an outer loop when an inner loop is present,
so manually unrolling the inner loop can make a big difference.
Here, `foo`, which loops over `vs`, gets SIMD-vectorized if we unroll `dot` by hand, but not otherwise. =/
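
For anyone who wants to try this without the original code, here is a minimal sketch of the two variants. The thread does not show the definitions of `Vec` or `foo`, so everything here is an assumption: `Vec` is a hypothetical plain struct (only the field names `x`, `y`, `z` come from the `dot` definition above), and `dot_loop`, `dot_unrolled`, and `dots!` are made-up stand-ins, not the actual functions being benchmarked.

```julia
# Hypothetical stand-in for the thread's `Vec`; only the field names
# x, y, z are taken from the `dot` definition above.
struct Vec
    x::Float64
    y::Float64
    z::Float64
end

# Dot product written with an inner loop over the three components,
# roughly what a generic fallback would do.
function dot_loop(a::Vec, b::Vec)
    s = 0.0
    for i in 1:3
        s += getfield(a, i) * getfield(b, i)
    end
    return s
end

# Manually unrolled dot product, same expression as in the post.
dot_unrolled(a::Vec, b::Vec) = muladd(a.z, b.z, muladd(a.y, b.y, a.x * b.x))

# Hypothetical outer loop standing in for `foo`: one dot product per element.
function dots!(out, f, ps, q)
    @inbounds for i in eachindex(out, ps)
        out[i] = f(ps[i], q)
    end
    return out
end

ps = [Vec(rand(), rand(), rand()) for _ in 1:1000]
q = Vec(1.0, 2.0, 3.0)

res_loop = dots!(zeros(length(ps)), dot_loop, ps, q)
res_unrolled = dots!(zeros(length(ps)), dot_unrolled, ps, q)
```

Both variants should agree up to rounding (`res_loop ≈ res_unrolled`). Whether the loop version really misses SIMD will depend on the Julia version and the CPU, so it is worth timing both with `@btime` and looking at `@code_llvm` for the outer loop rather than taking this sketch as proof.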
