julia> function mydot(x, y)
           eachindex(x) == eachindex(y) || throw(DimensionMismatch("x and y must be same size!"))
           s = zero(Base.promote_eltype(x, y))
           @fastmath for i in eachindex(x)
               s += x[i] * y[i]
           end
           s
       end
mydot (generic function with 1 method)
julia> using BenchmarkTools

julia> x = rand(128); y = rand(128);

julia> @btime mydot($x, $y)
  9.702 ns (0 allocations: 0 bytes)
30.668576075140155
Both the assembly and the LLVM IR show the expected SIMD instructions, e.g.:
%33 = fmul fast <8 x double> %wide.load59, %wide.load
%34 = fmul fast <8 x double> %wide.load60, %wide.load56
%35 = fmul fast <8 x double> %wide.load61, %wide.load57
%36 = fmul fast <8 x double> %wide.load62, %wide.load58
%37 = fadd fast <8 x double> %33, %vec.phi
%38 = fadd fast <8 x double> %34, %vec.phi53
%39 = fadd fast <8 x double> %35, %vec.phi54
%40 = fadd fast <8 x double> %36, %vec.phi55
But I would suggest double-checking that the loop is actually being vectorized, at least for any code you expect to be hot enough to matter.
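One way to do that check without reading the IR by eye is to capture it as a string and search for vector types. The following is only a rough sketch: `has_vector_doubles` is a hypothetical helper name, the regex only looks for fixed-width `<N x double>` types, and the result depends on your CPU and Julia/LLVM version. `mydot` is repeated so the snippet runs on its own.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same kernel as above, repeated so this snippet is self-contained.
function mydot(x, y)
    eachindex(x) == eachindex(y) || throw(DimensionMismatch("x and y must be same size!"))
    s = zero(Base.promote_eltype(x, y))
    @fastmath for i in eachindex(x)
        s += x[i] * y[i]
    end
    s
end

# Hypothetical helper: returns true if the optimized LLVM IR for `f`,
# compiled for `argtypes`, mentions a vector-of-double type like <4 x double>.
function has_vector_doubles(f, argtypes)
    ir = sprint(io -> code_llvm(io, f, argtypes; debuginfo=:none))
    return occursin(r"<\d+ x double>", ir)
end

x = rand(128); y = rand(128)
vectorized = has_vector_doubles(mydot, (Vector{Float64}, Vector{Float64}))
println("vector instructions found: ", vectorized)
```

A `true` here only says that some vector operation on doubles appears in the IR, not that the hot loop is the part that got vectorized, so for code that really matters it is still worth looking at the output of `@code_llvm debuginfo=:none mydot(x, y)` directly.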
EDIT:
In this case, LLVM actually managed to vectorize the loop even without the up-front size check: it created a fast path. So it can do quite well in simple cases.
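If you want to see this for yourself, something like the sketch below should work. It assumes "the check" means the `eachindex(x) == eachindex(y)` comparison; `mydot_nocheck` is a hypothetical name for the variant with that line removed, and the two string searches are only crude heuristics whose results will vary with CPU and Julia/LLVM version.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical variant of mydot with the index-equality check removed.
# Indexing y[i] is still bounds-checked, since @inbounds is not used.
function mydot_nocheck(x, y)
    s = zero(Base.promote_eltype(x, y))
    @fastmath for i in eachindex(x)
        s += x[i] * y[i]
    end
    s
end

# Capture the optimized IR for two Vector{Float64} arguments as a string.
ir = sprint(io -> code_llvm(io, mydot_nocheck,
                            (Vector{Float64}, Vector{Float64});
                            debuginfo=:none))

# Crude text searches: vector-of-double types, and any bounds-error call.
println("vector ops present:        ", occursin(r"<\d+ x double>", ir))
println("bounds-error path present: ", occursin(r"bounds_?error", ir))
```

If both lines print `true`, that is consistent with a vectorized fast path sitting next to a slower path that can still throw a bounds error, but reading the IR is the only way to be sure.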