Add @inbounds and you get the same performance, because the dot operation already has that guaranteed.
That said, you can write that loop more cleanly as:
julia> function mm!(S, S1, S2)
           for i in eachindex(S, S1, S2)
               @inbounds S[i] = S1[i]*S2[i]
           end
       end
mm! (generic function with 2 methods)

julia> @btime mm!($S3, $S1, $S2)
  1.706 μs (0 allocations: 0 bytes)
That eachindex(S, S1, S2) also guarantees that every index is in bounds for all three arrays (it throws an error if the arrays do not share the same indices). I bet that in the near future you won’t need the @inbounds flag in this case to recover the performance of the dot operation.
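To make that guarantee concrete, here is a minimal sketch of what eachindex does when it is given several arrays. The arrays and variable names are made up for the illustration, and the behavior shown is for ordinary Vectors:

```julia
S1 = [1.0, 2.0, 3.0]
S2 = [4.0, 5.0, 6.0]

# With matching arrays, eachindex returns one index range valid for all of them.
idx = eachindex(S1, S2)

# With mismatched arrays, it throws before any element is touched.
mismatch_detected = try
    eachindex(zeros(2), S1)
    false
catch err
    err isa DimensionMismatch
end
```

Because the check happens once, before the loop starts, every i produced by the iterator is valid for each array that was passed in.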
@inbounds disables bounds checking (which is already disabled in the dot operation, because the input guarantees that every access is in bounds). Without bounds checks in the loop body, the compiler may be able to use SIMD instructions (it is in this case).
eachindex does not improve performance here, but it probably will in future Julia versions, by automatically guaranteeing that accesses inside the loop are in bounds.
The dot operation is a broadcast operation, but that machinery is pretty complicated.
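As a rough picture of that machinery, and without going into the details, a fused in-place dot expression is turned into a lazy Broadcasted object that is then materialized into the destination. The sketch below spells that out by hand; the variable names are made up, and the two-step form is only an approximation of what the lowering produces:

```julia
S1 = [1.0, 2.0, 3.0]
S2 = [4.0, 5.0, 6.0]

A = similar(S1)
A .= S1 .* S2     # fused, in-place dot operation

# Roughly what the line above is lowered to: build a lazy Broadcasted
# object first, then materialize it into the destination array.
B = similar(S1)
bc = Base.Broadcast.broadcasted(*, S1, S2)
Base.Broadcast.materialize!(B, bc)
```

The shapes are checked once when the broadcast is materialized, which is why the inner loop can safely skip per-element bounds checks.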
Also, it is not safe to use @inbounds with for i in 1:n, both because the arrays may not have 1-based indexing and because it is easy for a user-provided n to be wrong. It is not a good idea to pass the length of an array as a separate input argument when it can be read easily, quickly, and safely from the array itself.
With eachindex you don’t need to worry about this.
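A small sketch of the difference, using hypothetical function names (mm_with_n! and mm_eachindex! are not from the original code). The version that takes n deliberately leaves bounds checking on, so a wrong n shows up as an error rather than as silent memory corruption:

```julia
# Hypothetical names, for illustration only.

# Takes the length as a separate argument. Bounds checks are left on,
# so a wrong n is caught instead of reading or writing out of bounds.
function mm_with_n!(S, S1, S2, n)
    for i in 1:n
        S[i] = S1[i] * S2[i]
    end
    return S
end

# Reads the valid indices from the arrays themselves.
function mm_eachindex!(S, S1, S2)
    for i in eachindex(S, S1, S2)
        @inbounds S[i] = S1[i] * S2[i]
    end
    return S
end

S1 = [1.0, 2.0, 3.0]
S2 = [4.0, 5.0, 6.0]

# n = 4 is one past the end: the checked loop throws a BoundsError.
# With @inbounds on that loop, the same call would be undefined behavior.
caught = try
    mm_with_n!(similar(S1), S1, S2, 4)
    false
catch err
    err isa BoundsError
end

result = mm_eachindex!(similar(S1), S1, S2)
```

The second version has no n to get wrong, which is what makes the @inbounds annotation defensible there.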