Yes.
@turbo makes optimistic aliasing assumptions.
If an index depends on a loop, it assumes the address is different for every iteration.
If an index does not depend on a loop, it assumes the address is the same for every iteration.
In other words, because you have s[a] and a depends on loop i, it assumes every iteration of i touches a different address in s, and thus that the iterations can be performed in parallel.
In the scalar case, it knows you’re accumulating to the same number across iterations, and thus does that correctly.
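To make the failure mode concrete, here is a rough sketch in plain Julia (no LoopVectorization required). It is only a model of what a gather/add/scatter over W lanes does when the "all addresses differ" assumption is wrong, not the code @turbo actually generates. The function names and the width W = 4 are made up for illustration.

```julia
# Serial reference: every update to s[a[i]] is applied, even when indices repeat.
function scatter_add_serial!(s, a, x)
    for i in eachindex(a, x)
        s[a[i]] += x[i]
    end
    return s
end

# Hypothetical model of a W-wide vectorized loop that assumes all W addresses
# in a chunk are distinct: gather W values, add, then scatter them back.
function scatter_add_chunked!(s, a, x; W = 4)
    n = length(a)
    for start in 1:W:n
        idx = start:min(start + W - 1, n)
        gathered = s[a[idx]]          # all lanes are loaded before any store
        gathered .+= x[idx]
        for (k, i) in enumerate(idx)  # scatter: a repeated address keeps only the last lane
            s[a[i]] = gathered[k]
        end
    end
    return s
end

x = [1.0, 2.0, 3.0, 4.0]
a_unique = [1, 2, 3, 4]   # every iteration hits a different address
a_repeat = [1, 1, 2, 2]   # iterations alias each other

serial_unique  = scatter_add_serial!(zeros(4), a_unique, x)   # [1.0, 2.0, 3.0, 4.0]
chunked_unique = scatter_add_chunked!(zeros(4), a_unique, x)  # [1.0, 2.0, 3.0, 4.0]
serial_repeat  = scatter_add_serial!(zeros(4), a_repeat, x)   # [3.0, 7.0, 0.0, 0.0]
chunked_repeat = scatter_add_chunked!(zeros(4), a_repeat, x)  # [2.0, 4.0, 0.0, 0.0]
```

With unique indices the two versions agree. With repeated indices the chunked version silently drops updates, because each lane read the old value before any lane wrote.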
I’m prioritizing correctness in the rewrite, so it should handle this case correctly.
But making that case fast is difficult. I’d have to think about it more.
With the rewrite, you’d need to add @simd ivdep to get the current behavior.
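For reference, this is what the annotation looks like in Base Julia today. It is only a sketch of the existing Base `@simd ivdep` form; exactly how the rewrite will pick it up is not settled here, and the function name is made up. The annotation is a promise from you that no two iterations touch the same memory, so it is only valid when a has no repeated entries.

```julia
# Base Julia's @simd ivdep asserts that there are no loop-carried memory
# dependencies. That holds here only if `a` contains no repeated indices;
# @inbounds additionally assumes every a[i] is a valid index into s.
function scatter_add_ivdep!(s, a, x)
    @inbounds @simd ivdep for i in eachindex(a, x)
        s[a[i]] += x[i]
    end
    return s
end

# Safe use: every index appears exactly once.
s_ivdep = scatter_add_ivdep!(zeros(4), [4, 2, 1, 3], [1.0, 2.0, 3.0, 4.0])
```

If a can contain duplicates, leave the annotation off; otherwise you are back to the optimistic assumption described above.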