julia> using LoopVectorization
julia> global ans1 = 0.0e0
0.0
julia> @turbo for i = 1:1000000
           ans1 = ans1 + sqrt(i)
       end
julia> println(ans1)
6.666671664588217e8
julia> global ans2=0.0e0
0.0
julia> for i = 1:1000000
           global ans2
           ans2 = ans2 + sqrt(i)
       end
julia> ans3 = sum(sqrt∘big, 1:1000000) |> Float64;
julia> println(ans1-ans3)
-4.76837158203125e-7
julia> println(ans2-ans3)
1.9669532775878906e-5
julia> (ans2-ans3) / (ans1-ans3)
-41.25
For this particular example on my particular hardware (a Tiger Lake CPU), the in-order sum has roughly 41 times the absolute error of the @turbo version, measured against the BigFloat reference ans3.
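For anyone who wants to poke at this without LoopVectorization, here is a rough sketch. It assumes that the relevant effect of a vectorized loop is that the sum gets split across several independent accumulators that are combined at the end; that is my guess at the cause, not a description of what @turbo actually generates. The names `sum_inorder`, `sum_lanes`, and the lane count `nacc = 8` are made up for illustration.

```julia
# Naive left-to-right summation: a single accumulator absorbs every rounding error.
function sum_inorder(f, r)
    s = 0.0
    for i in r
        s += f(i)
    end
    return s
end

# Hypothetical stand-in for a vectorized reduction: `nacc` independent
# accumulators filled round-robin, then combined at the end.
# ASSUMPTION: this only mimics the reordering, not the real @turbo code.
function sum_lanes(f, r, nacc = 8)
    acc = zeros(nacc)
    for (k, i) in enumerate(r)
        acc[mod1(k, nacc)] += f(i)
    end
    return sum(acc)
end

n = 1_000_000
ref = Float64(sum(sqrt ∘ big, 1:n))   # high-precision reference, as in the session above

err_inorder = sum_inorder(sqrt, 1:n) - ref
err_lanes   = sum_lanes(sqrt, 1:n) - ref

println("in-order error: ", err_inorder)
println("8-lane error:   ", err_lanes)
```

Changing `nacc` shows how sensitive the error is to the grouping. The exact numbers depend on the input and the lane count, so treat any single ratio as anecdotal.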