@turbo macro giving slightly different results

julia> using LoopVectorization

julia> global ans1 = 0.0e0
0.0

julia> @turbo for i = 1:1000000
           ans1=ans1+sqrt(i)
       end

julia> println(ans1)
6.666671664588217e8

julia> global ans2=0.0e0
0.0

julia> for i = 1:1000000
           global ans2
           ans2=ans2+sqrt(i)
       end

julia> ans3 = sum(sqrt∘big, 1:1000000) |> Float64;

julia> println(ans1-ans3)
-4.76837158203125e-7

julia> println(ans2-ans3)
1.9669532775878906e-5

julia> (ans2-ans3) / (ans1-ans3)
-41.25

For this particular example on my particular hardware (a Tiger Lake CPU), the in-order sum has about 40 times the error of the @turbo version (comparing magnitudes; the two errors have opposite signs).
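
A rough sketch of why reordering can help, assuming the vectorized loop keeps several independent partial sums and combines them at the end. `sum_inorder`, `sum_lanes`, and the choice of 8 lanes are made up for illustration; this plain-Julia emulation is not what LoopVectorization actually generates, and the size (or even the sign) of the improvement will vary with the input and the hardware.

```julia
# In-order sum: a single accumulator, so every addition is rounded
# against the full running total.
function sum_inorder(f, n)
    acc = 0.0
    for i in 1:n
        acc += f(i)
    end
    return acc
end

# Hypothetical stand-in for a vectorized reduction: `lanes` independent
# partial sums that are only combined once the loop has finished.
# Each partial sum stays smaller, so each addition rounds at a finer scale.
function sum_lanes(f, n; lanes = 8)
    acc = zeros(Float64, lanes)
    for i in 1:n
        acc[mod1(i, lanes)] += f(i)
    end
    return sum(acc)
end

n = 1_000_000
reference = Float64(sum(sqrt ∘ big, 1:n))  # same high-precision reference as above

err_inorder = sum_inorder(sqrt, n) - reference
err_lanes   = sum_lanes(sqrt, n) - reference
println("in-order error: ", err_inorder)
println("8-lane error:   ", err_lanes)
```

Neither ordering is "the correct one"; floating-point addition is not associative, so any change in summation order can change the last few digits of the result.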
