The accumulator is shared between threads, which creates a data race: the loop runs slowly and returns wrong answers.
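As a rough illustration of the problem (not the original code, which isn't shown here), the sketch below uses only Base's `Threads` module. The names `term`, `racy_sum` and `chunked_sum` are made up for this example, and the Leibniz series for π is an assumption about what is being summed.

```julia
# Hypothetical series term (assumption: Leibniz series, 4 * (-1)^k / (2k + 1)).
term(k) = (iseven(k) ? 4.0 : -4.0) / (2k + 1)

# Racy version: every thread reads and writes the same `acc` without
# synchronization, so updates can be lost. The captured, reassigned
# variable is also boxed, which hurts speed.
function racy_sum(n)
    acc = 0.0
    Threads.@threads for k in 0:n
        acc += term(k)
    end
    return acc
end

# One possible fix: each task sums its own chunk into a private result,
# and the partial sums are combined once at the end.
function chunked_sum(n; ntasks = Threads.nthreads())
    chunks = Iterators.partition(0:n, cld(n + 1, ntasks))
    tasks = map(chunks) do chunk
        Threads.@spawn sum(term, chunk)
    end
    return sum(fetch, tasks)
end
```

With more than one thread, `racy_sum(100_000)` can return a different value on each run, while `chunked_sum(100_000)` should agree with the serial `sum(term, 0:100_000)` up to floating-point rounding.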
Try @tturbo. That should do the correct thing.
julia> @btime f(100000)
25.320 μs (0 allocations: 0 bytes)
3.141602726360119
julia> @btime fturbo(100000)
7.160 μs (0 allocations: 0 bytes)
3.1416026524898792
This is with 4 threads.
Accuracy is similar:
julia> π - fturbo(100000)
-9.998900086127804e-6
julia> π - f(100000)
-1.007277032583076e-5