In these examples, manually splitting and reducing the sum scales well, while using `Atomic` operations is prohibitively slow:

```
using Base.Threads, BenchmarkTools, Test

# Manual reduction: one accumulator slot per thread, summed at the end.
function f_manual()
    total_Sum = zeros(nthreads())
    @threads for i in 1:10000000
        total_Sum[threadid()] += sin(i)
    end
    sum(total_Sum)
end

# Atomic reduction: every iteration updates a single shared atomic value.
function f_atomic()
    total_Sum = Atomic{Float64}(0.)
    @threads for i in 1:10000000
        atomic_add!(total_Sum, sin(i))
    end
    total_Sum.value
end

# Both versions should agree up to floating-point rounding.
@test f_manual() ≈ f_atomic()
print("manual: "); @btime f_manual();
print("atomic: "); @btime f_atomic();
```

Result with 8 threads:

```
leandro@pitico:~/Drive/Work/JuliaPlay% julia -t8 ompt.jl
manual: 99.064 ms (44 allocations: 3.84 KiB)
atomic: 993.152 ms (44 allocations: 3.72 KiB)
```

Is it that I don't know how to use atomic operations, or is there an explanation for this?

For comparison, this is the result with a single thread:

```
leandro@pitico:~/Drive/Work/JuliaPlay% julia ompt.jl
manual: 295.531 ms (7 allocations: 640 bytes)
atomic: 415.734 ms (7 allocations: 560 bytes)
```