In addition to @Jakub_Wronowski’s very good answer, you may want to know that `Threads.@threads` can be used directly to parallelize the summation loop, which spares you the pain of computing the correct partition of the index range.

Something like this performs about as well as your threaded version, but is perhaps more readable:

```
function fnSum(x)
    # Array holding one partial sum per thread
    partialSum = zeros(eltype(x), Threads.nthreads())

    # Threads.@threads directly parallelizes the loop over elements in x
    Threads.@threads for e in x
        @inbounds partialSum[Threads.threadid()] += e
    end

    # Final reduction
    sum(partialSum)
end
```
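One caveat: as far as I know, recent Julia documentation cautions against indexing a shared buffer with `Threads.threadid()`, because tasks may migrate between threads while the loop runs. Below is a minimal sketch of a variant that avoids `threadid()` altogether by giving each task its own chunk and its own partial sum. The name `fnSumTasks` and the `nChunks` keyword are hypothetical (they are not part of the original code), and the sketch assumes Julia 1.3 or later for `Threads.@spawn`.

```julia
# Hypothetical helper, not part of the original answer: a chunked sum in
# which every task owns its partial result, so threadid() is never needed.
function fnSumTasks(x; nChunks = Threads.nthreads())
    isempty(x) && return zero(eltype(x))

    # Split the index range into at most nChunks contiguous pieces
    chunkLen = cld(length(x), nChunks)
    chunks = Iterators.partition(eachindex(x), chunkLen)

    # One task per chunk; each task computes and returns its own partial sum
    tasks = map(chunks) do idx
        Threads.@spawn sum(view(x, idx))
    end

    # Final reduction over the partial sums
    return sum(fetch, tasks)
end
```

It is called like the other versions, e.g. `fnSumTasks(rand(1_000_000))`. I have not benchmarked it against the functions in this post, so treat it as an illustration of the pattern rather than a drop-in replacement.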

Some benchmarking (on my machine with `JULIA_NUM_THREADS=4`, matching the number of cores, and with a large enough vector):

```
julia> using BenchmarkTools
julia> let
           x = rand(1_000_000);
           r1 = @btime fnTotalSum($x,Threads.nthreads()) # threaded
           r2 = @btime fnPartialSum($x,1,$(length(x)))   # not threaded
           r3 = @btime fnSum($x)                         # threaded
           println("$r1\n$r2\n$r3")
       end
# Benchmark
405.389 μs (31 allocations: 3.16 KiB) # fnTotalSum (threaded)
1.444 ms (0 allocations: 0 bytes) # fnPartialSum (not threaded)
409.454 μs (31 allocations: 3.13 KiB) # fnSum (threaded)
# Results
499823.13580023544 # fnTotalSum (threaded)
499823.13580021414 # fnPartialSum (not threaded)
499823.13580023544 # fnSum (threaded)
```
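Note that the threaded and non-threaded results above agree only up to the last few digits. That is expected: floating-point addition is not associative, and the threaded versions add the elements in a different grouping. A small sketch of the effect, using arbitrary illustrative values:

```julia
# Floating-point addition is not associative, so the grouping matters
a, b, c = 0.1, 0.2, 0.3
left  = (a + b) + c
right = a + (b + c)

println(left == right)  # exact comparison: the two groupings can differ
println(left ≈ right)   # isapprox: equal within a relative tolerance

# The two values reported in the benchmark above, compared the same way
println(499823.13580023544 ≈ 499823.13580021414)
```

When comparing a threaded sum against a serial one, `≈` (`isapprox`) is therefore a more appropriate check than `==`.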

PS: I sprinkled a few `@inbounds` here and there to try and benchmark comparable implementations. Here is the complete script I used:

## Script

```
using BenchmarkTools

# Serial sum of x[t1:t2]
function fnPartialSum(x,t1,t2)
    y = zero(eltype(x))
    @inbounds for i = t1:t2
        y = y + x[i]
    end
    return y
end

# Threaded sum: split x into nChunks pieces and sum each piece separately
function fnTotalSum(x,nChunks)
    T = length(x)
    m = cld(T,nChunks)                    # no. elements in each chunk
    xSum = zeros(nChunks)                 # pre-allocate space for partial sums
    Threads.@threads for i = 1:nChunks    # do nChunks partial sums
        @inbounds xSum[i] = fnPartialSum(x,1+(i-1)*m,min(i*m,T))
    end
    Sum = sum(xSum)
    return Sum
end

# Threaded sum: let Threads.@threads split the loop over the elements of x
function fnSum(x)
    partialSum = zeros(eltype(x), Threads.nthreads())
    Threads.@threads for e in x
        @inbounds partialSum[Threads.threadid()] += e
    end
    sum(partialSum)
end

let
    x = rand(1_000_000);
    r1 = @btime fnTotalSum($x,Threads.nthreads()) # one chunk per thread (threaded)
    r2 = @btime fnPartialSum($x,1,$(length(x)))   # compare with no-thread
    r3 = @btime fnSum($x)                         # threaded
    println("$r1\n$r2\n$r3")
end
```