IMO this is the wrong end to approach multithreading. For several reasons, I’d rather recommend multithreading on as coarse of a level as possible, not in your inner loops.

Nonetheless, let’s look into the thread safety of your code. This operation:

```
S += sum(se .* se)
```

is not thread safe, since the way this operation works internally is by doing:

```
1. read current value of S
2. add to this value
3. write new value back to S
```

Now, if you have two threads doing this in parallel, these operations might interleave to produce the wrong result:

```
1. thread 1 reads current value of S
2. thread 2 reads current value of S
3. thread 1 adds to its version of S
4. thread 2 adds to its version of S
5. thread 1 writes new value to S
6. thread 2 writes new value to S, overwriting what thread 1 just wrote
```

This is commonly referred to as a *race condition*. You can see it in action in the example that @evanfields posted above:

```
function f_thread(n)
s = 0
Threads.@threads for i in 1:n
s += sum(j*j for j in 1:i)
end
return s
end
julia> f_thread(1000)
43483971876
julia> f_thread(1000)
80136425181
julia> f_thread(1000)
57182835925
```

One solution to this problem is to use *atomic* operations:

```
function f_thread_atomic(n)
s = Threads.Atomic{Int}(0)
Threads.@threads for i in 1:n
Threads.atomic_add!(s, sum(j*j for j in 1:i))
end
return s[]
end
julia> f_thread_atomic(1000)
83667083500
julia> f_thread_atomic(1000)
83667083500
```

Comparing to the analytical solution we can see that this yields the correct result:

```
f_analytical(n) = (n+1)^2*(n^2+2n)÷12
julia> f_analytical(1000)
83667083500
```