As an example, here is the difference between the threaded and non-threaded versions:

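```
tmin = 1
tmax = 10
D1 = [1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D2 = [10.0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
X = zeros(10)  # X[tmin] and Y[tmin] hold the initial values (zero here)
Y = zeros(10)

# Serial version: step n only needs the values from step n-1.
function calculate!(X, Y, D1, D2, tmin, tmax)
    for n in tmin+1:tmax
        X[n] = D1[n]*D2[n] + Y[n-1]
        Y[n] = D1[n-1]*D2[n-1] + X[n-1]
    end
    nothing
end

# Threaded version: X[n] and Y[n] do not depend on each other, so each
# iteration spawns one task per value and waits for both before moving on.
function calculate_threaded!(X, Y, D1, D2, tmin, tmax)
    for n in tmin+1:tmax
        Xn = Threads.@spawn D1[n]*D2[n] + Y[n-1]
        Yn = Threads.@spawn D1[n-1]*D2[n-1] + X[n-1]
        X[n] = fetch(Xn)
        Y[n] = fetch(Yn)
    end
    nothing
end
```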

Benchmarking both versions:

```
using BenchmarkTools
@btime calculate!(X, Y, D1, D2, tmin, tmax)
# 31.539 ns (0 allocations: 0 bytes)
@btime calculate_threaded!(X, Y, D1, D2, tmin, tmax)
# 41.249 μs (125 allocations: 9.53 KiB)
```
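Almost all of that time is task overhead: the threaded loop spawns two tasks per iteration (18 tasks for `tmin=1`, `tmax=10`), which works out to roughly 2 μs per task, while the serial loop spends a few nanoseconds per iteration. If you still want to parallelize this particular recurrence, note that `X[n]` depends only on `Y[n-1]` and `Y[n]` only on `X[n-1]`, so the values split into two independent chains (X at even offsets from `tmin` together with Y at odd offsets, and the reverse). Below is a minimal sketch of that idea; `calculate_two_chains!` is a hypothetical name, and the sketch spawns a single task for one chain while the current thread runs the other. With only 10 elements it would still very likely lose to the serial loop, because even one spawn costs far more than the whole serial computation.

```julia
# Hypothetical helper: one spawned task per chain instead of two per iteration.
function calculate_two_chains!(X, Y, D1, D2, tmin, tmax)
    # chain(p) computes X[n] where (n - tmin) has parity p, and Y[n] where it
    # has the other parity; each chain only reads values it wrote itself
    # (or the initial X[tmin], Y[tmin]), so the two chains never race.
    function chain(p)
        for n in tmin+1:tmax
            if (n - tmin) % 2 == p
                X[n] = D1[n]*D2[n] + Y[n-1]
            else
                Y[n] = D1[n-1]*D2[n-1] + X[n-1]
            end
        end
    end
    t = Threads.@spawn chain(0)  # one chain on another task
    chain(1)                     # the other chain on the current task
    wait(t)
    nothing
end

D1 = collect(1.0:10.0)
D2 = collect(10.0:-1.0:1.0)
X = zeros(10)
Y = zeros(10)
calculate_two_chains!(X, Y, D1, D2, 1, 10)
```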

Using locks will likely only make this worse. You will only see a performance benefit if the work inside each `Threads.@spawn` is significant (like a big matrix calculation).
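To illustrate the other side, here is a sketch where each spawned task does enough work to outweigh its overhead. The workload `heavy` (a long sum of `sin` terms) and the functions `serial_pair`/`threaded_pair` are hypothetical stand-ins for "a big calculation"; any speedup assumes Julia was started with more than one thread (for example `julia -t 2`), which you can check with `Threads.nthreads()`.

```julia
# Hypothetical CPU-bound workload: millions of operations per call, so the
# microsecond-scale cost of spawning a task becomes negligible.
heavy(x, n) = sum(sin(x * i) for i in 1:n)

# Two independent heavy computations, one after the other.
serial_pair(a, b, n) = (heavy(a, n), heavy(b, n))

# The same two computations, each in its own task.
function threaded_pair(a, b, n)
    ta = Threads.@spawn heavy(a, n)
    tb = Threads.@spawn heavy(b, n)
    return fetch(ta), fetch(tb)
end

# Warm up so compilation is not included in the timings.
serial_pair(0.1, 0.2, 10)
threaded_pair(0.1, 0.2, 10)

n = 10^7
t_serial = @elapsed serial_pair(0.1, 0.2, n)
t_threaded = @elapsed threaded_pair(0.1, 0.2, n)
println("threads: ", Threads.nthreads())
println("serial:   ", t_serial, " s")
println("threaded: ", t_threaded, " s")
```

Both functions compute exactly the same values; only the scheduling differs. With two or more threads the threaded version should approach half the serial time, whereas with a single thread the two tasks simply take turns and there is no gain.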