Hello,

I am trying to benchmark a function parallelized with threads and compare the result with the non-parallelized version of the same function.

```
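using Base.Threads
using BenchmarkTools  # provides @btime

nx = 10000
a = zeros(nx)
b = rand(nx)
dindicies = 2:(nx-1)  # interior indices, defined at global scope

# Serial backward difference over the interior indices
function update!(A, B)
    for i = dindicies
        @inbounds A[i] = B[i] - B[i-1]
    end
end

# Same loop, split across threads with @threads
function update_thread!(A, B)
    @threads for i = dindicies
        @inbounds A[i] = B[i] - B[i-1]
    end
end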
```

These are the results I get with `@btime`:

```
julia> @btime update!($a,$b)
1.329 ms (68455 allocations: 1.20 MiB)
```

```
julia> @btime update_thread!($a,$b)
5.796 μs (21 allocations: 2.81 KiB)
```

Why does the non-parallelized version allocate far more memory than the parallelized version?
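
One thing I have not ruled out: `dindicies` is a non-constant global that both functions read directly, and the Julia performance tips warn that untyped globals can lead to extra allocations. As a rough sketch for comparison, the variant below takes the range as an argument instead. The names `update_arg!`, `measure`, and `inds` are my own, made up for illustration, and I am only assuming this is related to the difference I see.

```julia
# Hypothetical variant: the index range is passed in as an argument
# instead of being read from a global variable.
function update_arg!(A, B, inds)
    for i in inds
        @inbounds A[i] = B[i] - B[i-1]
    end
    return A
end

# Hypothetical helper: wraps the call in a function so that @allocated
# measures only update_arg! and not work done at global scope.
measure(A, B, inds) = @allocated update_arg!(A, B, inds)

nx = 10000
a = zeros(nx)
b = rand(nx)
inds = 2:(nx-1)

measure(a, b, inds)            # first call triggers compilation
bytes = measure(a, b, inds)    # second call measures the loop itself
println("bytes allocated: ", bytes)
```

With BenchmarkTools loaded, the same variant could be timed with `@btime update_arg!($a, $b, $inds)` and compared against the two numbers above.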