Hello!

I am using `TimerOutputs.jl` to time the performance of specific components, which, to be clear, is really awesome. It has made me aware of how much time I actually spend setting values back to zero:

```
                                                 Time                 Allocations
                                         ─────────────────────  ───────────────────────
Tot / % measured:                            745s / 98.4%           2.52GiB / 81.4%
Section                          ncalls    time   %tot     avg    alloc   %tot      avg
───────────────────────────────────────────────────────────────────────────────────────
08 Second NeighborLoop            12.4k    252s  34.4%  20.3ms   120MiB   5.7%  9.91KiB
04 First NeighborLoop             12.4k    246s  33.5%  19.8ms   118MiB   5.6%  9.75KiB
07B ResetArrays                   24.8k   39.4s   5.4%  1.59ms  1.15MiB   0.1%    48.4B
03B ResetArrays                   24.8k   38.2s   5.2%  1.54ms  1.14MiB   0.1%    48.0B
04 Reduction                      49.6k   34.2s   4.7%   690μs   248MiB  11.8%  5.12KiB
08 Reduction                      49.6k   33.9s   4.6%   683μs   248MiB  11.8%  5.12KiB
05 Update To Half TimeStep        12.4k   12.7s   1.7%  1.03ms  1.97KiB   0.0%    0.16B
02 Calculate IndexCounter           346   12.5s   1.7%  36.0ms  4.09MiB   0.2%  12.1KiB
11 Update To Final TimeStep       12.4k   12.0s   1.6%   971μs  2.39KiB   0.0%    0.20B
01 Update TimeStep                12.4k   11.5s   1.6%   924μs  2.95KiB   0.0%    0.24B
13 Next TimeStep                  12.4k   9.24s   1.3%   745μs  69.3MiB   3.3%  5.72KiB
07A ResetArrays                   24.8k   8.73s   1.2%   352μs  1.16KiB   0.0%    0.05B
03A ResetArrays                   24.8k   8.69s   1.2%   350μs  1.14KiB   0.0%    0.05B
12B Close hdfvtk output files         1   4.07s   0.6%   4.07s  4.86KiB   0.0%  4.86KiB
10 Final Density                  12.4k   2.83s   0.4%   228μs     640B   0.0%    0.05B
12A Output Data                     250   2.45s   0.3%  9.81ms  1.24GiB  60.4%  5.06MiB
09 Final LimitDensityAtBoundary   12.4k   1.95s   0.3%   157μs     512B   0.0%    0.04B
06 Half LimitDensityAtBoundary    12.4k   1.67s   0.2%   134μs    32.0B   0.0%    0.00B
XX Move                           24.8k   1.48s   0.2%  59.5μs     288B   0.0%    0.01B
XX Calculate Force                  250  43.2ms   0.0%   173μs  21.9MiB   1.0%  89.5KiB
───────────────────────────────────────────────────────────────────────────────────────
```

Looking at `03B ResetArrays` and `07B ResetArrays`, it is clear that I am spending over 1 minute of a 12 minute simulation setting values back to zero. The reductions, `04 Reduction` and `08 Reduction`, also add up to about a minute. So in total roughly 1/6th of my simulation time is spent resetting arrays to zero and reducing arrays to get the final answer.

The reason I have to reset is that my loops accumulate simulation values with `+=` each time step, and the reason I have to perform the reduction is the multi-threaded approach using `ChunkSplitters.jl`.
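To make the pattern concrete, a minimal sketch of the structure described above might look like the following. It is not the actual package code: the names `contribution`, `accumulate_step!`, `buffers` and `pairs` are hypothetical, the interaction term is a placeholder, and `Iterators.partition` from Base stands in for the chunking that `ChunkSplitters.jl` would provide.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical placeholder for the real pairwise interaction term.
contribution(i, j) = 1.0 / (1 + abs(i - j))

# One "time step": reset the per-chunk buffers, accumulate with `+=`
# in parallel, then reduce the buffers into `result`.
function accumulate_step!(result, buffers, pairs)
    nchunks = length(buffers)
    chunk_len = max(1, cld(length(pairs), nchunks))

    # "ResetArrays": every buffer must be zeroed because the loop below uses `+=`.
    for buf in buffers
        fill!(buf, 0.0)
    end

    # "NeighborLoop": each task writes only to its own buffer, so no locking is needed.
    @sync for (c, chunk) in enumerate(Iterators.partition(pairs, chunk_len))
        @spawn begin
            local_buf = buffers[c]
            for (i, j) in chunk
                v = contribution(i, j)
                local_buf[i] += v
                local_buf[j] -= v
            end
        end
    end

    # "Reduction": sum the per-chunk buffers into the final array.
    fill!(result, 0.0)
    for buf in buffers
        result .+= buf
    end
    return result
end

# Small usage example with made-up sizes.
n = 5
pairs = [(i, j) for i in 1:n for j in i+1:n]
buffers = [zeros(n) for _ in 1:nthreads()]
result = zeros(n)
accumulate_step!(result, buffers, pairs)
```

In this sketch the reset and reduction cost scales with the number of buffers times the array length, which is why both show up as separate sections in the timings even though they do no physics.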

It is hard to provide an MWE, since this is deeply nested code in a package I am cleaning up, but I am posting here to hear about the experiences of others who have also written simulations in Julia.

Kind regards