```
using BenchmarkTools
using LoopVectorization

# Baseline: plain loop, no annotations
function a1(x)
    y = zero(eltype(x))
    for i in eachindex(x)
        y += x[i]
    end
    return y
end

# `@simd` only
function a2(x)
    y = zero(eltype(x))
    @simd for i in eachindex(x)
        y += x[i]
    end
    return y
end

# `@inbounds @fastmath`, no `@simd`
function a3(x)
    y = zero(eltype(x))
    @inbounds @fastmath for i in eachindex(x)
        y += x[i]
    end
    return y
end

# All three annotations combined
function a4(x)
    y = zero(eltype(x))
    @inbounds @fastmath @simd for i in eachindex(x)
        y += x[i]
    end
    return y
end

# LoopVectorization's `@turbo`
function a5(x)
    y = zero(eltype(x))
    @turbo for i in eachindex(x)
        y += x[i]
    end
    return y
end

x = rand(100_000_000);

@benchmark a1($x)
@benchmark a2($x)
@benchmark a3($x)
@benchmark a4($x)
@benchmark a5($x)
```

and I got the results below, which show that `@simd` and `@inbounds @fastmath` perform similarly and that stacking more annotations doesn't really add much. Perhaps the compiler was smart enough to compile a few things away, or the algorithm doesn't lend itself well to SIMD. So I'm not really sure what the best general approach is to speed up array-processing code. My real code is a lot more complicated and involves calling some functions in the reduce step.
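To make "calling some functions in the reduce step" a bit more concrete, here is a rough sketch of the shape I mean. It is only an illustration: `f` and `sum_f` are hypothetical placeholder names, and `f` is a made-up stand-in, not my real function.

```julia
# Hypothetical stand-in for the function called in the reduce step.
f(v) = v * v + 1

# Same reduction pattern as a2/a4 above, but each element goes through `f`
# before being accumulated.
function sum_f(x)
    y = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        y += f(x[i])
    end
    return y
end

x = rand(1_000)

# Sanity check against the built-in reduction (approximate, because `@simd`
# may reorder the floating-point additions).
@assert sum_f(x) ≈ sum(f, x)
```

Whether a loop like this vectorizes presumably depends on whether `f` inlines and is itself SIMD-friendly, which is part of what I am unsure about.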

## Summary

```
BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min … max):  98.881 ms … 108.322 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     100.404 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   101.257 ms ±   2.343 ms ┊ GC (mean ± σ):  0.00% ± 0.00%
 98.9 ms        Histogram: frequency by time        108 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

BenchmarkTools.Trial: 141 samples with 1 evaluation.
 Range (min … max):  34.740 ms … 40.694 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     35.185 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   35.555 ms ±  1.002 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
 34.7 ms        Histogram: frequency by time        40.5 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

BenchmarkTools.Trial: 138 samples with 1 evaluation.
 Range (min … max):  34.956 ms … 43.002 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     36.081 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   36.368 ms ±  1.128 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
 35 ms          Histogram: frequency by time        40.9 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

BenchmarkTools.Trial: 140 samples with 1 evaluation.
 Range (min … max):  34.972 ms … 45.533 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     35.587 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   35.909 ms ±  1.382 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
 35 ms          Histogram: frequency by time        44.3 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

BenchmarkTools.Trial: 140 samples with 1 evaluation.
 Range (min … max):  34.838 ms … 41.175 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     35.573 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   35.935 ms ±  1.116 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
 34.8 ms        Histogram: frequency by time        40.9 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.
```