That’s a good point… then the speed difference vanishes. I also included an `@inbounds @simd` version of the loop. The final benchmark is

```
using BenchmarkTools
const N = 10000000
A = randn(N)
sum1(A) = sum(A.*A.+A)
sum2(A) = sum(A.*A+A)
sum3(A) = sum(A[i]*A[i]+A[i] for i in 1:N)
sum4(A) = sum(a*a+a for a in A)
function sum5(A)
    res = 0.0
    for i = 1:N
        res += A[i]*A[i]+A[i]
    end
    res
end
function sum6(A)
    res = 0.0
    @inbounds @simd for i = 1:N
        res += A[i]*A[i]+A[i]
    end
    res
end
sum7(A) = sum(x -> x * x + x, A)
BenchmarkTools.DEFAULT_PARAMETERS.samples = 10
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2
@btime sum1($A)
@btime sum2($A)
@btime sum3($A)
@btime sum4($A)
@btime sum5($A)
@btime sum6($A)
@btime sum7($A)
```

with these results (Julia 0.6 RC):

```
38.080 ms (2 allocations: 76.29 MiB)
84.583 ms (4 allocations: 152.59 MiB)
12.619 ms (4 allocations: 80 bytes)
12.606 ms (3 allocations: 48 bytes)
12.629 ms (0 allocations: 0 bytes)
7.230 ms (0 allocations: 0 bytes)
7.297 ms (0 allocations: 0 bytes)
```

(neither `@inbounds` nor `@simd` by itself is able to get the 7 ms; both are needed)
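For anyone who wants to reproduce that comparison, here is a rough sketch of the three variants side by side. The names `sum_inbounds`, `sum_simd` and `sum_both` are made up for this example and are not part of the benchmark above, and the loops use `length(A)` rather than the global `N`. The usual explanation is that `@simd` allows the floating-point additions to be reordered while `@inbounds` removes the bounds-check branch, but how much each one matters may differ between Julia versions and machines, so treat the timings as something to measure rather than assume.

```julia
using BenchmarkTools

# Only the bounds checks are removed; the additions still happen in order.
function sum_inbounds(A)
    res = 0.0
    @inbounds for i = 1:length(A)
        res += A[i] * A[i] + A[i]
    end
    res
end

# Only reordering of the reduction is allowed; bounds checks stay in place.
function sum_simd(A)
    res = 0.0
    @simd for i = 1:length(A)
        res += A[i] * A[i] + A[i]
    end
    res
end

# Both macros together, the same loop as sum6 above.
function sum_both(A)
    res = 0.0
    @inbounds @simd for i = 1:length(A)
        res += A[i] * A[i] + A[i]
    end
    res
end

A = randn(10_000_000)
@btime sum_inbounds($A)
@btime sum_simd($A)
@btime sum_both($A)
```

Because `@simd` may change the order of the additions, the three results can differ in the last few digits; compare them with `≈` rather than `==`.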

Amazingly, `sum(x -> x * x + x, A)` is able to SIMD automatically! I don’t understand how that can happen: looking at the implementation, it falls back to `mapfoldl_impl`, which is a simple while loop…
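One way to check which code path is actually taken is to ask Julia which method the call dispatches to and read the source from there. A minimal sketch, assuming a Julia version where `@which` lives in the `InteractiveUtils` standard library (0.7 and later; on 0.6 it should be available without the `using` line). The function `f` and the small array are just placeholders.

```julia
using InteractiveUtils  # provides @which on Julia 0.7 and later

f = x -> x * x + x
A = randn(1000)

# Method object for the method that the call sum(f, A) dispatches to.
m = @which sum(f, A)
println(m)

# Source location of that method, as a starting point for reading the code.
println(m.file, ":", m.line)
```

This only identifies the entry point. The loop that does the actual reduction may be several calls further down, so it can take a few more rounds of `@which` (or `@code_llvm` on the inner call) to find the code that gets vectorized.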