Inspired by this topic: map vs loops vs broadcasts, I wanted to run my own speed tests on these functions. As it turns out, the optimised for loop, map, and vectorisation are all roughly in the same ballpark.

Next, I ran the same comparison with a preallocated result array and found significant differences:

```
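using BenchmarkTools

# Explicit loop writing into the preallocated array e
function forloop(e, v)
    @simd for i in eachindex(v)
        @inbounds e[i] = 2*v[i]^2 + v[i] + 5
    end
end

# In-place map with an anonymous function
fmap(e, v) = map!(x -> 2x^2 + x + 5, e, v)

# In-place fused broadcast
fbcs(e, v) = @. e = 2*v^2 + v + 5

v = rand(10000)
e = similar(v)

# Each benchmark times 100 calls on the same arrays
@btime for i in 1:100
    forloop(e, v)
end
@btime for i in 1:100
    fmap(e, v)
end
@btime for i in 1:100
    fbcs(e, v)
end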
```

Results, in the order forloop, fmap, fbcs:

```
336.145 μs (0 allocations: 0 bytes)
944.702 μs (0 allocations: 0 bytes)
340.421 μs (0 allocations: 0 bytes)
```

Am I using map!() correctly, or is there a reason why it should be nearly three times slower than the other implementations?
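
To rule out the possibility that the three versions simply do different work, here is a minimal sketch that checks they all write the same values into the result array. The names `e_loop`, `e_map`, `e_bcs` and `expected` are introduced only for this check, and the snippet uses Base Julia only, so it does not time anything.

```julia
# The three implementations from the benchmark, repeated so this runs on its own.
function forloop(e, v)
    @simd for i in eachindex(v)
        @inbounds e[i] = 2*v[i]^2 + v[i] + 5
    end
end
fmap(e, v) = map!(x -> 2x^2 + x + 5, e, v)
fbcs(e, v) = @. e = 2*v^2 + v + 5

v = rand(10000)

# One separate output array per implementation (names are illustrative only).
e_loop = similar(v)
e_map  = similar(v)
e_bcs  = similar(v)

forloop(e_loop, v)
fmap(e_map, v)
fbcs(e_bcs, v)

# Reference result computed with an allocating broadcast.
expected = 2 .* v .^ 2 .+ v .+ 5

# ≈ tolerates tiny floating-point differences (e.g. from SIMD reordering).
same = e_loop ≈ expected && e_map ≈ expected && e_bcs ≈ expected
println("all three agree: ", same)
```

If the outputs agree, the gap is purely a performance difference. For the timing itself, BenchmarkTools recommends interpolating global variables into the benchmarked expression, e.g. `@btime fmap($e, $v)`, so that access to the non-constant globals is not part of the measurement; I have not verified whether that changes the numbers above.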