# Performance of map!()

Inspired by this topic: map vs loops vs broadcasts, I wanted to run my own tests on the speed of these functions. As it turns out, the optimised for loop, map, and vectorisation are all roughly in the same ballpark.
Now I wanted to play the same game with a preallocated result array, and found significant differences:

```julia
using BenchmarkTools

function forloop(e, v)
    @simd for i in eachindex(v)
        @inbounds e[i] = 2*v[i]^2 + v[i] + 5
    end
end

fmap(e, v) = map!(x -> 2x^2 + x + 5, e, v)
fbcs(e, v) = @. e = 2*v^2 + v + 5

v = rand(10000)
e = similar(v)

@btime for i in 1:100
    forloop(e, v)
end

@btime for i in 1:100
    fmap(e, v)
end

@btime for i in 1:100
    fbcs(e, v)
end
```
```
336.145 μs (0 allocations: 0 bytes)
944.702 μs (0 allocations: 0 bytes)
340.421 μs (0 allocations: 0 bytes)
```

Am I using `map!()` correctly, or is there a reason why it is so slow compared to the other implementations?


Just some general notes about benchmarking:

1. It’s not necessary to put a loop around the code you want to benchmark (`@btime` already does that for you)
2. It is necessary to interpolate (with `$`) any variables you use inside the code you are benchmarking. Otherwise you're timing the lookup of a global variable at each function call, which will affect your results (see the BenchmarkTools.jl manual on GitHub, JuliaCI/BenchmarkTools.jl). I wouldn't expect it to affect the relative performance in this case, but it's still worth getting right.

With that in mind:

```julia
julia> @btime forloop($e, $v);
  2.511 μs (0 allocations: 0 bytes)

julia> @btime fmap($e, $v);
  9.699 μs (0 allocations: 0 bytes)

julia> @btime fbcs($e, $v);
  2.611 μs (0 allocations: 0 bytes)
```

I’m surprised to see that `map!` is indeed slower as of Julia 1.1.0.


The code for `map!` on a 1-D array goes through the function `map_n!` in Base:

```julia
function map_n!(f::F, dest::AbstractArray, As) where F
    for i = LinearIndices(As[1])
        dest[i] = f(ith_all(i, As)...)
    end
    return dest
end
```

where `ith_all`:

```julia
@inline ith_all(i, as) = (as[1][i], ith_all(i, tail(as))...)
```

Looks like an `@inbounds` in that function could help.
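A sketch of what such a patch might look like. This is illustrative only, not the actual Base implementation: `ith_all` is a Base internal, so it is reconstructed here (as `my_ith_all`) to keep the example self-contained, and a bounds precondition is asserted up front since `@inbounds` makes the element accesses unchecked.

```julia
using Base: tail

# Standalone reconstruction of the Base internal `ith_all`.
my_ith_all(i, ::Tuple{}) = ()
@inline my_ith_all(i, as) = (as[1][i], my_ith_all(i, tail(as))...)

# Hypothetical patched map_n! with @inbounds on the element accesses.
# Skipping bounds checks is only safe if dest and all of As share the
# same indices, so we verify that once before the loop.
function map_n_inbounds!(f::F, dest::AbstractArray, As) where F
    idxs = LinearIndices(As[1])
    @assert LinearIndices(dest) == idxs && all(A -> LinearIndices(A) == idxs, As)
    for i = idxs
        @inbounds dest[i] = f(my_ith_all(i, As)...)
    end
    return dest
end
```

With this, `map_n_inbounds!(x -> 2x^2 + x + 5, e, (v,))` computes the same result as the `map!` call above, but without per-element bounds checks.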

Is the `ith_all` function even needed? I’m guessing it was written before dot-broadcast notation was introduced.

`getindex.(As, i)` does the same thing as `ith_all(i, As)` for a tuple of vectors `As` and scalar index `i`.
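A quick check of that equivalence (`ith_all` is a Base internal, so it is reconstructed here for comparison):

```julia
using Base: tail

# Standalone reconstruction of Base's `ith_all`.
ith_all(i, ::Tuple{}) = ()
@inline ith_all(i, as) = (as[1][i], ith_all(i, tail(as))...)

As = (collect(1:5), collect(10:10:50))

ith_all(3, As)    # (3, 30)
getindex.(As, 3)  # (3, 30) — same result via dot-broadcast over the tuple
```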

Thanks for the advice regarding interpolation and the loop; I will certainly use it in my next benchmark run. Since you have already shown that it does not affect the relative performance here, I will keep the implementation in my post as is.
If the performance bottleneck is figured out, should one open a GitHub issue?

Maybe open an issue linking to this post?

With the `--check-bounds=no` flag, all versions perform quite similarly (Julia 1.1.0, Windows 10):

```julia
julia> @btime forloop($e, $v);
  2.509 μs (0 allocations: 0 bytes)

julia> @btime fmap($e, $v);
  2.623 μs (0 allocations: 0 bytes)

julia> @btime fbcs($e, $v);
  2.537 μs (0 allocations: 0 bytes)
```

While wondering whether I should use `map` or a for loop in a program, my elementary test below showed that `map` was much slower than the for loop, even when I used a no-check-bounds kernel in IJulia, installed via

```julia
julia> using IJulia
julia> installkernel("Julia no-check-bounds", "--check-bounds=no")
```

Did I do it wrong somewhere below?

```julia
julia> using BenchmarkTools

julia> @benchmark map(x -> x^2, 1:5)
BenchmarkTools.Trial:
  memory estimate:  128 bytes
  allocs estimate:  1
  --------------
  minimum time:     37.563 ns (0.00% GC)
  median time:      41.289 ns (0.00% GC)
  mean time:        52.538 ns (16.17% GC)
  maximum time:     46.484 μs (99.75% GC)
  --------------
  samples:          10000
  evals/sample:     993
```
```julia
julia> @benchmark for x in 1:5
           (x) -> x^2
       end
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.599 ns (0.00% GC)
  median time:      1.700 ns (0.00% GC)
  mean time:        1.740 ns (0.00% GC)
  maximum time:     14.201 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```
```julia
julia> versioninfo()
Julia Version 1.0.4
Commit 38e9fb7f80 (2019-05-16 03:38 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-3380M CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)
Environment:
```

Those don’t do the same thing. In the second benchmark, you’re repeatedly creating a function `x -> x^2`, not evaluating it.

Right, thank you. I created a function `temp()` to evaluate it, as below. The results show that `map!` is still slower.

`y .= y.^2` is the fastest, although it may not be a fair example to compare with the others.

```julia
julia> using BenchmarkTools

julia> y = zeros(5);

julia> @benchmark map!(x -> x^2, y, 1:5)
BenchmarkTools.Trial:
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     30.986 ns (0.00% GC)
  median time:      31.591 ns (0.00% GC)
  mean time:        44.111 ns (17.76% GC)
  maximum time:     48.684 μs (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     994
```
```julia
julia> function temp(y)
           for x in 1:5
               y[x] = x^2
           end
       end
temp (generic function with 1 method)

julia> y = zeros(5);

julia> @benchmark temp(y)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     16.432 ns (0.00% GC)
  median time:      16.533 ns (0.00% GC)
  mean time:        19.683 ns (0.00% GC)
  maximum time:     64.629 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998
```
```julia
julia> y = collect(1:5);

julia> @benchmark y .= y.^2
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  4
  --------------
  minimum time:     883.784 ns (0.00% GC)
  median time:      900.027 ns (0.00% GC)
  mean time:        975.177 ns (3.75% GC)
  maximum time:     367.854 μs (99.37% GC)
  --------------
  samples:          10000
  evals/sample:     37
```
```julia
julia> @btime $y .= $y.^2
  6.818 ns (0 allocations: 0 bytes)
```

Thank you again. I'm learning a lot here.

It should’ve been:

```julia
julia> using BenchmarkTools

julia> y = collect(1:5);

julia> @benchmark $y .= $y.^2
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.509 ns (0.00% GC)
  median time:      9.511 ns (0.00% GC)
  mean time:        9.927 ns (0.00% GC)
  maximum time:     86.085 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999
```