Performance of map!()

Inspired by this topic: map vs loops vs broadcasts, I wanted to run my own tests on the speed of these approaches. As it turns out, the optimised for loop, map, and broadcasting are all roughly in the same ballpark.
Now I wanted to play the same game with a predefined result array and found significant differences:

using BenchmarkTools

function forloop(e, v)
    @simd for i in eachindex(v)
        @inbounds e[i] = 2*v[i]^2 + v[i] + 5
    end
end

fmap(e, v) = map!(x -> 2x^2 + x + 5, e, v)
fbcs(e, v) = @. e = 2*v^2 + v + 5

v = rand(10000)
e = similar(v)

@btime for i in 1:100
    forloop(e, v)
end

@btime for i in 1:100
    fmap(e, v)
end

@btime for i in 1:100
    fbcs(e, v)
end
  336.145 μs (0 allocations: 0 bytes)   # forloop
  944.702 μs (0 allocations: 0 bytes)   # fmap
  340.421 μs (0 allocations: 0 bytes)   # fbcs

Am I using map!() correctly or is there a reason why it should be so slow compared to the other implementations?


Just some general notes about benchmarking:

  1. It’s not necessary to put a loop around the code you want to benchmark (@btime already does that for you)
  2. It is necessary to interpolate (with $) any variables you use inside the code you are benchmarking. Otherwise you’re timing the lookup of a global variable at each call, which will affect your results (see https://github.com/JuliaCI/BenchmarkTools.jl#quick-start). I wouldn’t expect it to change the relative performance in this case, but it’s still worth getting right.

With that in mind:

julia> @btime forloop($e, $v)
  2.511 μs (0 allocations: 0 bytes)

julia> @btime fmap($e, $v);
  9.699 μs (0 allocations: 0 bytes)

julia> @btime fbcs($e, $v);
  2.611 μs (0 allocations: 0 bytes)

I’m surprised to see that map! is indeed slower as of Julia 1.1.0.


The code for map! on a 1D array ends up in the function map_n!:

function map_n!(f::F, dest::AbstractArray, As) where F
    for i = LinearIndices(As[1])
        dest[i] = f(ith_all(i, As)...)
    end
    return dest
end

where ith_all is defined as:

@inline ith_all(i, as) = (as[1][i], ith_all(i, tail(as))...)

Looks like an @inbounds in that function could help.
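For reference, here is a sketch of what that change might look like. This is a hypothetical modification, not the actual Base source; the name map_n_inbounds! and the Tuple{} base case for ith_all are added here so the snippet is self-contained:

```julia
using Base: tail

# Local copy of Base's ith_all; the empty-tuple method terminates the
# recursion (in Base this method lives elsewhere).
@inline ith_all(i, as) = (as[1][i], ith_all(i, tail(as))...)
@inline ith_all(i, as::Tuple{}) = ()

# map_n! with an @inbounds added on the indexing (hypothetical change).
# @inbounds should also elide the bounds checks inside the inlined ith_all.
function map_n_inbounds!(f::F, dest::AbstractArray, As) where F
    for i in LinearIndices(As[1])
        @inbounds dest[i] = f(ith_all(i, As)...)
    end
    return dest
end

v = rand(100)
e = similar(v)
map_n_inbounds!(x -> 2x^2 + x + 5, e, (v,))
```

Whether the compiler can hoist all the checks this way would still need to be confirmed by benchmarking against the stock map!.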

Is the ith_all function even needed? I’m guessing it was written before dot-broadcast notation was introduced.

getindex.(As, i) does the same thing as ith_all(i, As) for a tuple of vectors As and scalar index i.

Thanks for the advice regarding the interpolation and the loop; I will certainly use it in my next benchmark run. As you have already shown, it does not affect the relative performance, so I will keep the implementation in my post as is.
If the cause of the performance bottleneck is figured out, should one open a GitHub issue?

Maybe open an issue linking to this post?

With the --check-bounds=no flag, all versions perform quite similarly (Julia 1.1.0, Windows 10):

julia> @btime forloop($e, $v);
  2.509 μs (0 allocations: 0 bytes)

julia> @btime fmap($e, $v);
  2.623 μs (0 allocations: 0 bytes)

julia> @btime fbcs($e, $v);
  2.537 μs (0 allocations: 0 bytes)

While wondering whether I should use map or a for loop in a program, my elementary test below showed that map was much slower than the for loop, even when I used the no-check-bounds kernel option in IJulia by doing:

julia> using IJulia
julia> installkernel("Julia no-check-bounds", "--check-bounds=no")

Did I do it wrong somewhere below?

using BenchmarkTools
@benchmark map((x)->x^2,1:5)
BenchmarkTools.Trial: 
  memory estimate:  128 bytes
  allocs estimate:  1
  --------------
  minimum time:     37.563 ns (0.00% GC)
  median time:      41.289 ns (0.00% GC)
  mean time:        52.538 ns (16.17% GC)
  maximum time:     46.484 μs (99.75% GC)
  --------------
  samples:          10000
  evals/sample:     993
@benchmark for x in 1:5
    (x)->x^2
end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.599 ns (0.00% GC)
  median time:      1.700 ns (0.00% GC)
  mean time:        1.740 ns (0.00% GC)
  maximum time:     14.201 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
versioninfo()
Julia Version 1.0.4
Commit 38e9fb7f80 (2019-05-16 03:38 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-3380M CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)
Environment:
  JULIA_NUM_THREADS = 4

Those don’t do the same thing. In the second benchmark, you’re repeatedly creating a function x -> x^2, not evaluating it.

Right. Thank you. I created a function temp() to evaluate it, as below. The results show that map! is still slower, and

y .= y.^2

is the slowest, although it may not be an example to be compared directly with the others.

using BenchmarkTools
y = zeros(5);
@benchmark map!(x->x^2, y, 1:5)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     30.986 ns (0.00% GC)
  median time:      31.591 ns (0.00% GC)
  mean time:        44.111 ns (17.76% GC)
  maximum time:     48.684 μs (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     994
function temp(y)
    for x in 1:5
        y[x] = x^2
    end
end
temp (generic function with 1 method)
y = zeros(5);
@benchmark temp(y)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     16.432 ns (0.00% GC)
  median time:      16.533 ns (0.00% GC)
  mean time:        19.683 ns (0.00% GC)
  maximum time:     64.629 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998
y = collect(1:5);
@benchmark y .= y.^2
BenchmarkTools.Trial: 
  memory estimate:  64 bytes
  allocs estimate:  4
  --------------
  minimum time:     883.784 ns (0.00% GC)
  median time:      900.027 ns (0.00% GC)
  mean time:        975.177 ns (3.75% GC)
  maximum time:     367.854 μs (99.37% GC)
  --------------
  samples:          10000
  evals/sample:     37
julia> @btime $y .= $y.^2
  6.818 ns (0 allocations: 0 bytes)

See e.g. Function calls in global scope, benchmarking, etc.


Thank you, again. Learning a lot here. 🙂

It should’ve been:

using BenchmarkTools
y = collect(1:5);
@benchmark $y .= $y.^2
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.509 ns (0.00% GC)
  median time:      9.511 ns (0.00% GC)
  mean time:        9.927 ns (0.00% GC)
  maximum time:     86.085 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999