Why is a multi-argument inplace map much faster in this case than a broadcast?

jishnub · December 12, 2022, 5:12am

I can confirm that this is system dependent.

julia> A = rand(10, 1000); B = copy(A); C = zero(A); D = zero(A);

julia> @btime map!(+, $C, $A, $B);
  6.505 μs (0 allocations: 0 bytes)

julia> @btime $D .= $A .+ $B;
  6.724 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_EDITOR = subl

On another system:

julia> A = rand(10, 1000); B = copy(A); C = zero(A); D = zero(A);

julia> @btime map!(+, $C, $A, $B);
  8.947 μs (0 allocations: 0 bytes)

julia> @btime $D .= $A .+ $B;
  12.092 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7742 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 64 virtual cores
Environment:
  JULIA_EDITOR = vi

On yet another system:

julia> A = rand(10, 1000); B = copy(A); C = zero(A); D = zero(A);

julia> @btime map!(+, $C, $A, $B);
  10.730 μs (0 allocations: 0 bytes)

julia> @btime $D .= $A .+ $B;
  12.526 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 28 × Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  Threads: 1 on 28 virtual cores

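I wish this wasn’t the case, as it makes it difficult to write performant code. In general, though, map! does appear to be faster for wide matrices: in the 10×1000 case above, the broadcast took between roughly 3% and 35% longer across the three systems.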
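For anyone who wants to check this on their own machine, here is a rough sketch that runs the same comparison for a few matrix shapes. The helper names `mintime` and `compare` are made up for this example, and it uses Base's `@elapsed` instead of BenchmarkTools so that it has no dependencies. Expect noisier numbers than the `@btime` results above.

```julia
# Hypothetical helper: minimum wall-clock time of f() over `reps` runs, in seconds.
function mintime(f; reps = 1000)
    f()  # warm-up call so that compilation is not included in the timing
    best = Inf
    for _ in 1:reps
        t = @elapsed f()
        best = min(best, t)
    end
    return best
end

# Hypothetical helper: time map! against broadcast for an m×n matrix.
function compare(m, n; reps = 1000)
    A = rand(m, n); B = copy(A); C = zero(A); D = zero(A)
    t_map = mintime(() -> map!(+, C, A, B); reps = reps)
    t_bc  = mintime(() -> (D .= A .+ B); reps = reps)
    @assert C == D  # both approaches must give the same result
    return (map = t_map, broadcast = t_bc)
end

# Wide, square, and tall matrices with the same number of elements.
for (m, n) in ((10, 1000), (100, 100), (1000, 10))
    t = compare(m, n)
    println("$(m)×$(n): map! = $(round(t.map * 1e6, digits = 2)) μs, ",
            "broadcast = $(round(t.broadcast * 1e6, digits = 2)) μs")
end
```

Taking the minimum over many runs is a crude stand-in for what `@btime` reports, so for anything beyond a quick sanity check it is probably better to stick with BenchmarkTools.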
