Adding large matrices: why is .+ not the default?

raphaelchinchilla · August 10, 2020, 8:40pm

Take three large matrices A, B, C, each with 1000x1000 elements. Here are the results of two benchmarks:

using broadcasted operator “.+”

@benchmark A.=A.+B.+C
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  4
  --------------
  minimum time:     1.866 ms (0.00% GC)
  median time:      1.978 ms (0.00% GC)
  mean time:        2.004 ms (0.00% GC)
  maximum time:     2.771 ms (0.00% GC)
  --------------
  samples:          2474
  evals/sample:     1

using the regular operator “+”

@benchmark A.=A+B+C
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  4
  --------------
  minimum time:     5.665 ms (0.00% GC)
  median time:      5.981 ms (0.00% GC)
  mean time:        6.917 ms (12.94% GC)
  maximum time:     35.410 ms (39.02% GC)
  --------------
  samples:          722
  evals/sample:     1

Considering that the broadcasted version is roughly 3 times faster by minimum and median time (and more than 10 times faster in the worst case), why is it not the default? Why do we even need a “+” operator for matrices if “.+” is faster? Is there any situation where I would get faster or more stable code by writing “+” instead of “.+”? Otherwise, it just seems silly that I need to write “.+” (or use a macro) every time I do a matrix addition.

raphaelchinchilla · August 10, 2020, 8:51pm

As a remark, I have checked that if I do not assign the result to A, i.e. I run

@benchmark A+B+C

and

@benchmark A.+B.+C

the running time is exactly the same, which puzzles me even more about what is happening.

tomerarnon · August 10, 2020, 8:58pm

In the first case, the whole calculation is broadcast elementwise, including the step of writing into A, so no temporary is needed. In the second case, the entire right-hand side is evaluated first, allocating a temporary matrix, which is then written elementwise into A. You can tell this is the case from the memory estimates. Also note that benchmarking in the global scope is tricky and requires interpolating variables with $ to return accurate results. With interpolation, the first case should show no allocation at all.
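A minimal sketch of that difference, using only Base Julia. The function names add_fused! and add_unfused! are made up for illustration, and wrapping the expressions in functions is just one way to sidestep the global-scope issue; exact byte counts may vary between Julia versions.

```julia
# Fused: the dots combine into a single loop that writes straight into A.
function add_fused!(A, B, C)
    A .= A .+ B .+ C
    return A
end

# Unfused: `A + B + C` first builds a new matrix, which is then copied into A.
function add_unfused!(A, B, C)
    A .= A + B + C
    return A
end

A = rand(1000, 1000)
B = rand(1000, 1000)
C = rand(1000, 1000)

# Warm-up calls so that compilation is not counted in the measurements.
add_fused!(copy(A), B, C)
add_unfused!(copy(A), B, C)

fused_bytes = @allocated add_fused!(A, B, C)
unfused_bytes = @allocated add_unfused!(A, B, C)

println("fused:   ", fused_bytes, " bytes")
println("unfused: ", unfused_bytes, " bytes")  # about one 1000x1000 Float64 matrix
```

With BenchmarkTools, the equivalent of the original benchmarks with interpolation would be @benchmark $A .= $A .+ $B .+ $C and @benchmark $A .= $A + $B + $C.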

stevengj · August 10, 2020, 9:10pm

See More Dots: Syntactic Loop Fusion in Julia
