Adding large matrix, why .+ is not the default?

Take three large matrix A, B, C each with 1000x1000 elements. Here is the result for two benchmarks:

  • using broadcasted operator “.+”
@benchmark A.=A.+B.+C
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  4
  --------------
  minimum time:     1.866 ms (0.00% GC)
  median time:      1.978 ms (0.00% GC)
  mean time:        2.004 ms (0.00% GC)
  maximum time:     2.771 ms (0.00% GC)
  --------------
  samples:          2474
  evals/sample:     1
  • using the regular operator “+”
@benchmark A.=A+B+C
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  4
  --------------

  minimum time:     5.665 ms (0.00% GC)
  median time:      5.981 ms (0.00% GC)
  mean time:        6.917 ms (12.94% GC)
  maximum time:     35.410 ms (39.02% GC)
  --------------
  samples:          722
  evals/sample:     1

Considering that the broadcasted versions is 5 to 16 times faster, why it is not the default? Why do we even need a “+” operator for matrices if the “.+” is faster? Is there any reason or situation where I would have faster/more stable code if I wrote “+” instead of “.+”? Otherwise, It just seems silly that I need to write “.+” (or use a macro) every time I am doing a matrix addition.

As a remark, I have checked that if I do not allocate the result, i.e. I run

@benchmark A+B+C

and

@benchmark A.+B.+C

the runing time is exactly the same, which puzzles me even more about what is happening.

In the first case, the calculation is broadcasted elementwise, including the step of writing into A. In the second case, the entire right side is evaluated, allocating a temporary matrix, which is then written elementwise into A. You can tell this is the case based on the allocations. Also note that benchmarking in the global scope is tricky and requires variable interpolation with $ to return accurate results. In the first case, there should be no allocation at all.

1 Like

See More Dots: Syntactic Loop Fusion in Julia

5 Likes