Why is dot plus faster?

thisismygitrepo · May 28, 2022, 11:24am

I’m experimenting with different expressions and I can’t explain why the dotted .+ is faster than plain + in the simple case of matrix addition.

using BenchmarkTools 
A = rand(100, 100) 
B = rand(100, 100) 

function no_dot(A, B)
    return [A+B for i in 1:10]
end

function with_dot(A, B)
    return [A.+B for i in 1:10]
end

@btime no_dot(A,B)
@btime with_dot(A, B)

The second is 10% faster.

Karthik-d-k · May 28, 2022, 11:32am

For me, both give the same performance. I also checked @code_native →
both functions generate the same code…!!

skleinbo · May 28, 2022, 11:32am

Is it?

julia> @benchmark no_dot(A,B)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   61.900 μs …   6.048 ms  ┊ GC (min … max):  0.00% … 93.50%
 Time  (median):      78.600 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   128.704 μs ± 181.175 μs  ┊ GC (mean ± σ):  11.31% ±  9.80%

  ██▄▄▂▁               ▃▃▂▂▁                                    ▂
  ████████▇▇▇▇▆▆▅▆▆▆▆▅▇███████▇▇▆▆▅▅▅▃▄▄▃▁▃▁▁▁▁▃▃▃▁▃▅▅▄▆▅▆▆▅▆▆▅ █
  61.9 μs       Histogram: log(frequency) by time          1 ms <

 Memory estimate: 781.84 KiB, allocs estimate: 21.

julia> @benchmark with_dot(A,B)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   62.500 μs …   5.696 ms  ┊ GC (min … max):  0.00% … 93.94%
 Time  (median):      78.600 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   108.850 μs ± 146.224 μs  ┊ GC (mean ± σ):  11.28% ±  9.42%

  ▇█▄▃▂▁                  ▁▁                                    ▂
  ███████▇█▆▇▇▆▇▅▆▅▅▆▅▅▅▆▇██▇▇▅▆▅▅▅▁▄▅▅▅▄▄▄▁▄▁▄▃▄▃▃▄▅▅▆▅▅▅▅▆▅▅▅ █
  62.5 μs       Histogram: log(frequency) by time        900 μs <

 Memory estimate: 781.84 KiB, allocs estimate: 21.

There shouldn’t be any significant difference, because for arrays A+B broadcasts internally.
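
A small sketch of the point above, assuming a recent Julia 1.x: for two arrays of the same shape, + and .+ give identical results. The difference is that + checks that the shapes match first, while .+ also expands singleton dimensions. The 3×3 sizes and the vector v are illustrative only and are not part of the original benchmark.

```julia
A = rand(3, 3)
B = rand(3, 3)

# Same shapes: both forms compute the same element-wise sum.
same = (A + B == A .+ B)

# Different shapes: `+` refuses, while `.+` broadcasts v across the columns.
v = rand(3)
plus_throws = try
    A + v
    false
catch err
    err isa DimensionMismatch
end

C = A .+ v   # 3×3 result; column j equals A[:, j] .+ v
```

So for same-shaped matrices, as in the benchmark, the only extra work in A+B is the shape check, which is negligible next to the allocation and the addition itself.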

thisismygitrepo · May 28, 2022, 11:39am

Thanks for checking. Perhaps I should remove the post, because it was confusing even for me.

Palli · May 28, 2022, 12:10pm

Just for you, I suggested this (because it’s a tricky situation: 10% can easily be noise, e.g. from a loaded computer):
https://github.com/JuliaCI/BenchmarkTools.jl/issues/283

OK, I didn’t file it for you specifically. When I first looked through the issues under my name I couldn’t find it, so I was actually going to open it for you. I could have sworn I had suggested it in the past, and in fact I had. Not sure why the search failed the first time.

I did notice the interesting issue “Feature suggestion: @ballocs”; for some reason I read that as “bollocks” (would it read the same in English?).

jling · May 28, 2022, 12:24pm

https://juliaci.github.io/BenchmarkTools.jl/dev/manual/#TrialRatio-and-TrialJudgement

Palli · May 28, 2022, 12:57pm

OK, I think you’re saying this is already possible (just complicated). See the link; for implementing something better and more user-friendly, see the open issue. Let’s not discuss a new implementation here, so as not to split the discussion. [It would be OK to post here exactly how to do this with the tools as they are.]
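
One way to do this with the existing tools, sketched under the assumption that the ratio and judge functions described in the linked manual section behave as documented there. The 5% tolerance is just a value picked for this illustration, the names t_plain and t_dot are made up here, and $ interpolation is used so that global-variable access does not distort the timings.

```julia
using BenchmarkTools
using Statistics: median

A = rand(100, 100)
B = rand(100, 100)

no_dot(A, B) = [A + B for i in 1:10]
with_dot(A, B) = [A .+ B for i in 1:10]

# Interpolate the globals with `$` so they are treated as local values.
t_plain = @benchmark no_dot($A, $B)
t_dot   = @benchmark with_dot($A, $B)

# Compare point estimates instead of eyeballing two printouts.
r = ratio(median(t_dot), median(t_plain))   # TrialRatio: dot / plain
j = judge(median(t_dot), median(t_plain); time_tolerance = 0.05)

println(r)
println(j)   # time is reported as invariant, improvement, or regression
```

If the judgement comes back as invariant across repeated runs, a 10% gap seen in a single @btime call is most likely noise.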
