.= vs = speed difference

Hi All,
I have a question about what goes on “under the hood” for these two @benchmark calls:

julia> A = rand([true, false],100); B = rand([true, false],100);

julia> @benchmark C = A .& B
BenchmarkTools.Trial: 
  memory estimate:  4.36 KiB
  allocs estimate:  5
  --------------
  minimum time:     754.330 ns (0.00% GC)
  median time:      1.066 μs (0.00% GC)
  mean time:        1.423 μs (29.76% GC)
  maximum time:     600.483 μs (99.71% GC)
  --------------
  samples:          10000
  evals/sample:     115

julia> @benchmark (C = similar(A); C .= A .& B)
BenchmarkTools.Trial: 
  memory estimate:  240 bytes
  allocs estimate:  3
  --------------
  minimum time:     329.578 ns (0.00% GC)
  median time:      354.242 ns (0.00% GC)
  mean time:        424.331 ns (11.63% GC)
  maximum time:     283.793 μs (99.84% GC)
  --------------
  samples:          10000
  evals/sample:     225

Does the second @benchmark run faster because the pre-allocation step is easy and the .= allows for very quick allocation?

No. A .& B is slower than similar(A) .= A .& B because the former by default produces a BitArray, whereas in the latter case you explicitly told it to use the same Vector{Bool} type as A. A BitArray is a more compact way to represent an array of boolean values (one bit per value vs. one byte per value for Vector{Bool}), but it is slower to access the individual values.
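To see the type difference for yourself, here is a minimal sketch (the names C1 and C2 are just for illustration, and how the type names print depends on your Julia version):

```julia
A = rand([true, false], 100)
B = rand([true, false], 100)

C1 = A .& B        # broadcast picks its own output container for Bool results
C2 = similar(A)    # same container type as A, contents uninitialized
C2 .= A .& B       # in-place broadcast writes into the existing C2

println(typeof(A))   # the input type
println(typeof(C1))  # the type broadcast chose
println(typeof(C2))  # the type inherited from A
println(C1 == C2)    # same values, different storage
```

The two results compare equal element by element; only the storage layout differs.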

On the other hand, if you change A and B to both be BitVector as well (bA, bB = BitVector(A), BitVector(B)), then both cases run at the same speed and are about twice as fast as similar(A) .= A .& B on my machine. That’s because .& on BitVector arguments can operate on 64-bit chunks of values at a time. The improvement is even greater for longer vectors (100 is rather short).
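If you want to reproduce the comparison, something like the following should work. It is only a sketch: I'm using @btime with $ interpolation so that global-variable lookup isn't part of the measurement (the original calls above did not do this), and timings will vary by machine and Julia version.

```julia
using BenchmarkTools

A = rand([true, false], 100); B = rand([true, false], 100)
bA, bB = BitVector(A), BitVector(B)   # bit-packed copies of A and B

# Sanity checks: with BitVector inputs, both forms produce a BitVector
# and the values match the Vector{Bool} computation.
@assert (bA .& bB) isa BitVector
@assert similar(bA) isa BitVector
@assert (bA .& bB) == (A .& B)

@btime $bA .& $bB;                            # allocating, BitVector inputs
@btime (C = similar($bA); C .= $bA .& $bB);   # in-place, BitVector inputs
@btime (C = similar($A); C .= $A .& $B);      # in-place, Vector{Bool} inputs
```

Increasing the length from 100 to something much larger is an easy way to check how the gap changes with size.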

Ohhh, it all makes sense now! Thanks so much!