Performance issues with Boolean dot operators


#1

I’m seeing poor performance in Boolean dot operators, such as .& and .!. Please see below. Any ideas why?

Thanks!

julia> using BenchmarkTools

julia> x, y = rand(Bool,100), rand(Bool,100)
(Bool[false, true, true, true, false, true, true, false, false, true  …  true, true, true, false, true, false, false, false, false, false], Bool[true, false, true, true, false, false, false, true, true, false  …  true, false, false, true, false, true, true, true, true, false])

julia> not(v) = [!x for x in v]
not (generic function with 1 method)

julia> and(x,y)=[x[i] && y[i] for i in eachindex(x)]
and (generic function with 1 method)

julia> @benchmark .!x
BenchmarkTools.Trial:
  memory estimate:  4.92 KiB
  allocs estimate:  19
  --------------
  minimum time:     4.333 μs (0.00% GC)
  median time:      4.889 μs (0.00% GC)
  mean time:        5.833 μs (7.18% GC)
  maximum time:     340.020 μs (95.77% GC)
  --------------
  samples:          10000
  evals/sample:     7

julia> @benchmark not(x)
BenchmarkTools.Trial:
  memory estimate:  208 bytes
  allocs estimate:  2
  --------------
  minimum time:     164.375 ns (0.00% GC)
  median time:      167.230 ns (0.00% GC)
  mean time:        184.198 ns (2.59% GC)
  maximum time:     1.307 μs (74.39% GC)
  --------------
  samples:          10000
  evals/sample:     759

julia> @benchmark x .& y
BenchmarkTools.Trial:
  memory estimate:  4.95 KiB
  allocs estimate:  20
  --------------
  minimum time:     4.803 μs (0.00% GC)
  median time:      5.271 μs (0.00% GC)
  mean time:        6.114 μs (5.39% GC)
  maximum time:     331.211 μs (96.50% GC)
  --------------
  samples:          10000
  evals/sample:     7

julia> @benchmark and(x,y)
BenchmarkTools.Trial:
  memory estimate:  464 bytes
  allocs estimate:  8
  --------------
  minimum time:     552.909 ns (0.00% GC)
  median time:      567.124 ns (0.00% GC)
  mean time:        630.907 ns (4.59% GC)
  maximum time:     12.199 μs (87.91% GC)
  --------------
  samples:          10000
  evals/sample:     186

#2

Be careful about benchmarking with global variables. The simplest way to avoid this is to use $ interpolation with BenchmarkTools:

julia> @btime x .& y;
  9.842 μs (20 allocations: 4.95 KiB)

julia> @btime $x .& $y;
  1.161 μs (4 allocations: 4.33 KiB)

julia> @btime and($x,$y);
  465.760 ns (3 allocations: 256 bytes)

It still seems quite a bit worse than an explicit loop, unfortunately. And, actually, your and function is slower than necessary because it uses short-circuit && rather than bitwise &. With the latter, the explicit loop (comprehension) is even faster:

julia> and2(x,y)=[x[i] & y[i] for i in eachindex(x)]
and2 (generic function with 1 method)

julia> @btime and2($x,$y);
  210.676 ns (3 allocations: 256 bytes)

#3

I’m not sure why it’s so much slower, or why the memory allocations are so high, but one difference is that .!(x) returns a BitArray, which stores boolean data very compactly (one element per bit), whereas your not(x) function returns an Array{Bool} which actually stores one byte per array element. The BitArray uses much less memory, but can be slower to access because an individual element cannot be looked up without doing some bit manipulation.

julia> typeof(not(x))
Array{Bool,1}

julia> typeof(.!(x))
BitArray{1}

julia> sizeof(not(x))
100  # 100 bytes for 100 elements

julia> sizeof(.!(x))
16   # only 16 bytes to store 100 elements

#4

Update: Other operators, such as .< also seem to have the same issue.


#5

To see if it’s just the difference in array type (BitArray vs. Vector{Bool}), try a .= x .& y on a = similar(x) (in a function).


#6

I think that’s it! Thank you!

julia> x,y=rand(Bool,100),rand(Bool,100);

julia> using BenchmarkTools

julia> f(x,y) = (a=similar(x); a.=x.&y)
f (generic function with 1 method)

julia> g(x,y) = [x[i] & y[i] for i in eachindex(x)]
g (generic function with 1 method)

julia> @btime f($x,$y);
  38.625 ns (1 allocation: 192 bytes)

julia> @btime g($x,$y);
  115.981 ns (3 allocations: 256 bytes)

julia> @btime $x .& $y;
  749.807 ns (4 allocations: 4.33 KiB)