Broadcast promoting Array{Bool} to BitArray

angusmoore · June 25, 2017, 1:13pm

I’ve just moved over to 0.6. I’m a bit confused about this change in behaviour with respect to bool arrays.

In julia 0.5:

julia> foo = [false,true]
2-element Array{Bool,1}:
false
true

julia> !foo
2-element Array{Bool,1}:
true
false

But in 0.6 the result gets promoted to a bitarray:

julia> foo = [false,true]
2-element Array{Bool,1}:
false
true

julia> .!foo
2-element BitArray{1}:
true
false

Can someone explain why broadcast is promoting Array{Bool}, and if this has been done for a good reason, does that mean I should favor using BitArray over Array{Bool}?

Tamas_Papp · June 25, 2017, 1:32pm

BitArrays have a more compact representation in memory:

julia> a = rand(Bool, 100);

julia> sizeof(a)
100

julia> b = BitArray(rand(Bool) for _ in 1:100);

julia> sizeof(b)
16

so generally they should be preferred.

dpo · June 25, 2017, 2:07pm

That’s good, but shouldn’t foo = [false,true] also generate a BitArray by default then? Is there still a point to using an Array{Bool}?

yuyichao · June 25, 2017, 2:16pm

BitArray is generally slower unless you are memory/bandwidth limitted.

Tamas_Papp · June 25, 2017, 2:34pm

Depends, since extracting elements from a BitArray requires bit manipulation, so it may be slower. Consider this example:

using BenchmarkTools

function sum_indices(a::AbstractArray{<:Bool}, ixs)
    total = 0
    for ix in ixs
        total += a[ix]
    end
    total
end

a = rand(Bool,1000)
b = BitArray(a)
ixs = rand(1:length(a), 10000)

@benchmark sum_indices($a, $ixs)
@benchmark sum_indices($b, $ixs)

On my machine, BitArrays take about 2x the time.

julia> @benchmark sum_indices($a, $ixs)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.439 μs (0.00% GC)
  median time:      7.473 μs (0.00% GC)
  mean time:        7.810 μs (0.00% GC)
  maximum time:     32.224 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     4

julia> @benchmark sum_indices($b, $ixs)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.720 μs (0.00% GC)
  median time:      15.902 μs (0.00% GC)
  mean time:        16.633 μs (0.00% GC)
  maximum time:     97.815 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

But then note

julia> @benchmark sum($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     629.241 ns (0.00% GC)
  median time:      629.629 ns (0.00% GC)
  mean time:        639.860 ns (0.00% GC)
  maximum time:     5.575 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     170

julia> @benchmark sum($b)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     16.611 ns (0.00% GC)
  median time:      16.656 ns (0.00% GC)
  mean time:        17.566 ns (0.00% GC)
  maximum time:     1.001 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

So both are useful, depends on your context. If unsure, benchmark, however, you are best off if you write your code to work with all options and only pay attention to these choices when they affect performance.

angusmoore · June 26, 2017, 2:22am

Thanks guys, that’s very helpful background. But I’m not sure I’ve fully understood why broadcast is promoting Array{Bool} to BitArray. At best it seems a bit inconsistent, and at worst like it could induce type instability. I think this simple (pointless/contrived but hopefully helpful) example illustrates my point:

function flipme(foo, flip)
  if flip
    return .!foo
  else
    return foo
  end
end

I think most people, myself included, would be surprised that this function is type unstable.

yuyichao · June 26, 2017, 2:37am

I’m not a huge fan of returning special array type for Bool eltype but this example is irrelevant. Whatever the return type is there are always input types that make this function type unstable.

angusmoore · June 26, 2017, 4:53am

Thanks, fair point. I guess I was just trying to highlight the somewhat surprising (to me) result of returning a special array type from broadcast.

For reference, for anyone else who comes across this, this change was introduced by #17623. The relevant commentary is (#17623 comment):

One “new” behavior is that broadcast now always produces a BitArray if the output is a Bool array. This preserves the BitArray-producing behavior of expressions like A .== B .> 3.

I must confess I don’t love this changed behavior, but if it’s necessary for fusion, it seems like a very reasonably trade-off. I guess all I need in the corner cases where I want to preserve Array{Bool} is map(!, foo).

Topic		Replies	Views
Why is BitArray so slow? Performance	28	5859	January 14, 2022
Why is the negation of a Boolean array a BitArray General Usage	3	938	September 28, 2019
Hang during conversion New to Julia	6	490	December 16, 2018
Broadcast of .== slow performance, allocations Performance question , performance	5	795	May 16, 2019
How to invert bits in a bit array New to Julia	5	8745	November 11, 2018

Broadcast promoting Array{Bool} to BitArray

Related topics