# Why is BitArray so slow?

I was playing around with flipping the signs of numbers in an array, and I was surprised that doing so with BitArrays is much slower. I think the code below is pretty clear:

``````
using BenchmarkTools
using Random

a = rand(-1:2:1, 26, 11)
c = bitrand(26, 11)
b = 2 * c .- 1

function loop(a, b)
    for i in eachindex(a)
        if b[i] == 1
            a[i] *= -1
        end
    end
end

function loop2(a, c)
    for i in eachindex(a)
        if c[i] == true
            a[i] *= -1
        end
    end
end

loop(a, b)
loop2(a, c)

@btime loop(a, b);
@btime loop2(a, c);
``````

`loop(a, b)` gives me about 55 ns, and `loop2(a, c)` gives about 200 ns.

Why would this happen? And what is the absolute fastest way to do this loop?

I guess this comes down to a speed difference between using `Int64`s and any other kind of integer, including single bits.

Unless you are memory bound (which usually isn't the case for non-vectorized loops), `BitArray` will almost always be slower. You can do better if you operate directly on the underlying array.

AFAICT the `BitArray` type doesn't provide a public generic interface to operate on the underlying array. Fortunately, there aren't many operations that can benefit from this since there are only two values each element can take. The particular thing you want to do should be doable with `map!(!, a, a)`.
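As a small illustration (using `c` for the `BitVector`, as in the original post), flipping every bit in place this way goes through a specialized chunk-wise `BitArray` method rather than touching bits one at a time:

``````julia
using Random

c = bitrand(1000)
old = copy(c)

# map!(!, c, c) has a specialized BitArray method that negates
# whole UInt64 chunks at a time instead of reading individual bits.
map!(!, c, c)

@assert c == .!old
``````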


I cannot reproduce. That being said, CMOV is your friend:

``````
julia> N = 10_000; a = rand(-1:2:1, N); c = bitrand(N); b = 2 * c .- 1; d = collect(c);

julia> function loop3(a, c)
           @simd for i in eachindex(a)
               @inbounds a[i] = ifelse(c[i], -a[i], a[i])
           end
           nothing
       end

julia> @btime loop(a, b);
  56.643 μs (0 allocations: 0 bytes)

julia> @btime loop2(a, c);
  61.410 μs (0 allocations: 0 bytes)

julia> @btime loop3(a, c);
  13.170 μs (0 allocations: 0 bytes)

julia> @btime loop3(a, d);
  4.960 μs (0 allocations: 0 bytes)
``````

FWIW, `@simd` is illegal with `BitArray`.


Actually, after https://github.com/JuliaLang/julia/pull/27670, `@simd` should be fine; `@simd ivdep` isn't.

Parallelization is really only bad if you're assigning into BitArrays, and even then we've changed the semantics of `@simd` such that it doesn't perform the bad transform we have run into in the past.

BitArrays are a huge win when you can exploit their Int64 storage and not need to worry about picking out individual bits. Broadcasting now does this with the basic logic operators when possible. Or it can also be a slight win if the size difference is enough to keep you in cache. In any case, it's an easy place to benchmark and profile between the two implementations if it's particularly important.
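A quick sketch of that win (sizes here are illustrative): broadcasting a logic operator over two `BitVector`s combines the packed `UInt64` chunks directly, covering 64 elements per word-sized operation:

``````julia
using Random

a = bitrand(10_000)
b = bitrand(10_000)

# .& on two BitVectors operates on the underlying UInt64 chunks, so
# roughly 157 word-sized ANDs cover all 10_000 elements.
c = a .& b

@assert c isa BitVector
@assert c == map(&, a, b)
``````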


Theyβre also a huge win if you can fit the (parts of the) array (you need) entirely into cache, where you might not be able to otherwise. Weβve seen in LightGraphs that the improvements in performance due to keeping bitarrays in cache outweigh the performance improvements for a vector of booleans for very large graphs where weβre keeping track of whether a vertex has been visited.


Here is another odd performance result from an alternative function. The following `loop3(a, b)` doesn't use conditions at all, but relies on accessing all the indices of `a` and `b` in memory order. For small `a` and `b`, this is slower than `loop(a, b)`, which uses conditions, but as `a` and `b` become large enough, they have the same performance. Is accessing an array element in memory order more expensive than a conditional?

``````
function loop3(a, b)
    for i in eachindex(a)
        a[i] *= -b[i]
    end
    a
end
``````

This is great, thanks! But it's still more than 4x faster using an array of `Int64`s.

Sorry to revive an old topic. But has any of this changed in more recent versions of Julia?
Do we expect BitArray to be slow and should use them only if we are memory constrained?

This looks related. Why is the conversion from BitArray so much slower than from Array{Bool}?

``````
julia> @benchmark convert(Array{Float32}, b) setup=(b=Array(bitrand(28,28,128));)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  30.867 μs … 591.739 μs  ┊ GC (min … max): 0.00% … 74.02%
 Time  (median):     35.097 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.258 μs ±  31.908 μs  ┊ GC (mean ± σ):  5.89% ±  6.67%

  30.9 μs         Histogram: frequency by time         53.9 μs <

 Memory estimate: 392.06 KiB, allocs estimate: 2.

julia> @benchmark convert(Array{Float32}, b) setup=(b=bitrand(28,28,128);)
BenchmarkTools.Trial: 8586 samples with 1 evaluation.
 Range (min … max):  515.307 μs …  1.291 ms  ┊ GC (min … max): 0.00% … 40.74%
 Time  (median):     556.879 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   576.900 μs ± 66.623 μs  ┊ GC (mean ± σ):  0.63% ±  3.89%

  515 μs        Histogram: log(frequency) by time       850 μs <

 Memory estimate: 392.06 KiB, allocs estimate: 2.
``````

BitArrays pack bits into `UInt64` chunks AFAIK. I suspect that the performance hit is from unpacking.
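A rough sketch of that unpacking cost: every scalar read has to locate a `UInt64` word and mask out a single bit. The `getbit` function below is a hypothetical re-implementation for illustration, and `chunks` is an internal field of `BitArray`, not public API:

``````julia
# Hypothetical re-implementation of scalar BitArray indexing, to show
# the extra work per element compared with loading one byte from a
# Vector{Bool}. The `chunks` field is BitArray internals, not public API.
function getbit(chunks::Vector{UInt64}, i::Int)
    i1, i2 = divrem(i - 1, 64)               # which word, which bit offset
    return (chunks[i1 + 1] >> i2) & 0x1 == 0x1
end

b = trues(100); b[5] = false
@assert getbit(b.chunks, 5) == false
@assert getbit(b.chunks, 6) == true
``````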

Basically, if the size of your vector/array of boolean values is not an issue in and of itself, use `Vector{Bool}` where each byte holds one boolean value. If the size is an issue and 1/8th the size is preferable, use `BitVector` where each byte holds eight boolean values and accept the time overhead.
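The size difference is easy to check; a small sketch using `Base.summarysize`:

``````julia
v = fill(true, 10_000)   # Vector{Bool}: one byte per element
b = trues(10_000)        # BitVector: one bit per element, packed into UInt64s

# The packed representation is roughly 8x smaller.
@assert Base.summarysize(b) < Base.summarysize(v) ÷ 4
``````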

`BitArray`s are extremely fast for operations where unpacking is unnecessary, such as `count`/`sum`, `.!`, etc. It's not just about saving memory.


Saving space is also closely related to speed because the less space your data needs, the fewer cache lines are needed to store it, and the fewer fetches from main memory are needed.


In this particular case, the bitarray algorithm isn't SIMDing. You can get it to by throwing an `@simd ivdep` at it, and then the performance is more similar, but I'm not 100% certain that's safe (see above how we had to remove this in some places specifically because of BitArrays). There's probably a better way of writing it than this naive loop.
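As a sketch of what such a hand-written loop might look like (`tofloat!` is a made-up name, and the `@simd ivdep` carries exactly the safety caveat mentioned above):

``````julia
using Random

# Hypothetical hand-rolled conversion loop. @simd ivdep asserts the
# BitArray loads are independent, which lets the compiler vectorize --
# but see the caveat above about whether that is actually safe here.
function tofloat!(out::Array{Float32}, b::BitArray)
    @simd ivdep for i in eachindex(b)
        @inbounds out[i] = Float32(b[i])
    end
    return out
end

b = bitrand(4, 4)
out = Array{Float32}(undef, size(b))
tofloat!(out, b)
@assert out == Float32.(b)
``````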


The problem is that broadcasting ops like:

``````julia> randn(10) .< 0
10-element BitVector:
1
1
1
1
1
0
0
1
0
1
``````

return a `BitArray` by default, not `Array{Bool}`.
So you are forced to do a conversion afterwards. This hurts if the conversion is slow.

But, as mentioned, `BitArray` is not, in general, slow. In many cases it is blazing fast:

``````
x = rand(1000)
ind = randn(1000) .< 0.5
indbool = Vector{Bool}(ind)

1.7.0> @btime $x[$ind];
  788.060 ns (1 allocation: 5.62 KiB)

1.7.0> @btime $x[$indbool];
  1.290 μs (1 allocation: 5.62 KiB)

1.7.0> @btime count($ind);
  6.000 ns (0 allocations: 0 bytes)

1.7.0> @btime count($indbool);
  22.088 ns (0 allocations: 0 bytes)

1.7.0> @btime .!($ind);
  42.020 ns (2 allocations: 224 bytes)

1.7.0> @btime .!($indbool);
  548.958 ns (3 allocations: 4.41 KiB)
``````

But you need to avoid accessing individual elements.


Yep. But matrix multiplies with a `BitArray` are slow (they hit the generic fallback), so you have to convert the BitArray into something BLAS likes.
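An illustrative sketch of that conversion: multiplying through the `BitMatrix` directly uses the generic matmul fallback, while converting to `Float64` first pays one allocation and then dispatches to BLAS:

``````julia
using Random, LinearAlgebra

c = bitrand(200, 200)
x = randn(200)

# c * x goes through the generic (slow) fallback; Float64.(c) * x
# allocates a dense matrix up front but the multiply then hits BLAS.
y_generic = c * x
y_blas    = Float64.(c) * x

@assert y_generic ≈ y_blas
``````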