# Why is BitArray so slow?

I was playing around with flipping the signs of numbers in an array, and I was surprised that doing so with BitArrays is much slower. I think the code below is pretty clear:

``````
using BenchmarkTools
using Random

a = rand(-1:2:1, 26, 11)
c = bitrand(26, 11)
b = 2 * c .- 1

function loop(a, b)
    for i in eachindex(a)
        if b[i] == 1
            a[i] *= -1
        end
    end
end

function loop2(a, c)
    for i in eachindex(a)
        if c[i] == true
            a[i] *= -1
        end
    end
end

loop(a, b)
loop2(a, c)

@btime loop(a, b);
@btime loop2(a, c);
``````

`loop(a, b)` gives me about 55 ns, and `loop2(a, c)` gives about 200 ns.

Why would this happen? And what is the absolute fastest way to do this loop?

I guess this comes down to a speed difference between using `Int64`s and any other kind of integer, including single bits.

Unless you are memory bound (which usually isn't the case for non-vectorized loops), `BitArray` will almost always be slower. You can do better if you operate directly on the underlying array.

AFAICT the `BitArray` type doesn't provide a public generic interface to operate on the underlying array. Fortunately, there aren't many operations that can benefit from this since there are only two values each element can take. The particular thing you want to do should be doable with `map!(!, a, a)`.
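As a small illustration (using `c` for the `BitVector`, as in the original post), flipping every bit in place this way goes through a specialized chunk-wise `BitArray` method rather than touching bits one at a time:

``````julia
using Random

c = bitrand(1000)
old = copy(c)

# map!(!, c, c) has a specialized BitArray method that negates
# whole UInt64 chunks at a time instead of reading individual bits.
map!(!, c, c)

@assert c == .!old
``````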


I cannot reproduce. That being said, CMOV is your friend:

``````
julia> N = 10_000; a = rand(-1:2:1, N); c = bitrand(N); b = 2 * c .- 1; d = collect(c);

julia> function loop3(a, c)
           @simd for i in eachindex(a)
               @inbounds a[i] = ifelse(c[i], -a[i], a[i])
           end
           nothing
       end

julia> @btime loop(a, b);
  56.643 μs (0 allocations: 0 bytes)

julia> @btime loop2(a, c);
  61.410 μs (0 allocations: 0 bytes)

julia> @btime loop3(a, c);
  13.170 μs (0 allocations: 0 bytes)

julia> @btime loop3(a, d);
  4.960 μs (0 allocations: 0 bytes)
``````

FWIW, `@simd` is illegal with `BitArray`.


Actually, after https://github.com/JuliaLang/julia/pull/27670, `@simd` should be fine; `@simd ivdep` isn't.

Parallelization is really only bad if you're assigning into BitArrays, and even then we've changed the semantics of `@simd` such that it doesn't perform the bad transform we have run into in the past.

BitArrays are a huge win when you can exploit their Int64 storage and not need to worry about picking out individual bits. Broadcasting now does this with the basic logic operators when possible. Or it can also be a slight win if the size difference is enough to keep you in cache. In any case, it's an easy place to benchmark and profile between the two implementations if it's particularly important.
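A quick sketch of that win (sizes here are illustrative): broadcasting a logic operator over two `BitVector`s combines the packed `UInt64` chunks directly, covering 64 elements per word-sized operation:

``````julia
using Random

a = bitrand(10_000)
b = bitrand(10_000)

# .& on two BitVectors operates on the underlying UInt64 chunks, so
# roughly 157 word-sized ANDs cover all 10_000 elements.
c = a .& b

@assert c isa BitVector
@assert c == map(&, a, b)
``````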


Theyβre also a huge win if you can fit the (parts of the) array (you need) entirely into cache, where you might not be able to otherwise. Weβve seen in LightGraphs that the improvements in performance due to keeping bitarrays in cache outweigh the performance improvements for a vector of booleans for very large graphs where weβre keeping track of whether a vertex has been visited.


Here is another odd performance result from an alternative function. The following `loop3(a, b)` doesn't use conditions at all, but relies on accessing all the indices of `a` and `b` in memory order. For small `a` and `b`, this is slower than `loop(a, b)`, which uses conditions, but as `a` and `b` become large enough, they have the same performance. Is accessing an array element in memory order more expensive than a conditional?

``````
function loop3(a, b)
    for i in eachindex(a)
        a[i] *= -b[i]
    end
    a
end
``````

This is great, thanks! But it's still more than 4x faster using an array of `Int64`s.

Sorry to revive an old topic. But has any of this changed in more recent versions of Julia?
Do we expect BitArray to be slow and should use them only if we are memory constrained?

This looks related. Why is the conversion from BitArray so much slower than from Array{Bool}?

``````
julia> @benchmark convert(Array{Float32}, b) setup=(b=Array(bitrand(28,28,128));)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  30.867 μs … 591.739 μs  ┊ GC (min … max): 0.00% … 74.02%
 Time  (median):     35.097 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.258 μs ±  31.908 μs  ┊ GC (mean ± σ):  5.89% ±  6.67%

  30.9 μs         Histogram: frequency by time         53.9 μs <

 Memory estimate: 392.06 KiB, allocs estimate: 2.

julia> @benchmark convert(Array{Float32}, b) setup=(b=bitrand(28,28,128);)
BenchmarkTools.Trial: 8586 samples with 1 evaluation.
 Range (min … max):  515.307 μs …  1.291 ms  ┊ GC (min … max): 0.00% … 40.74%
 Time  (median):     556.879 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   576.900 μs ± 66.623 μs  ┊ GC (mean ± σ):  0.63% ±  3.89%

  515 μs        Histogram: log(frequency) by time       850 μs <

 Memory estimate: 392.06 KiB, allocs estimate: 2.
``````

BitArrays pack bits into `UInt64` chunks AFAIK. I suspect that the performance hit is from unpacking.
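A rough sketch of that unpacking cost: every scalar read has to locate a `UInt64` word and mask out a single bit. The `getbit` function below is a hypothetical re-implementation for illustration, and `chunks` is an internal field of `BitArray`, not public API:

``````julia
# Hypothetical re-implementation of scalar BitArray indexing, to show
# the extra work per element compared with loading one byte from a
# Vector{Bool}. The `chunks` field is BitArray internals, not public API.
function getbit(chunks::Vector{UInt64}, i::Int)
    i1, i2 = divrem(i - 1, 64)               # which word, which bit offset
    return (chunks[i1 + 1] >> i2) & 0x1 == 0x1
end

b = trues(100); b[5] = false
@assert getbit(b.chunks, 5) == false
@assert getbit(b.chunks, 6) == true
``````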

Basically, if the size of your vector/array of boolean values is not an issue in and of itself, use `Vector{Bool}` where each byte holds one boolean value. If the size is an issue and 1/8th the size is preferable, use `BitVector` where each byte holds eight boolean values and accept the time overhead.
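The size difference is easy to check; a small sketch using `Base.summarysize`:

``````julia
v = fill(true, 10_000)   # Vector{Bool}: one byte per element
b = trues(10_000)        # BitVector: one bit per element, packed into UInt64s

# The packed representation is roughly 8x smaller.
@assert Base.summarysize(b) < Base.summarysize(v) ÷ 4
``````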

`BitArray`s are extremely fast for operations where unpacking is unnecessary, such as `count`/`sum`, `.!`, etc. It's not just about saving memory.


Saving space is also closely related to speed because the less space your data needs, the fewer cache lines are needed to store it, and the fewer fetches from main memory are needed.


In this particular case, the bitarray algorithm isn't SIMDing. You can get it to by throwing an `@simd ivdep` at it, and then the performance is more similar, but I'm not 100% certain that's safe (see above how we had to remove this in some places specifically because of BitArrays). There's probably a better way of writing it than this naive loop.
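As a sketch of what such a hand-written loop might look like (`tofloat!` is a made-up name, and the `@simd ivdep` carries exactly the safety caveat mentioned above):

``````julia
using Random

# Hypothetical hand-rolled conversion loop. @simd ivdep asserts the
# BitArray loads are independent, which lets the compiler vectorize --
# but see the caveat above about whether that is actually safe here.
function tofloat!(out::Array{Float32}, b::BitArray)
    @simd ivdep for i in eachindex(b)
        @inbounds out[i] = Float32(b[i])
    end
    return out
end

b = bitrand(4, 4)
out = Array{Float32}(undef, size(b))
tofloat!(out, b)
@assert out == Float32.(b)
``````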


The problem is that broadcasting ops like:

``````julia> randn(10) .< 0
10-element BitVector:
1
1
1
1
1
0
0
1
0
1
``````

return a `BitArray` by default, not `Array{Bool}`.
So you are forced to do a conversion afterwards. This hurts if the conversion is slow.

But, as mentioned, `BitArray` is not, in general, slow. In many cases it is blazing fast:

``````
x = rand(1000)
ind = randn(1000) .< 0.5
indbool = Vector{Bool}(ind)

1.7.0> @btime $x[$ind];
  788.060 ns (1 allocation: 5.62 KiB)

1.7.0> @btime $x[$indbool];
  1.290 μs (1 allocation: 5.62 KiB)

1.7.0> @btime count($ind);
  6.000 ns (0 allocations: 0 bytes)

1.7.0> @btime count($indbool);
  22.088 ns (0 allocations: 0 bytes)

1.7.0> @btime .!($ind);
  42.020 ns (2 allocations: 224 bytes)

1.7.0> @btime .!($indbool);
  548.958 ns (3 allocations: 4.41 KiB)
``````

But you need to avoid accessing individual elements.


Yep. But matrix multiplies with a `BitArray` are slow (they hit the generic fallback), so you have to convert the BitArray into something BLAS likes.
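An illustrative sketch of that conversion: multiplying through the `BitMatrix` directly uses the generic matmul fallback, while converting to `Float64` first pays one allocation and then dispatches to BLAS:

``````julia
using Random, LinearAlgebra

c = bitrand(200, 200)
x = randn(200)

# c * x goes through the generic (slow) fallback; Float64.(c) * x
# allocates a dense matrix up front but the multiply then hits BLAS.
y_generic = c * x
y_blas    = Float64.(c) * x

@assert y_generic ≈ y_blas
``````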