Boolean operations on large bit vectors / arrays

In https://github.com/JuliaLang/julia/pull/17623#issuecomment-268703295, @stevengj had asked how common operations on large bit arrays were. For database queries, they are very common (it is a common technique to use bitmap and/or bitslice indices, esp. for decision support sorts of applications), where bit vectors representing millions of rows are anded, ored, or negated (we actually use bit vectors representing up to 4 billion rows for our product).
I was concerned about the comment by @carlobaldassi:

This has removed a few optimizations for BitArrays. One in particular is the case A .* B when A and B have the same shape, which previously was specialized and called A & B. The difference is quite significant, e.g. for 1000x1000 BitArrays it’s almost 40-fold.

For these sorts of applications, it would be quite useful also for the SIMD instructions to be used (previously, I asm. optimized these to use the AVX instructions on x86), and to make that work best, Julia would need to give 32 or 64 alignment to the chunk array in the BitArray.

The discussion here on specialized versions for Vector is highly related at a more abstract level:

I don’t think that that change is too concerning, that comment is about .*, which can be directly replaced with & for that specific case. Therefore, that is only going to affect generic code which calls .*; if you already know you are dealing with bit masks you should probably use & anyway. If you also want to avoid allocations, it’s best to use map!, which also has a specialized version when the operation is & and the arguments are BitArrays. (The same is true for other bitwise operators of course.)

(For what it’s worth, I also use large BitArrays quite frequently, unsurprisingly…)

3 Likes