In https://github.com/JuliaLang/julia/pull/17623#issuecomment-268703295, @stevengj asked how common operations on large bit arrays are. For database queries they are very common: bitmap and/or bit-sliced indices are a standard technique, especially for decision-support applications, where bit vectors representing millions of rows are ANDed, ORed, or negated (our product actually uses bit vectors representing up to 4 billion rows).
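To make the bitmap-index use case concrete, here is a minimal sketch in current Julia syntax (where the elementwise operators are written `.&`, `.|` and `.!`). The table, the column names `region` and `status`, and the query are all invented for illustration; a real index would be built once and stored, not recomputed per query.

```julia
# Hypothetical table with n rows and two small-cardinality columns.
n = 1_000_000
region = [i % 4 for i in 1:n]   # values 0..3
status = [i % 3 for i in 1:n]   # values 0..2

# Bitmap index: one bit vector per distinct value,
# where bit i is set when row i holds that value.
region_is_1 = region .== 1      # BitVector
status_is_0 = status .== 0
status_is_2 = status .== 2

# WHERE region = 1 AND NOT (status = 0 OR status = 2)
hits = region_is_1 .& .!(status_is_0 .| status_is_2)

nhits = count(hits)             # number of matching rows
rows  = findall(hits)           # matching row numbers
```

The whole predicate is evaluated with a few passes over packed 64-bit chunks, which is why the speed of these elementwise operations matters so much for this kind of workload.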
I was concerned about the comment by @carlobaldassi:
This has removed a few optimizations for BitArrays. One in particular is the case A .* B when A and B have the same shape, which previously was specialized and called A & B. The difference is quite significant, e.g. for 1000x1000 BitArrays it’s almost 40-fold.
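For anyone who wants to check this on their own build, here is a small sketch of the equivalence the quote relies on. It uses current syntax, where the `A & B` from the quote is written `A .& B`; the array contents are arbitrary patterns chosen for illustration.

```julia
n = 1000
# Two same-shape BitArrays with arbitrary, deterministic contents.
A = BitArray(isodd(i + j) for i in 1:n, j in 1:n)
B = BitArray(i % 3 == 0 for i in 1:n, j in 1:n)

# For Bool elements, multiplication and logical AND give the same values,
# so the two elementwise results should be identical.
prod_result = A .* B
and_result  = A .& B

same = prod_result == and_result
```

Timing the two forms yourself (for example with `@time` after a warm-up call) is the way to see whether a gap like the one quoted still shows up; I make no claim here about current numbers.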
For these sorts of applications, it would also be quite useful for SIMD instructions to be used (I previously hand-optimized these operations in assembly to use the AVX instructions on x86). To make that work best, Julia would need to give the chunk array in the BitArray 32- or 64-byte alignment.
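A quick way to see what alignment the chunk array actually gets is sketched below. Note the assumption: it reads the `chunks` field, which is an internal detail of `BitArray` (a `Vector{UInt64}`) and not public API, so it may change between Julia versions.

```julia
# Allocate a large bit vector; each 64 bits are packed into one UInt64 chunk.
b = trues(1_000_000)

# `chunks` is an internal field, used here only for inspection.
addr = UInt(pointer(b.chunks))

# Distance of the first chunk from a 32-byte and a 64-byte boundary.
# Zero means the storage starts on that boundary.
offset32 = addr % 32
offset64 = addr % 64

println("offset from 32-byte boundary: ", offset32)
println("offset from 64-byte boundary: ", offset64)
```

Whatever this prints on a given machine only reflects that allocation, so it is a diagnostic, not a guarantee.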