Sum over view of BitArray

q = trues(10000)
d = view(q, 1:10000)

@btime sum($q)
17.034 ns (0 allocations: 0 bytes)
10000

@btime sum($d)
2.678 μs (0 allocations: 0 bytes)
10000

How can I improve the performance of sum over a contiguous subset of a BitArray?

I think the issue here is that BitArray is designed for elements to be accessed in bulk, not individually. This allows for a very efficient sum.
When you take a view of it, you create a wrapper that implements individual getindex, but more methods would probably be needed to make reductions like sum fast.
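
One practical workaround in the meantime, as a hedged sketch (assuming the subset really is a contiguous UnitRange): materialize the slice instead of viewing it. q[1:10000] builds a new BitArray (one allocation), and summing that should hit the fast chunk-based path rather than reading 10000 bits one by one.

@btime sum($q[1:10000])   # pays one allocation for the copied slice, then counts whole chunks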

Specifically, BitArray internally contains a Vector{UInt64} (157 64-bit chunks for 10000 bits), and its sum dispatches to an internal Base.bitcount that works on these 157 chunks.

sum(a::AbstractArray{Bool}; kw...) =
    isempty(kw) ? count(a) : reduce(add_sum, a; kw...)
...
_count(::typeof(identity), B::BitArray, ::Colon, init) = bitcount(B.chunks; init)

function bitcount(Bc::Vector{UInt64}; init::T=0) where {T}
    n::T = init
    @inbounds for i = 1:length(Bc)
        n = (n + count_ones(Bc[i])) % T
    end
    return n
end
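
For illustration, you can call those internals directly (they are unexported and may change between Julia versions, so this is only to show where the time goes):

length(q.chunks)
157

Base.bitcount(q.chunks)
10000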

Wrappers of BitArray, such as views, instead go over the 10000 elements individually, loading each chunk up to 64 times. Since the SubArray type made by view is not a contiguous subset in general, e.g. view(q, 1:7:10000), it can’t use the same chunk strategy. I wonder if UnitRange views of dense vectors are always dense and could be separate dense types.
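
As a hedged sketch of what such a chunk strategy could look like in the simplest case, a view starting at the first element: prefix_bitcount below is a hypothetical helper (not in Base) that sums whole 64-bit chunks and masks the partial last one.

# Hypothetical helper (not in Base): count the ones in view(B, 1:n)
# by summing whole 64-bit chunks and masking the partial last chunk.
function prefix_bitcount(B::BitVector, n::Integer)
    0 <= n <= length(B) || throw(BoundsError(B, n))
    nfull, rembits = divrem(n, 64)        # number of whole chunks, leftover bits
    s = 0
    @inbounds for i in 1:nfull
        s += count_ones(B.chunks[i])
    end
    if rembits > 0
        mask = (UInt64(1) << rembits) - 1 # keep only the low `rembits` bits
        @inbounds s += count_ones(B.chunks[nfull + 1] & mask)
    end
    return s
end

prefix_bitcount(q, 10000)
10000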

That’s already a thing:

But for a SubArray of a BitArray that’s not enough to guarantee you can read whole 64-bit chunks: the first index and the length of the view also matter, since a contiguous view need not start or end on a chunk boundary.
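
To make that concrete, here is a hedged extension of the prefix sketch above (range_bitcount is again a hypothetical helper, not in Base, and only handles UnitRange subsets): the partially covered first and last chunks are masked, and only the chunks in between are counted whole.

# Hypothetical helper (not in Base): count the ones of B over a:b at chunk level,
# masking the bits before `a` in the first chunk and after `b` in the last.
function range_bitcount(B::BitVector, r::UnitRange{Int})
    isempty(r) && return 0
    checkbounds(B, r)
    i1, o1 = divrem(first(r) - 1, 64)   # 0-based chunk index and bit offset of first(r)
    i2, o2 = divrem(last(r) - 1, 64)    # same for last(r)
    chunks = B.chunks
    if i1 == i2
        # whole range sits in one chunk; Julia's << yields 0 for a shift of 64,
        # so the mask becomes all ones when the chunk is fully covered
        mask = ((UInt64(1) << (o2 - o1 + 1)) - 1) << o1
        return count_ones(chunks[i1 + 1] & mask)
    end
    s = count_ones(chunks[i1 + 1] >> o1)                              # partial first chunk
    @inbounds for i in (i1 + 2):i2                                    # full chunks in between
        s += count_ones(chunks[i])
    end
    s += count_ones(chunks[i2 + 1] & ((UInt64(1) << (o2 + 1)) - 1))   # partial last chunk
    return s
end

range_bitcount(q, 1:10000)
10000

range_bitcount(q, 100:163)   # 64 bits, not aligned to a chunk boundary
64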