q=trues(10000)
d=view(q,1:10000)
@btime sum($q)
17.034 ns (0 allocations: 0 bytes)
10000
@btime sum($d)
2.678 μs (0 allocations: 0 bytes)
10000
How can I improve the performance of sum over a contiguous subset of a BitArray?
I think the issue here is that BitArray is designed for elements to be accessed in bulk, not individually. This is what allows for a very efficient sum. When you take a view of it, you create a wrapper that implements individual getindex, but more methods would probably be needed to make reductions like sum fast.
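As a quick workaround (my suggestion, not something from this thread): indexing with the range instead of viewing allocates a new BitVector, but both the copy and the subsequent sum then run over whole chunks:

q = trues(10000)
sum(q[1:10000])  # allocates a fresh BitVector, but copies and counts chunk-wise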
Specifically, BitArray internally contains a Vector{UInt64} (157 64-bit chunks for 10000 bits), and its sum dispatches to an internal Base.bitcount that works on these 157 chunks:
sum(a::AbstractArray{Bool}; kw...) =
    isempty(kw) ? count(a) : reduce(add_sum, a; kw...)
...
_count(::typeof(identity), B::BitArray, ::Colon, init) = bitcount(B.chunks; init)

function bitcount(Bc::Vector{UInt64}; init::T=0) where {T}
    n::T = init
    @inbounds for i = 1:length(Bc)
        n = (n + count_ones(Bc[i])) % T
    end
    return n
end
Wrappers of BitArray, such as views, go over all 10000 elements, accessing each chunk 64 times. Since the SubArray type made by view is not a contiguous subset in general, e.g. view(q, 1:7:10000), it can't use the same chunk strategy. I wonder if UnitRange views of dense vectors are always dense and could be separate dense types.
That's already a thing. But for a subarray of BitArray that's not enough to ensure you're reading whole 64-bit chunks, since the position of the first element and the span are also relevant.
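To illustrate that last point, here is a minimal sketch of a chunk-based sum for a UnitRange view of a BitVector, masking off the partial first and last chunks. sumview is a hypothetical helper, and it reaches into the internal chunks field, so treat it as an illustration of the technique rather than Base's method:

function sumview(v::SubArray{Bool,1,BitVector,Tuple{UnitRange{Int}}})
    r = parentindices(v)[1]
    isempty(r) && return 0
    Bc = parent(v).chunks
    lo, hi = first(r), last(r)
    i1 = (lo - 1) >> 6 + 1                              # chunk holding the first bit
    i2 = (hi - 1) >> 6 + 1                              # chunk holding the last bit
    mask1 = typemax(UInt64) << ((lo - 1) & 63)          # keep bits at/after lo
    mask2 = typemax(UInt64) >> (63 - ((hi - 1) & 63))   # keep bits at/before hi
    i1 == i2 && return count_ones(Bc[i1] & mask1 & mask2)
    n = count_ones(Bc[i1] & mask1) + count_ones(Bc[i2] & mask2)
    @inbounds for i in i1+1:i2-1                        # full chunks inside the range
        n += count_ones(Bc[i])
    end
    return n
end

On the example above, sumview(view(q, 1:10000)) returns 10000 and should benchmark much closer to sum(q), since it touches each chunk once instead of each bit once.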