q=trues(10000)
d=view(q,1:10000)
@btime sum($q)
17.034 ns (0 allocations: 0 bytes)
10000
@btime sum($d)
2.678 μs (0 allocations: 0 bytes)
10000
How can I improve the performance of sum over a contiguous subset of a BitArray?
I think the issue here is that BitArray is designed for elements to be accessed in bulk, not individually. This is what allows for a very efficient sum. When you take a view of it, you create a wrapper that implements individual getindex, but more methods would probably be needed to make reductions like sum fast.
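As a quick workaround (my suggestion, not something from this thread): indexing with the range instead of viewing allocates a new BitVector, but both the copy and the subsequent sum then run over whole chunks:

q = trues(10000)
sum(q[1:10000])  # allocates a fresh BitVector, but copies and counts chunk-wise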
Specifically, BitArray internally contains a Vector{UInt64} (157 64-bit chunks for 10000 bits), and its sum dispatches to an internal Base.bitcount that works on these 157 chunks:
sum(a::AbstractArray{Bool}; kw...) =
    isempty(kw) ? count(a) : reduce(add_sum, a; kw...)
...
_count(::typeof(identity), B::BitArray, ::Colon, init) = bitcount(B.chunks; init)

function bitcount(Bc::Vector{UInt64}; init::T=0) where {T}
    n::T = init
    @inbounds for i = 1:length(Bc)
        n = (n + count_ones(Bc[i])) % T
    end
    return n
end
Wrappers of BitArray, such as views, go over all 10000 elements, accessing each chunk 64 times. Since the SubArray type made by view is not a contiguous subset in general, e.g. view(q, 1:7:10000), it can't use the same chunk strategy. I wonder if UnitRange views of dense vectors are always dense and could be separate dense types.
That's already a thing. But for a subarray of BitArray that's not enough to ensure you're reading whole 64-bit chunks, since the position of the first element and the span are also relevant.
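To illustrate that last point, here is a minimal sketch of a chunk-based sum for a UnitRange view of a BitVector, masking off the partial first and last chunks. sumview is a hypothetical helper, and it reaches into the internal chunks field, so treat it as an illustration of the technique rather than Base's method:

function sumview(v::SubArray{Bool,1,BitVector,Tuple{UnitRange{Int}}})
    r = parentindices(v)[1]
    isempty(r) && return 0
    Bc = parent(v).chunks
    lo, hi = first(r), last(r)
    i1 = (lo - 1) >> 6 + 1                              # chunk holding the first bit
    i2 = (hi - 1) >> 6 + 1                              # chunk holding the last bit
    mask1 = typemax(UInt64) << ((lo - 1) & 63)          # keep bits at/after lo
    mask2 = typemax(UInt64) >> (63 - ((hi - 1) & 63))   # keep bits at/before hi
    i1 == i2 && return count_ones(Bc[i1] & mask1 & mask2)
    n = count_ones(Bc[i1] & mask1) + count_ones(Bc[i2] & mask2)
    @inbounds for i in i1+1:i2-1                        # full chunks inside the range
        n += count_ones(Bc[i])
    end
    return n
end

On the example above, sumview(view(q, 1:10000)) returns 10000 and should benchmark much closer to sum(q), since it touches each chunk once instead of each bit once.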