# Why is `mul!` so fast with a `BitVector`?

Consider the following example code:

```julia
using LinearAlgebra, BenchmarkTools

n = 200
xb = BitVector(rand(Bool, n))
xf = Float32.(xb)
W = randn(Float32,n,n)
y = zeros(Float32,n);

function mymul!(y, A, x)
    y .= 0
    @inbounds for j = 1:length(x)
        @simd for i = 1:length(y)
            y[i] += A[i,j] * x[j]
        end
    end
end
```

Here I find the following benchmarks:

```julia
@btime mymul!($y, $W, $xf);   # 4.542 μs (0 allocations: 0 bytes)
@btime mul!($y, $W, $xf);     # 7.541 μs (0 allocations: 0 bytes)
@btime mymul!($y, $W, $xb);   # 15.586 μs (0 allocations: 0 bytes)  (why?)
@btime mul!($y, $W, $xb);     # 7.864 μs (0 allocations: 0 bytes)
```
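To see where the extra cost might come from, here is a sketch of the work a single `BitVector` read implies, using the packed-`UInt64` layout of `Base.BitArray` (the `chunks` field is an internal implementation detail, shown here only for illustration):

```julia
xb = BitVector([true, false, true, true])

j = 4
chunk = xb.chunks[((j - 1) >>> 6) + 1]     # locate the 64-bit chunk holding bit j
bit   = (chunk >>> ((j - 1) & 63)) & 0x01  # shift and mask to extract the bit
Bool(bit) == xb[j]                         # true: essentially what x[j] does
```

So every `x[j]` in the inner loop is a shift-and-mask plus a conversion, rather than a plain contiguous load as with a `Vector{Float32}`.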

So why is `mymul!` so slow when fed a `BitVector`, compared to `mul!` from LinearAlgebra?

Note that this question is very similar to Why mul! is so fast?. But in that thread I preferred to focus on Float64 arguments to keep things simple; here I am asking about a difference specific to BitVector arguments. I expect the reason here to be different from the one in that thread, because of the way the bools are packed as bits in a BitVector. Also note that I didn’t find such a big difference when using Float64 instead of Float32, and I’m not sure how relevant that is.
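One way to probe whether the per-element bit extraction is the cost (an experiment sketch, not an explanation of `mul!` internals) is to hoist the `BitVector`-to-`Float32` conversion out of the kernel and benchmark the same loop:

```julia
using LinearAlgebra, BenchmarkTools

# Redefined from the question so this snippet is self-contained.
function mymul!(y, A, x)
    y .= 0
    @inbounds for j = 1:length(x)
        @simd for i = 1:length(y)
            y[i] += A[i,j] * x[j]
        end
    end
end

n = 200
xb = BitVector(rand(Bool, n))
W = randn(Float32, n, n)
y = zeros(Float32, n)

xf_once = Float32.(xb)            # pay the bit -> Float32 cost once, up front
# @btime mymul!($y, $W, $xf_once) # hypothesis: this matches the fast Float32 timing
```

If the one-time conversion restores the `Float32` timing, the slowdown is in the repeated per-element extraction, not in the arithmetic.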

It’s probably the per-element conversion from `BitVector` bits to `Float32` inside the inner loop. Here’s another way to do it that avoids the conversion entirely:

```julia
function mymul!(y, A, x::BitVector)
    y .= 0
    for j in 1:length(x)
        @inbounds x[j] && (y .+= @view A[:,j])
    end
end
```

Quite a bit faster.
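As a quick sanity check that the column-skipping version computes the same product as `mul!` (a minimal sketch; `mymul_bit!` is just a renamed copy of the method above, with an explicit `return y` added so the result can be compared directly):

```julia
using LinearAlgebra

function mymul_bit!(y, A, x::BitVector)
    y .= 0
    for j in 1:length(x)
        @inbounds x[j] && (y .+= @view A[:,j])
    end
    return y
end

n = 8
xb = BitVector(rand(Bool, n))
W = randn(Float32, n, n)
y1 = mymul_bit!(zeros(Float32, n), W, xb)
y2 = mul!(zeros(Float32, n), W, xb)
y1 ≈ y2   # true
```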
