Consider the following example code:
```julia
using LinearAlgebra, BenchmarkTools

n = 200
xb = BitVector(rand(Bool, n))
xf = Float32.(xb)
W = randn(Float32, n, n)
y = zeros(Float32, n);

function mymul!(y, A, x)
    y .= 0
    @inbounds for j = 1:length(x)
        @simd for i = 1:length(y)
            # y[i] = muladd(A[i,j], x[j], y[i])
            y[i] += A[i,j] * x[j]
        end
    end
    return y
end
```
Here I find the following benchmarks:
```julia
@btime mymul!($y, $W, $xf);  # 4.542 μs (0 allocations: 0 bytes)
@btime mul!($y, $W, $xf);    # 7.541 μs (0 allocations: 0 bytes)
@btime mymul!($y, $W, $xb);  # 15.586 μs (0 allocations: 0 bytes) (why?)
@btime mul!($y, $W, $xb);    # 7.864 μs (0 allocations: 0 bytes)
```
So why is `mymul!` so much slower when fed a `BitVector`, compared to `mul!` from LinearAlgebra?
Note that this question is very similar to Why mul! is so fast?. In that thread I deliberately focused on Float64 arguments to keep things simple; here I am asking specifically about the difference for `BitVector` arguments. I expect the reason here is different from the one in that thread, because of the way the bools are stored as packed bits in a `BitVector`. Also note that I didn't see such a big difference when using Float64 instead of Float32, and I'm not sure how relevant that is.
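For context, a `BitVector` packs 64 bools into each `UInt64` of its `chunks` field, so every `x[j]` read has to locate a chunk and mask out a single bit. A rough sketch of that extraction (my own illustration of the packed layout, not code from either thread):

```julia
# Illustrative sketch of what indexing a BitVector involves internally.
# The `chunks` field name matches Base's BitArray, but the function
# `bit_getindex` is hypothetical, just to show the shift-and-mask work.
function bit_getindex(chunks::Vector{UInt64}, i::Int)
    i1, i2 = divrem(i - 1, 64)          # which chunk, which bit offset
    return (chunks[i1 + 1] >> i2) & 0x1 == 0x1
end

xb = BitVector([true, false, true])
@assert bit_getindex(xb.chunks, 1) == true
@assert bit_getindex(xb.chunks, 2) == false
@assert bit_getindex(xb.chunks, 3) == true
```

Whether this per-element unpacking (or the `Float32 * Bool` promotion it feeds) is actually what slows down the `@simd` loop is exactly what I'd like to understand.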