Consider the following example code:
```julia
using LinearAlgebra, BenchmarkTools

n = 200
xb = BitVector(rand(Bool, n))
xf = Float32.(xb)
W = randn(Float32, n, n)
y = zeros(Float32, n);

function mymul!(y, A, x)
    y .= 0
    @inbounds for j = 1:length(x)
        @simd for i = 1:length(y)
            #y[i] = muladd(A[i,j], x[j], y[i])
            y[i] += A[i,j] * x[j]
        end
    end
end
```
Here I find the following benchmarks:
```julia
@btime mymul!($y, $W, $xf);  # 4.542 μs (0 allocations: 0 bytes)
@btime mul!($y, $W, $xf);    # 7.541 μs (0 allocations: 0 bytes)
@btime mymul!($y, $W, $xb);  # 15.586 μs (0 allocations: 0 bytes) (why?)
@btime mul!($y, $W, $xb);    # 7.864 μs (0 allocations: 0 bytes)
```
So why is `mymul!` so slow when it is fed a `BitVector`, compared to `mul!` from LinearAlgebra?
Note that this question is very similar to Why mul! is so fast?. But in that thread I preferred to focus on `Float64` arguments to keep things simple. Here I am asking specifically about the difference for `BitVector` arguments, and I expect the reason to be different than in that thread, because of the way the bools are stored as packed bits in a `BitVector`. Also note that I didn't find such a big difference here when using `Float64` instead of `Float32`, and I'm not sure how relevant that is.
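For what it's worth, here is a hedged sketch of a possible workaround, assuming the cost comes from unpacking bits inside the inner SIMD loop: convert the `BitVector` once into a preallocated `Float32` buffer, so the hot loop only reads plain floats. The names `mymul_buf!` and `xbuf` are mine, not part of the original code, and I haven't verified that this explains the timings above.

```julia
using LinearAlgebra

n = 200
xb = BitVector(rand(Bool, n))
W = randn(Float32, n, n)
y = zeros(Float32, n)

# Preallocated scratch buffer: one O(n) conversion pass per call,
# instead of unpacking individual bits inside the O(n^2) loop.
xbuf = zeros(Float32, n)

function mymul_buf!(y, A, x::BitVector, buf)
    buf .= x  # materialize the bits as dense Float32 values
    y .= 0
    @inbounds for j = 1:length(buf)
        @simd for i = 1:length(y)
            y[i] += A[i, j] * buf[j]
        end
    end
    return y
end

mymul_buf!(y, W, xb, xbuf)
# sanity check against the straightforward dense product
@assert y ≈ W * Float32.(xb)
```

If this is indeed the bottleneck, it would suggest the slow path is the repeated `x[j]` bit extraction rather than the multiply itself, but I'd be happy to be corrected.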