Why is there such a performance difference between multiplying float matrices and other kinds of matrices? I understand float matrices hit BLAS, which is heavily optimized. But is this a fundamental difference, or just that we are lacking a “BLAS for boolean matrices”, for example?
Can we expect that in the future with a generic Julia implementation of BLAS this difference will disappear?
Is there a package one can use now to speedup boolean matrices?
As suggested by @antoine-levitt, it is interesting to look at the non-BLAS float matmul currently implemented in Julia.
Interestingly, this is also quite a bit faster than the boolean matrix matmul. Is there some intrinsic difficulty in multiplying boolean matrices?
Notice that a fast boolean matrix matmul could be useful in deep learning for one-hot represented data.
If the one-hot data could be represented as a BitArray, the memory usage would be much smaller.
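To illustrate the memory savings (sizes here are just made up for the example): a dense `Float32` one-hot matrix uses 32 bits per entry, while a `BitMatrix` uses 1 bit per entry, so the reduction is roughly 32×.

```julia
# Hypothetical one-hot batch: 1000 classes, 10_000 samples
nclasses, nsamples = 1000, 10_000

dense  = zeros(Float32, nclasses, nsamples)  # 4 bytes per entry
onehot = falses(nclasses, nsamples)          # BitMatrix: 1 bit per entry

Base.summarysize(dense)   # ≈ 40 MB
Base.summarysize(onehot)  # ≈ 1.25 MB
```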
Hm, generic_matmatmul still does something fancy (see _generic_matmatmul! in LinearAlgebra/src/matmul.jl) so that’s not quite a plain comparison. You’d have to code your own three-loops algorithm to compare. If you want fast bool matmuls, maybe take a look at GitHub - JuliaLinearAlgebra/Octavian.jl: Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
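For reference, a truly plain three-loops baseline could look like this (a sketch, not tuned at all; `Base.promote_op(+, ...)` is used so that `Bool` inputs accumulate into an `Int` result rather than overflowing `Bool`):

```julia
# Deliberately naive matmul: no blocking, no packing, just three loops.
# Loop order j, l, i keeps the innermost access column-major friendly.
function naive_matmul(A::AbstractMatrix, B::AbstractMatrix)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    T = Base.promote_op(+, eltype(A), eltype(B))  # Bool + Bool -> Int
    C = zeros(T, m, n)
    @inbounds for j in 1:n, l in 1:k, i in 1:m
        C[i, j] += A[i, l] * B[l, j]
    end
    return C
end
```

Benchmarking this against `A * B` for both `Matrix{Float64}` and `Matrix{Bool}` / `BitMatrix` inputs would give the plain comparison.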
Using Octavian things are faster:
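For anyone following along, the comparison was along these lines (matrix sizes are just an example):

```julia
using Octavian, BenchmarkTools

A = bitrand(1000, 1000)
B = bitrand(1000, 1000)

@btime Octavian.matmul($A, $B)  # pure-Julia, multi-threaded matmul
@btime $A * $B                  # generic fallback for BitMatrix
```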
But float matmul still wins significantly.
Unfortunately Octavian doesn’t seem to work with
Are you interested in a matmul with an addition where 1+1 = 1, or where 1+1 = 0?
The performance difference comes from indexing into a BitArray, which involves bit shifting and masking to extract each value as an `Int`. It could be implemented a lot faster with bit-level boolean operations: and the chunks of the two vectors together, then either check whether the result is non-zero (when 1+1 = 1) or take the last bit of the intrinsic popcount (when 1+1 = 0).
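A minimal sketch of the 1+1 = 1 (OR-AND) variant, working directly on the 64-bit `UInt64` chunks that back a `BitMatrix`. The function name is made up, and for simplicity it assumes the inner dimension is a multiple of 64 so every column occupies whole chunks; a general version would need to mask the trailing partial chunk.

```julia
# OR-AND boolean matmul: C[i,j] = any(A[i,:] .& B[:,j]),
# computed 64 bits at a time on the BitMatrix chunk arrays.
function bool_matmul(A::BitMatrix, B::BitMatrix)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    k % 64 == 0 || error("sketch assumes the inner dimension is a multiple of 64")
    At = BitMatrix(transpose(A))   # rows of A become contiguous columns
    ac, bc = At.chunks, B.chunks
    nchunks = k ÷ 64               # chunks per column
    C = falses(m, n)
    for j in 1:n, i in 1:m
        off_a = (i - 1) * nchunks
        off_b = (j - 1) * nchunks
        @inbounds for c in 1:nchunks
            # Early exit as soon as any 64-bit block overlaps.
            if ac[off_a + c] & bc[off_b + c] != 0
                C[i, j] = true
                break
            end
        end
    end
    return C
end
```

For the 1+1 = 0 (GF(2)) variant, you would instead xor-accumulate `count_ones(ac[off_a + c] & bc[off_b + c])` over the chunks and keep the parity bit, with no early exit.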