Multiplying boolean matrices

Why is there such a performance difference between multiplying float matrices and other kinds of matrices? I understand that float matrices hit BLAS, which is heavily optimized. But is this a fundamental difference, or is it just that we are lacking a “BLAS for boolean matrices”, for example?

[image: benchmark timings]

Can we expect that in the future with a generic Julia implementation of BLAS this difference will disappear?

Is there a package one can use now to speedup boolean matrices?
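For context, here is a minimal way to reproduce the comparison (timings are machine-dependent; `n` is kept small so the generic boolean fallback finishes quickly):

```julia
using LinearAlgebra, Random

n = 300
A, B = rand(n, n), rand(n, n)          # Matrix{Float64} -> dispatches to BLAS
Ab, Bb = bitrand(n, n), bitrand(n, n)  # BitMatrix -> generic Julia fallback

A * B; Ab * Bb                         # warm up compilation

t_float = @elapsed A * B
t_bool  = @elapsed Ab * Bb
println("float: ", t_float, " s, bool: ", t_bool, " s")
```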


As suggested by @antoine-levitt, it is interesting to look at the non-BLAS float matmul currently implemented in Julia.

[image: benchmark timings]

Interestingly, this is also quite a bit faster than the boolean matrix matmul. Is there some intrinsic difficulty in multiplying boolean matrices?

Notice that a fast boolean matmul could be useful in deep learning for one-hot represented data.
If the one-hot data were represented as a BitArray, the memory usage would be much smaller.
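To put a number on the memory saving (a rough sketch; the shapes are made up for illustration):

```julia
# Memory comparison for one-hot data: 10 classes x 100_000 samples.
nclasses, nsamples = 10, 100_000
onehot_float = zeros(Float64, nclasses, nsamples)  # 8 bytes per entry
onehot_bits  = falses(nclasses, nsamples)          # 1 bit per entry (BitMatrix)

bytes_float = Base.summarysize(onehot_float)   # ~8 MB
bytes_bits  = Base.summarysize(onehot_bits)    # ~125 KB, roughly a 64x saving
println(bytes_float, " bytes vs ", bytes_bits, " bytes")
```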

Hm, generic_matmatmul still does something fancy (see _generic_matmatmul! in LinearAlgebra/src/matmul.jl) so that’s not quite a plain comparison. You’d have to code your own three-loops algorithm to compare. If you want fast bool matmuls, maybe take a look at GitHub - JuliaLinearAlgebra/Octavian.jl: Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
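For reference, a plain three-loop version could look like this (`naive_matmul` is my own name; no blocking or SIMD is attempted, and the accumulator type is widened so `Bool` inputs sum correctly):

```julia
function naive_matmul(A::AbstractMatrix{T}, B::AbstractMatrix{T}) where T
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    # Accumulate in the type of a sum of products (Int for Bool inputs).
    S = typeof(zero(T) * zero(T) + zero(T) * zero(T))
    C = zeros(S, m, n)
    # Column-major friendly loop order: innermost loop walks down a column.
    @inbounds for j in 1:n, l in 1:k, i in 1:m
        C[i, j] += A[i, l] * B[l, j]
    end
    return C
end
```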

Using Octavian, things are faster:

[image: benchmark timings]

But float matmul still wins significantly.

Unfortunately Octavian doesn’t seem to work with BitMatrix, https://github.com/JuliaLinearAlgebra/Octavian.jl/issues/123.
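In the meantime, one workaround is to widen the `BitMatrix` to `Float64` and let BLAS do the cubic work; the quadratic conversion cost is usually small by comparison (`count_matmul` is a hypothetical helper name):

```julia
# Workaround sketch: widen to Float64, multiply via BLAS, round back to Int.
count_matmul(A::BitMatrix, B::BitMatrix) = round.(Int, Float64.(A) * Float64.(B))
```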


Are you interested in a matmul with an addition where 1 + 1 = 1, or one where 1 + 1 = 0?
The performance difference comes from indexing into a BitArray, which involves shifting and masking to extract each bit as a value of the correct type. It could be implemented a lot faster with bit-level boolean operations: AND the two bit vectors chunk-wise and then either check whether the result is nonzero (when 1 + 1 = 1) or take the parity of the intrinsic popcount (when you want 1 + 1 = 0).
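A sketch of both inner products on `BitVector`s, working directly on the 64-bit chunks (this reaches into the internal `.chunks` field, so it is a proof of concept rather than a supported API, and the function names are my own):

```julia
function bitrow_dot_or(a::BitVector, b::BitVector)
    # 1 + 1 = 1 case: is any AND-ed bit set?
    # Trailing padding bits in BitArray chunks are guaranteed zero.
    for (ca, cb) in zip(a.chunks, b.chunks)
        (ca & cb) != 0 && return true
    end
    return false
end

function bitrow_dot_xor(a::BitVector, b::BitVector)
    # 1 + 1 = 0 case (GF(2)): parity of the popcount of the AND.
    acc = 0
    for (ca, cb) in zip(a.chunks, b.chunks)
        acc ⊻= count_ones(ca & cb)
    end
    return isodd(acc)
end
```

A full matmul would then apply these dot products to rows of `A` and columns of `B`, 64 bits at a time instead of one bit at a time.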
