Algorithmic complexity of matrix multiplication in Julia

No, virtually no optimized BLAS uses Strassen’s algorithm in practice to my knowledge. The serious performance studies I’ve seen (which use an optimized BLAS for the base case) show that the crossover m for one step of m\times m Strassen to be marginally better is a few thousand, but for it to be clearly superior you need to be > 10^4, at which point most applications are starting to switch over to sparse algorithms or other algorithms that exploit special structure. Moreover, once you throw in multithreading and the fact the underlying BLAS codes are being continually tuned for new hardware, no one seems to want to put in the effort to keeping a highly tuned Strassen up to date. Finally, Strassen requires large temporary buffers, which are problematic because the huge matrices where it would be theoretically useful are often near the limits of available memory.

The upshot is that nearly all practical dense matrix–matrix algorithm implementations have \Theta(m^3) arithmetic complexity.

9 Likes