Algorithmic complexity of matrix multiplication in Julia

This is a slightly odd question, but I am working on a project for which I would like to know the algorithmic complexity of matrix multiplication in Julia. That is, if A and B are two-dimensional n \times n arrays stored in the usual way (no sparse arrays etc.), what is the worst-case complexity of the operation A*B in Julia?
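For context, here is the kind of rough empirical check I have been doing (just a sketch; `@elapsed` timings are noisy, and something like BenchmarkTools.jl would be more careful):

```julia
# Rough empirical check of the scaling: time A*B for doubling sizes.
# With a Θ(n^3) algorithm, doubling n should multiply the time by ~8.
for n in (256, 512, 1024, 2048)
    A = rand(n, n); B = rand(n, n)
    A * B  # warm-up call so compilation isn't timed
    t = @elapsed A * B
    println("n = $n: $(round(t; sigdigits=3)) s")
end
```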

Thanks in advance for your responses.

1 Like

I think this depends on the specific implementation of BLAS in use. I guess they mostly use something like Strassen's algorithm: https://en.wikipedia.org/wiki/Strassen_algorithm

1 Like

That method (and the other sub-cubic algorithms, with exponent < 3) is not competitive at normal matrix sizes.

7 Likes

No, virtually no optimized BLAS uses Strassen’s algorithm in practice, to my knowledge. The serious performance studies I’ve seen (which use an optimized BLAS for the base case) show that the crossover size m at which one step of m\times m Strassen becomes marginally better is a few thousand, and for it to be clearly superior you need m > 10^4, at which point most applications are starting to switch over to sparse algorithms or other algorithms that exploit special structure. Moreover, once you throw in multithreading and the fact that the underlying BLAS codes are being continually tuned for new hardware, no one seems to want to put in the effort to keep a highly tuned Strassen up to date. Finally, Strassen requires large temporary buffers, which are problematic because the huge matrices where it would be theoretically useful are often near the limits of available memory.
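For reference, one step of the recursion looks like this (a schematic sketch, not an optimized implementation; note the seven sub-products M1 through M7 and the submatrix copies, which are exactly the temporary buffers mentioned above):

```julia
# One step of Strassen's algorithm for even-sized square matrices,
# falling back to ordinary (BLAS-backed) `*` for the seven sub-products.
function strassen_step(A::Matrix{T}, B::Matrix{T}) where {T}
    m = size(A, 1)
    @assert size(A) == size(B) == (m, m) && iseven(m)
    h = m ÷ 2
    A11 = A[1:h, 1:h];     A12 = A[1:h, h+1:end]
    A21 = A[h+1:end, 1:h]; A22 = A[h+1:end, h+1:end]
    B11 = B[1:h, 1:h];     B12 = B[1:h, h+1:end]
    B21 = B[h+1:end, 1:h]; B22 = B[h+1:end, h+1:end]

    # Seven multiplications instead of the naive eight:
    M1 = (A11 + A22) * (B11 + B22)
    M2 = (A21 + A22) * B11
    M3 = A11 * (B12 - B22)
    M4 = A22 * (B21 - B11)
    M5 = (A11 + A12) * B22
    M6 = (A21 - A11) * (B11 + B12)
    M7 = (A12 - A22) * (B21 + B22)

    C = similar(A)
    C[1:h, 1:h]         .= M1 .+ M4 .- M5 .+ M7
    C[1:h, h+1:end]     .= M3 .+ M5
    C[h+1:end, 1:h]     .= M2 .+ M4
    C[h+1:end, h+1:end] .= M1 .- M2 .+ M3 .+ M6
    return C
end
```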

The upshot is that nearly all practical dense matrix–matrix multiplication implementations have \Theta(m^3) arithmetic complexity.

9 Likes

That being said, since optimizations and parallelism can be easier in a high-level language, and memory is getting more plentiful, it might be nice to see a Strassen package to perform fresh experiments with, perhaps built on top of Octavian.jl.

For example, you could implement a special matrix type with storage optimized for Strassen-type algorithms.
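A minimal starting point for such a type might look like this (purely hypothetical; the name `StrassenMatrix` and the pad-to-power-of-two layout are just one possible design, chosen so every recursion level splits evenly):

```julia
# Hypothetical wrapper type with storage padded to a power-of-two size.
struct StrassenMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}   # padded square storage, side length a power of two
    m::Int            # logical number of rows
    n::Int            # logical number of columns
end

function StrassenMatrix(A::AbstractMatrix{T}) where {T}
    m, n = size(A)
    s = nextpow(2, max(m, n))
    data = zeros(T, s, s)
    data[1:m, 1:n] .= A
    return StrassenMatrix{T}(data, m, n)
end

Base.size(S::StrassenMatrix) = (S.m, S.n)
Base.getindex(S::StrassenMatrix, i::Int, j::Int) = S.data[i, j]
```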

5 Likes

Julia uses a conservative choice of OpenBLAS / MKL for linear algebra. These don’t use fast (i.e. sub-cubic) matrix multiplication, and as far as I understand there are no plans to change that in the near future (neither on the Julia side nor on the OpenBLAS/MKL side).
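You can check which backend is loaded, and see that `*` lowers to the backend’s gemm for standard dense floating-point matrices (a quick illustration; `BLAS.get_config()` requires Julia ≥ 1.7):

```julia
using LinearAlgebra

BLAS.get_config()   # shows the loaded BLAS/LAPACK libraries (Julia ≥ 1.7)

A = rand(100, 100); B = rand(100, 100)
C1 = A * B                            # dispatches to the backend's dgemm
C2 = BLAS.gemm('N', 'N', 1.0, A, B)   # the same call, made explicitly
C1 ≈ C2                               # true
```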

There are alternative competitive libraries like https://github.com/flame/blis (which doesn’t use fast matmul either, but as far as I understand serves as the base for most attempts to make Strassen useful in real life), and there is active research on using fast matrix multiplication in practice.

Consider e.g. https://apps.cs.utexas.edu/apps/sites/default/files/tech_reports/GPUStrassen.pdf or https://arxiv.org/pdf/1605.01078.pdf : both papers report crossovers on the order of 2,000 to 10,000 for square matrices.

So I guess it’s not too unrealistic to see BLIS become a third BLAS provider in Julia, one that actually uses Strassen, in maybe 3–5 years.

2 Likes

http://jianyuhuang.com/papers/sc16.pdf actually shows that you can implement Strassen using only the packing buffers already used in OpenBLAS, and that the performance crossover can be as low as 512×512 (at least single-threaded). Getting a good implementation in Julia would be really cool.

5 Likes

Be careful what you wish for :grin: : https://github.com/JuliaLinearAlgebra/BLIS.jl (a low-level Julia wrapper for the BLIS typed interface).

Specifically, https://github.com/JuliaLinearAlgebra/BLIS.jl/issues/3 (“Update to use libblastrampoline”) should make this a lot more straightforward in Julia 1.7.
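With libblastrampoline you can, in principle, forward BLAS calls to a libblis you have built yourself (a sketch; the library path is an assumption for your system, and your BLIS build needs the BLAS-compatible interface enabled):

```julia
using LinearAlgebra

# Assumes a local BLIS build at this path (adjust for your system).
BLAS.lbt_forward("/usr/local/lib/libblis.so"; clear=true)
BLAS.get_config()   # verify that BLIS is now backing BLAS calls
```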

4 Likes

Are there any Strassen GPU results? That might be where the value is, if there’s enough memory.

2 Likes

Thank you everyone for the responses. I did not expect the community to be this helpful.

6 Likes