This is a slightly odd question, but I am working on a project for which I would like to know the algorithmic complexity of matrix multiplication in Julia. i.e. if A, B are two dimensional n x n arrays stored in the usual way (no sparse arrays etc.), what is the worse case for the complexity of the operation A*B in Julia?
No, virtually no optimized BLAS uses Strassen’s algorithm in practice to my knowledge. The serious performance studies I’ve seen (which use an optimized BLAS for the base case) show that the crossover m for one step of m\times m Strassen to be marginally better is a few thousand, but for it to be clearly superior you need to be > 10^4, at which point most applications are starting to switch over to sparse algorithms or other algorithms that exploit special structure. Moreover, once you throw in multithreading and the fact the underlying BLAS codes are being continually tuned for new hardware, no one seems to want to put in the effort to keeping a highly tuned Strassen up to date. Finally, Strassen requires large temporary buffers, which are problematic because the huge matrices where it would be theoretically useful are often near the limits of available memory.
The upshot is that nearly all practical dense matrix–matrix algorithm implementations have \Theta(m^3) arithmetic complexity.
That being said, since optimizations and parallelism can be easier in a high level language, and memory is getting more plentiful, so it might be nice to see a Strassen package to perform fresh experiments with, perhaps built on top of Octavian.jl.
For example, you could implement a special matrix type with storage optimized for Strassen-type algorithms.
Julia uses a conservative choice of openBLAS / MKL for linear algebra. These don’t use fast (ie subcubic) matrix multiplication, and afaiu there are no plans to change that in the near future (neither on julia nor on openBLAS/MKL side).
http://jianyuhuang.com/papers/sc16.pdf actually shows that you can implement Strassen using only the packing vectors already used in OpenBlas, and the performance crossover can be as low as 512x512 (at least for single threaded). Getting a good implementation in Julia would be really cool.