I want to multiply three matrices together as efficiently as possible.
We all know that with tall-and-skinny or short-and-fat matrices, the order of operations when multiplying three matrices matters a lot for efficiency.
Example:
using BenchmarkTools
n1=5000; n2=10; n3=5000; n4=10
A=randn(n1,n2); B=randn(n2,n3); C=randn(n3,n4)
@btime A*(B*C);
218.357 μs (3 allocations: 391.58 KiB)
@btime (A*B)*C;
105.064 ms (4 allocations: 191.12 MiB)
Is there a way in Julia to automatically select the best multiplication order?
I think I know how to create a flop-count-based decision. Ideally it would also take into account the practical performance on the specific computer it is running on.
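For reference, here is a minimal sketch of the flop-count-based decision I have in mind. The names `mul3_costs` and `mul3` are made up for illustration, and the cost model is an assumption: it only counts scalar multiply-adds for dense matrices and ignores memory traffic, BLAS threading and allocation cost, so it does not capture machine-specific performance.

```julia
# Hypothetical helper: multiply-add counts for the two ways of
# parenthesizing A*B*C, where A is n1×n2, B is n2×n3 and C is n3×n4.
function mul3_costs(n1, n2, n3, n4)
    cost_left  = n1*n2*n3 + n1*n3*n4   # (A*B)*C
    cost_right = n2*n3*n4 + n1*n2*n4   # A*(B*C)
    return cost_left, cost_right
end

# Hypothetical wrapper: evaluate A*B*C in whichever order has the
# lower estimated multiply-add count.
function mul3(A::AbstractMatrix, B::AbstractMatrix, C::AbstractMatrix)
    n1, n2 = size(A)
    n3, n4 = size(C)
    size(B) == (n2, n3) || throw(DimensionMismatch("inner dimensions do not match"))
    cost_left, cost_right = mul3_costs(n1, n2, n3, n4)
    return cost_left <= cost_right ? (A*B)*C : A*(B*C)
end
```

For the sizes in my example, `mul3_costs(5000, 10, 5000, 10)` gives 5×10^8 multiply-adds for `(A*B)*C` against 10^6 for `A*(B*C)`, so `mul3(A, B, C)` would pick `A*(B*C)`, which agrees with the timings above.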