LSU lecture on techniques for improving the performance of matrix multiplication in C++

Louisiana State University is uploading lectures on “Parallel C++ for Scientific Applications” to YouTube. Among other things, there is a lecture on how to improve the performance of matrix multiplication by exploiting hardware features such as the cache hierarchy and by optimizing memory access patterns.
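To give a flavor of the kind of technique such a lecture covers: the classic first step is loop tiling (“cache blocking”), which keeps a small block of each matrix in cache while it is reused. This is a generic illustrative sketch, not code from the LSU lecture; the block size `BS` is an assumption that would need tuning for a real target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of cache blocking (not taken from the lecture).
// Block size is a tunable assumption; a real value depends on cache sizes.
constexpr std::size_t BS = 64;

// C += A * B for row-major N x N matrices stored as flat vectors.
// The three outer loops walk over BS x BS tiles so that the tiles of A,
// B, and C being combined fit in cache while they are reused.
void matmul_blocked(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C, std::size_t N) {
    for (std::size_t ii = 0; ii < N; ii += BS)
        for (std::size_t kk = 0; kk < N; kk += BS)
            for (std::size_t jj = 0; jj < N; jj += BS)
                // Multiply the (ii,kk) tile of A by the (kk,jj) tile of B.
                for (std::size_t i = ii; i < std::min(ii + BS, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BS, N); ++k) {
                        const double a = A[i * N + k];
                        // Innermost loop streams contiguously through B and C.
                        for (std::size_t j = jj; j < std::min(jj + BS, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

Keeping `j` innermost preserves unit-stride access to `B` and `C`, which is what makes the blocked version cache-friendly rather than just reordered.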

3 Likes

This is a pretty common topic in software performance engineering courses: the techniques for optimizing matrix multiplication are well known, the problem is simple enough to be accessible to students, and the speedups are large enough to surprise novices.

See also the discussion in this thread of some of the techniques: Julia matrix-multiplication performance

5 Likes

As an aside, I didn’t understand how CPU caches worked when I wrote Octavian (accounting for set associativity is important!). The benchmarks there show that its performance degrades at large matrix sizes.
Someone could improve that library, or start fresh, and do much better at large sizes.
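For anyone curious why set associativity matters here: with a power-of-two leading dimension, elements a fixed column apart land in the same cache set, so the few ways of that set keep evicting each other (conflict misses). A common mitigation is to pad the leading dimension; this is a generic sketch of that idea, not Octavian’s actual code, and the pad of 8 doubles is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration (not Octavian's implementation).
// C += A * B for row-major N x N matrices stored with leading dimension
// ld >= N. When N is a power of two, allocating with ld = N would place
// same-column elements of consecutive rows in the same cache set; padding
// (e.g. ld = N + 8) spreads them across sets and avoids conflict misses.
void matmul_ld(const double* A, const double* B, double* C,
               std::size_t N, std::size_t ld) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k) {
            const double a = A[i * ld + k];
            for (std::size_t j = 0; j < N; ++j)
                C[i * ld + j] += a * B[k * ld + j];
        }
}
```

Usage would be to allocate `std::vector<double> A(N * (N + 8))` and pass `ld = N + 8`; the arithmetic is unchanged, only the memory layout differs.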

Might be a fun GSoC or side project for someone interested in high-performance matmul.

1 Like