Louisiana State University is uploading lectures on “Parallel C++ for Scientific Applications” to YouTube. Among other things, there is a lecture on how to improve the performance of matrix multiplication by taking advantage of the hardware, in particular by making better use of the CPU cache through optimized memory access patterns.
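For anyone curious what that kind of optimization can look like, here is a minimal sketch. It is not taken from the lecture; it just illustrates one common cache-friendly trick, reordering the loops so the innermost loop reads memory contiguously. The names `Matrix`, `multiply_ijk`, and `multiply_ikj` are my own, and I am assuming square, row-major matrices stored in a flat `std::vector<double>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: an n x n matrix stored row-major in one flat vector.
using Matrix = std::vector<double>;

// Textbook i-j-k order. The inner loop reads b[k * n + j] with stride n,
// so for large n nearly every access to B touches a different cache line.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Reordered i-k-j loops. The inner loop now walks one row of B and one row
// of C with unit stride, which matches how row-major data sits in memory.
Matrix multiply_ikj(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double a_ik = a[i * n + k];  // reused across the whole inner loop
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the order in which memory is touched differs. How much the second version helps depends on the matrix size, the compiler, and the machine, so it is worth timing both on your own hardware.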