You might want to check out the thread "[ANN]: PaddedMatrices.jl, Julia BLAS and partially sized arrays" for some truly impressive benchmark results and a discussion of PaddedMatrices.jl.