Is that really the case? I thought batched matmul functions generally had some optimizations, at least by automatically leveraging parallelism on multiple cores or a GPU. A plain for-loop or broadcasting won’t do that for us.
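To make the question concrete, here is a minimal sketch of what I mean by having to do the batching by hand: a plain loop over `mul!` calls runs on a single core unless we thread it ourselves. The function name `batched_mul_threaded` and the 9×9 batch sizes are just illustrative assumptions, not anything from a particular library.

```julia
using LinearAlgebra

# Multiply a batch of independent small matrices.
# A serial for-loop (or broadcasting matmuls) stays on one core;
# threading the loop ourselves is one way to recover parallelism.
function batched_mul_threaded(As::Vector{Matrix{Float64}}, Bs::Vector{Matrix{Float64}})
    Cs = [similar(A, size(A, 1), size(B, 2)) for (A, B) in zip(As, Bs)]
    Threads.@threads for i in eachindex(As)
        mul!(Cs[i], As[i], Bs[i])  # in-place matmul, no per-iteration allocation
    end
    return Cs
end

# Example: 1000 independent 9x9 multiplications
As = [rand(9, 9) for _ in 1:1000]
Bs = [rand(9, 9) for _ in 1:1000]
Cs = batched_mul_threaded(As, Bs)
```

This still doesn't touch the GPU, of course; it only spreads the loop across CPU threads (run Julia with `--threads=auto` for it to matter), which is exactly the kind of thing I'd hope a dedicated batched-matmul routine would handle for me.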