Contiguous Read Non-Contiguous Write vs Non-Contiguous Read Contiguous Write Performance

stevengj · May 2, 2025, 1:57am

For an out-of-place transpose like this, neither loop order gives good cache-line utilization, because one of the two arrays is always accessed with a large stride. You generally want to “tile” (block) the loops instead, either with a tile size tuned to your cache or by using a cache-oblivious (recursive) algorithm.
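To make the two options concrete, here is a minimal sketch of each, not a tuned implementation. The function names `transpose_tiled!` and `transpose_recursive!`, the tile size `bs = 32`, and the recursion cutoff `base = 16` are all illustrative assumptions rather than anything from Base or from the thread; good values depend on your element type and cache.

```julia
# Tiled (blocked) out-of-place transpose: B = transpose(A).
# `bs` is a hypothetical tile size that should be tuned to the cache.
function transpose_tiled!(B::Matrix, A::Matrix; bs::Int = 32)
    m, n = size(A)
    size(B) == (n, m) || throw(DimensionMismatch("B must have size (n, m)"))
    @inbounds for jj in 1:bs:n, ii in 1:bs:m
        # Within one tile, both the reads of A and the writes to B
        # stay inside a small block that fits in cache.
        for j in jj:min(jj + bs - 1, n), i in ii:min(ii + bs - 1, m)
            B[j, i] = A[i, j]
        end
    end
    return B
end

# Cache-oblivious variant: recursively split the longer dimension in half
# until the block is small, so no cache size is hard-coded.
# `base` is a hypothetical cutoff that only limits recursion overhead.
function transpose_recursive!(B::Matrix, A::Matrix,
                              is::UnitRange{Int} = 1:size(A, 1),
                              js::UnitRange{Int} = 1:size(A, 2);
                              base::Int = 16)
    size(B) == (size(A, 2), size(A, 1)) ||
        throw(DimensionMismatch("B must have size (n, m)"))
    m, n = length(is), length(js)
    if m <= base && n <= base
        @inbounds for j in js, i in is
            B[j, i] = A[i, j]
        end
    elseif m >= n
        mid = first(is) + m ÷ 2
        transpose_recursive!(B, A, first(is):mid-1, js; base = base)
        transpose_recursive!(B, A, mid:last(is), js; base = base)
    else
        mid = first(js) + n ÷ 2
        transpose_recursive!(B, A, is, first(js):mid-1; base = base)
        transpose_recursive!(B, A, is, mid:last(js); base = base)
    end
    return B
end
```

Either function can be checked against `permutedims(A)`, which is the out-of-place transpose in Base, and timed against it on your own matrix sizes before deciding whether a hand-written version is worth keeping.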

(Optimizing transposition is a heavily studied problem, with a fair amount of literature and code out there if you search.)

See also, e.g., “Function on matrix transpose and performance” (post #4 by stevengj) and the links in that thread.
