Function on matrix transpose and performance

stevengj · January 25, 2019, 3:20am

It should be even faster if you call copy(transpose(X)): Julia has an optimized copy(A) routine for transposed arrays that has good cache-line utilization. Basically, this is the problem of optimized out-of-place transposition, which has been extensively studied. The optimum is some kind of blocked or recursive cache-oblivious algorithm, and Julia implements a cache-oblivious transpose routine (from #6690, I believe).
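To make the recursive cache-oblivious idea concrete, here is a minimal sketch. It is not Julia's internal routine: the name transpose_rec! and the blocksize cutoff of 32 are hypothetical choices for illustration. It keeps halving the larger dimension until the block is small, then copies that block directly.

```julia
# Illustrative sketch only: `transpose_rec!` and `blocksize` are hypothetical
# names, not part of Base. Writes the transpose of A into B.
function transpose_rec!(B::AbstractMatrix, A::AbstractMatrix,
                        rows::UnitRange{Int}=1:size(A, 1),
                        cols::UnitRange{Int}=1:size(A, 2);
                        blocksize::Int=32)
    size(B) == (size(A, 2), size(A, 1)) ||
        throw(DimensionMismatch("B must have the transposed size of A"))
    m, n = length(rows), length(cols)
    if m <= blocksize && n <= blocksize
        # Base case: the block is small, so copy it element by element.
        for j in cols, i in rows
            B[j, i] = A[i, j]
        end
    elseif m >= n
        # Split the longer dimension (rows) in half and recurse.
        mid = first(rows) + m ÷ 2
        transpose_rec!(B, A, first(rows):mid-1, cols; blocksize=blocksize)
        transpose_rec!(B, A, mid:last(rows), cols; blocksize=blocksize)
    else
        # Split the longer dimension (columns) in half and recurse.
        mid = first(cols) + n ÷ 2
        transpose_rec!(B, A, rows, first(cols):mid-1; blocksize=blocksize)
        transpose_rec!(B, A, rows, mid:last(cols); blocksize=blocksize)
    end
    return B
end

X = rand(300, 200)
Y = similar(X, 200, 300)
transpose_rec!(Y, X)
```

The recursion never needs to know the cache size: at some depth the blocks fit in each level of cache, which is what "cache-oblivious" refers to. In practice you would call copy(transpose(X)) rather than something like this.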

It doesn’t look like collect calls this optimized routine, but it probably should.
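A rough way to check this on your own machine is sketched below. The 2000×2000 size is an arbitrary assumption, and @elapsed on a single call is noisy; a package such as BenchmarkTools would give more reliable numbers.

```julia
X = rand(2000, 2000)      # arbitrary size chosen for illustration
Xt = transpose(X)         # lazy wrapper, no data is moved yet

# Both calls materialize a plain Matrix with the same contents.
A = copy(Xt)
B = collect(Xt)
@assert A == B

# The calls above also served as warm-up, so compilation is not timed here.
t_copy    = @elapsed copy(Xt)
t_collect = @elapsed collect(Xt)

println("copy(transpose(X)):    ", t_copy, " s")
println("collect(transpose(X)): ", t_collect, " s")
```

The timings depend on the Julia version and hardware, so treat any difference as something to measure rather than assume.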

Did it get faster or slower in 1.2? If maximum(X) got much faster, the question is why and whether that can be replicated for other memory layouts. If, on the other hand, it got much slower in 1.2, then you should certainly file a performance issue.
