Multiplication after transpose much faster than multiplication after PermutedDimsArray

Hi,

Thanks for the replies. I understand that it has to do with the OpenBLAS implementation.
As a related side question, are there plans to support dispatching on array storage types in the future?
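
To make the question concrete, here is a minimal sketch of what I mean by dispatching on the storage type. The names `storage` and `storagetype` are hypothetical helpers of my own, not an existing Base or LinearAlgebra API. The idea is to unwrap lazy wrappers such as `Transpose` and `PermutedDimsArray` with `parent` until the underlying `Array` is reached, so that a method could in principle be selected based on it.

```julia
using LinearAlgebra

# Hypothetical helpers (not part of Base or LinearAlgebra):
# recursively unwrap lazy wrappers until the underlying storage is reached.
storage(A::Array) = A
storage(A::Transpose) = storage(parent(A))
storage(A::PermutedDimsArray) = storage(parent(A))

# The type one would like to be able to dispatch on.
storagetype(A) = typeof(storage(A))

A  = rand(3, 4)
At = transpose(A)                  # lazy Transpose wrapper around A
Ap = PermutedDimsArray(A, (2, 1))  # lazy PermutedDimsArray wrapper around A

# The wrapper types differ, but both share the same underlying storage.
println(typeof(At) == typeof(Ap))
println(storagetype(At) == storagetype(Ap))
```

Both wrappers describe the same 4×3 view of the same memory, yet only the `Transpose` one seems to reach the fast BLAS path, which is why I am wondering whether something along these lines is planned.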