There are many ways to dramatically improve performance by exploiting algebraic equivalences, for example:
tr(A*B) == dot(A,B')
diag(A*B) == vec(sum(A.*B',dims=2))
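For anyone who wants to check these two identities, here is a minimal sketch. It assumes real-valued matrices (`dot` conjugates its first argument and `B'` is the conjugate transpose, so complex inputs would need `transpose(B)` and a plain `sum` instead). The helper names `trace_of_product` and `diag_of_product` and the sample matrices are made up for illustration only.

```julia
using LinearAlgebra

# Hypothetical helper: tr(A*B) without forming the product A*B.
# Uses tr(A*B) = sum over i,j of A[i,j] * B[j,i]; assumes real entries.
trace_of_product(A, B) = dot(A, B')

# Hypothetical helper: diag(A*B) without forming the product A*B.
# Entry i is the sum over j of A[i,j] * B[j,i], i.e. a row sum of A .* B'.
diag_of_product(A, B) = vec(sum(A .* B', dims=2))

# Small illustrative matrices (arbitrary values).
A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]

# Compare against the direct computations; use ≈ because floating-point
# sums in a different order need not match bit for bit.
@assert trace_of_product(A, B) ≈ tr(A * B)
@assert diag_of_product(A, B) ≈ diag(A * B)
```

Both shortcuts do O(n^2) work instead of the O(n^3) of a full matrix product, which is where the speedup should come from for large matrices.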
Do you know of any resources listing these kinds of useful equivalences?
And if not, how about starting an organized collaborative document where everyone can share their knowledge?
Edit: corrected the formula for diag