Your matrix is obviously not that big; it would be more helpful to give an actual runnable example. Also, you should quote your code. Please read: Make it easier to help you.
It's probably faster to just sort the columns directly, in place, instead of computing the permutation indices and then applying them, e.g.
for col in eachcol(A)
    sort!(col)
end
or foreach(sort!, eachcol(A)).
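For a concrete comparison, here is a minimal sketch on a small made-up matrix (the original data isn't shown, so `A` and its values are purely illustrative). It assumes the permutation-based version looks roughly like a per-column `sortperm` followed by indexing, which may not match the original code exactly.

```julia
# Hypothetical 4×3 example matrix; not the data from the original question.
A = [3 9 4;
     1 7 8;
     2 5 6;
     0 6 1]

# Permutation-based approach (assumed shape of the original code):
# compute the sorting permutation of each column, then apply it.
B = similar(A)
for j in axes(A, 2)
    p = sortperm(view(A, :, j))
    B[:, j] = A[p, j]
end

# Direct approach: each element of eachcol(C) is a view into C,
# so sort! on it reorders that column of C in place.
C = copy(A)
foreach(sort!, eachcol(C))

println(B == C)  # the two approaches should agree
```

The direct version skips allocating a permutation vector for every column, which is where I'd expect most of the savings to come from. Benchmark it on your real data to be sure.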