Your matrix is obviously not that big; it would be more helpful to give actual runnable examples. Also, you should quote your code. Please read: make it easier to help you
Probably faster just to sort the columns directly, in-place, instead of computing the permutation indices and then applying them. e.g.
I don’t know how big the matrix will be, but currently I’m using ~1TB of memory. I can’t tell you the exact dimensions: (1) There could be an appetite to increase the size twentyfive- or hundred-folds. (b) I don’t know the data type exactly. But if the exact number of zeros determines the solution you would recommend, you could say it is A = rand(10000000,10000000) to get a physical example.
I had to calculate the sortperm for some other routine, so I thought I could reuse it to save time.
No, you couldn’t — that would require 800 TB of memory. To get useful advice it would be better to give realistic information about your application.
Do you actually have that much RAM? Or are you working “out-of-core” with data stored in files or swap memory? (Big difference.)
I had to calculate the sortparm for some other routine, so I thought I could reuse it to save time.
If your dataset is truly enormous, you really need to be careful about how many passes over the array you make and how much extraneous data you allocate. Your thinking needs to fundamentally change when your data gets this big.
If possible you should avoid allocating an entire array (terabytes?) of all your column permutations at once, and then separately make a pass over the array or other data in order to use those permutations. For example, you could loop over your data 1 column at a time and do as much computation as possible before moving onto the next column: compute the permutation and sort the column, do whatever you need with the permutation indices, and then discard those indices when you move onto the next column (so that you make a single pass over the data, and don’t store the permuations for all columns simultaneously).
But it’s hard to give you more useful advice without knowing more context about what you are doing.
To make this transformation in-place, the following can be used:
for (v, p) in zip(eachcol(A), eachcol(idx))
permute!(v, p)
end
Afterwhich, A will have columns sorted.
This can also be written succintly as:
foreach(permute!, eachcol(A), eachcol(idx))
As a sidenote, please spend some time to make the question easy-to-answer. Any work related issue preventing the disclosure of actual application, can be easily be overcome by making a toy example, with a little more effort. Perhaps the toy-example in this answer is what was needed.
If @kingoslo actually has 1TB of data (and wants to go 25–100× larger!) then a small toy example will likely be totally misleading for algorithm choices.
For such large problems you need to put a little more thought into describing the actual problem and some context, and can’t expect much more than general advice (since no one else will casually run or write code for problems that large just to help you).