to compute the permutation vector. I managed to make the code allocation-free in this way,
but I wonder if there is a faster code out there. I did find SortingLab.jl , and SortingAlgorithms. Neither of these would work without major changes, unfortunately. Either the code hasn’t been touched in years and doesn’t run anymore, or the permutation vector is not provided.

Yes, I need to sort two pieces of information (two vectors, an integer vector, and a floating point vector). They go in tandem, and hence I would like to compute a permutation (in-place).

I need to sort millions of such vectors, therefore I need to avoid allocations.

sortperm!(ix, v; alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward, initialized::Bool=false)
Like sortperm, but accepts a preallocated index vector ix.
If initialized is false (the default), ix is initialized to
contain the values 1:length(v).
Examples
≡≡≡≡≡≡≡≡≡≡
julia> v = [3, 1, 2]; p = zeros(Int, 3);
julia> sortperm!(p, v); p
3-element Vector{Int64}:
2
3
1

sortperm! allocates for me too (annoying constant 16 bytes), but it has been changed in more recent versions, so maybe we are both not up to date. Latest version allows passing a scratch space, which would specifically allow non-allocating even with algos needing a little memory.

So, if you manage to test newer version, I’ll be glad to hear sortperm! is the way to go, since semantically it should be.

1.9.0-rc1 does not allocate in sortperm!, as advertised. Nevertheless, there is a problem somewhere else.

1.9.0-rc1 (with sortperm!, which does not allocate):

0.757041 seconds (328.52 k allocations: 498.412 MiB, 14.44% gc time)

1.8.5 ( with sortperm!, which allocates, and is therefore replaced with sort!(prm, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, rows))):

0.465655 seconds (15 allocations: 321.406 MiB)

I cannot use --track-allocation as those allocations with 1.9.0-rc1 apparently happen somewhere in the base libraries.

The sort! benchmarks don’t set p to 1:length(v) which is a problem. This accounts partially for the speedier times (in this test, the previous sortperm! fixes this, so results stay okay.

The big allocation in 1.9 sortperm! looks like a bug (maybe since fixed). Possibly, setting p to 1:length(v) by allocating a new vector.

alot of the methods in SortingLab.jl is memory-layout dependent so things like subarrays may not work well. and also yes, it allocates. I found that allocating makes the algo run faster.

But if you want to modify it it should easy to do sine the algo is quite simple.

using BenchmarkTools
const v = rand(100_000);
const p = zeros(Int, 100_000);
@btime sortperm!($p, $v);
# @show p
@btime begin $p .= 1:length($v); sort!($p, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, $v)); end
# @show p

with 1.9.0-rc1

4.945 ms (2 allocations: 781.30 KiB)
4.924 ms (2 allocations: 781.30 KiB)

and with 1.8.5

6.796 ms (1 allocation: 16 bytes)
6.796 ms (0 allocations: 0 bytes)