Fast permutation vector: code needed

Hello all,

I need to sort integer vectors, with entries of arbitrary size relative to the length of the vector.
At the moment I use

sort!(prm, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, rows))

to compute the permutation vector. I managed to make the code allocation-free in this way,
but I wonder if there is a faster code out there. I did find SortingLab.jl , and SortingAlgorithms. Neither of these would work without major changes, unfortunately. Either the code hasn’t been touched in years and doesn’t run anymore, or the permutation vector is not provided.

Any suggestions would be appreciated.

So just to clarify, you don’t want to sort the vector in place, just to compute the permutation?

Yes, I need to sort two pieces of information (two vectors, an integer vector, and a floating point vector). They go in tandem, and hence I would like to compute a permutation (in-place).

I need to sort millions of such vectors, therefore I need to avoid allocations.

Does sortperm! (with !) fit your needs?

From help:

sortperm!(ix, v; alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward, initialized::Bool=false)

Like sortperm, but accepts a preallocated index vector ix.
If initialized is false (the default), ix is initialized to
  contain the values 1:length(v).

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> v = [3, 1, 2]; p = zeros(Int, 3);
  
  julia> sortperm!(p, v); p
  3-element Vector{Int64}:
   2
   3
   1

I actually extracted the code shown in the OP from the sortperm!: that function allocated, while my code fragment does not.

sortperm! allocates for me too (annoying constant 16 bytes), but it has been changed in more recent versions, so maybe we are both not up to date. Latest version allows passing a scratch space, which would specifically allow non-allocating even with algos needing a little memory.

So, if you manage to test newer version, I’ll be glad to hear sortperm! is the way to go, since semantically it should be.

I don’t know, I am on Julia 1.8.5, and sortperm! would allocate.

1.9.0-rc1 does not allocate in sortperm!, as advertised. Nevertheless, there is a problem somewhere else.

1.9.0-rc1 (with sortperm!, which does not allocate):

0.757041 seconds (328.52 k allocations: 498.412 MiB, 14.44% gc time)

1.8.5 ( with sortperm!, which allocates, and is therefore replaced with sort!(prm, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, rows))):

0.465655 seconds (15 allocations: 321.406 MiB) 

I cannot use --track-allocation as those allocations with 1.9.0-rc1 apparently happen somewhere in the base libraries.

Are you measuring with something like:

julia> const v = rand(10_000);

julia> const p = zeros(Int, 10_000);

julia> using BenchmarkTools

julia> @btime sortperm!($p, $v);
  666.553 μs (1 allocation: 16 bytes)

(use bigger vector if 10_000 small)
?

using BenchmarkTools
const v = rand(100_000);

const p = zeros(Int, 100_000);

@btime sortperm!($p, $v);
@btime sort!($p, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, $v));

Very educational:

1.8.5:

  6.947 ms (1 allocation: 16 bytes)                                                                                             
  1.614 ms (0 allocations: 0 bytes)                                                                                             

1.9.0-rc1:

  4.779 ms (2 allocations: 781.30 KiB)                                                                                          
  298.100 μs (0 allocations: 0 bytes)   

But what does it all mean?

The sort! benchmarks don’t set p to 1:length(v) which is a problem. This accounts partially for the speedier times (in this test, the previous sortperm! fixes this, so results stay okay.

The big allocation in 1.9 sortperm! looks like a bug (maybe since fixed). Possibly, setting p to 1:length(v) by allocating a new vector.

1 Like

You are right, I initialize the permutation outside of sort! when I use it.

can you show some example input and the output you want?

are you wanting this?

x = rand(Float64, 5)
y = Int.(1:5)

using SortingLab: sorttwo!

sorttwo!(x, y)

Now x is sorted and y is the result of sortperm(x).

1 Like

Yes, that is really it. Alas, it looks like I will have to teach sortwo! to handle a different input value:

ERROR: LoadError: MethodError: no method matching sorttwo!(::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, ::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true})                                                              
                                                                                                                                
Closest candidates are:                                                                                                         
  sorttwo!(::Vector{T}, ::Any) where T                                                                                          
   @ SortingLab C:\Users\pkonl\SublimeJulia\assets\.julia-1.9.0-rc1-depot\packages\SortingLab\pcZHO\src\sorttwo!.jl:10          
  sorttwo!(::Vector{T}, ::Any, ::Int64) where T                                                                                 
   @ SortingLab C:\Users\pkonl\SublimeJulia\assets\.julia-1.9.0-rc1-depot\packages\SortingLab\pcZHO\src\sorttwo!.jl:10          
  sorttwo!(::Vector{T}, ::Any, ::Int64, ::Int64) where T                                                                        
   @ SortingLab C:\Users\pkonl\SublimeJulia\assets\.julia-1.9.0-rc1-depot\packages\SortingLab\pcZHO\src\sorttwo!.jl:10          
  ...                                                                                                                           
          

I need to pass a view: rows = view(rowval, colptr[c]:colptr[c+1]-1).

There is also the matter of allocations: sorttwo!: it appears to allocate a few arrays?

alot of the methods in SortingLab.jl is memory-layout dependent so things like subarrays may not work well. and also yes, it allocates. I found that allocating makes the algo run faster.

But if you want to modify it it should easy to do sine the algo is quite simple.

I think radixsort has been implemented in 1.9 so maybe that’s why it’s faster there

With the initialization of the permutation:

using BenchmarkTools
const v = rand(100_000);

const p = zeros(Int, 100_000);

@btime sortperm!($p, $v);
# @show p
@btime begin $p .= 1:length($v); sort!($p, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, $v)); end
# @show p

with 1.9.0-rc1

  4.945 ms (2 allocations: 781.30 KiB)                                                                                          
  4.924 ms (2 allocations: 781.30 KiB)    

and with 1.8.5

  6.796 ms (1 allocation: 16 bytes)                                                                                             
  6.796 ms (0 allocations: 0 bytes) 

It is a pity that the excessive allocation totally kills the performance for the 1.9.0-rc1…

Well, the initialization allocation for p in 1.9 is uncalled for. It seems it is materializing 1:length(v) needlessly. To avoid it just use:

for i in 1:length(v) p[i]=i end

or in the @btime context:

@btime begin for i=1:length($v) @inbounds p[i]=i ; end; sort!($p, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, $v)); end