Fast permutation vector: code needed

Sorry, I don’t follow: there are no allocations in 1.8.5 for @btime begin $p .= 1:length($v); end.

Yeah, and there shouldn’t be. p .= 1:length(v) calls for no allocation. Why 1.9 seems to do it, is a mystery. But test 1.9 with the for loop initialization.

And there are none in 1.9.0-rc1 either .

These are allocations of the whole vector in 1.9

Well, I did:

julia> module mt001                                                                                                             
       using BenchmarkTools                                                                                                     
       const v = rand(100_000);                                                                                                 
                                                                                                                                
       const p = zeros(Int, 100_000);                                                                                           
                                                                                                                                
       @btime sortperm!($p, $v);                                                                                                
       # @show p                                                                                                                
       @btime begin $p .= 1:length($v); sort!($p, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, $v)); end     
       @btime begin $p .= 1:length($v);  end                                                                                    
       nothing                                                                                                                  
       end                                                                                                                      
WARNING: replacing module mt001.                                                                                                
  4.625 ms (2 allocations: 781.30 KiB)                                                                                          
  4.658 ms (2 allocations: 781.30 KiB)                                                                                          
  16.100 μs (0 allocations: 0 bytes)                                                                                            
Main.mt001  

Something is allocating the whole vector. Maybe sort!. But it shouldn’t.

I think I’ll need to install 1.9 already. But first I’ll look at source on github for 1.9rc1

1 Like

Yup, not just once, but twice…

Looking at 1.9rc1, the sorting functions are changed. Try to add another named parameter scratch which is a preallocated area of equal size to array i.e.:

const s = similar(p)
....
sort!(.... ; scratch=s)

and benchmark with this setup. I have a feeling it would be a new top speed. Same parameter should apply to sortperm! (which is preferable).

Good thinking!

julia> module mt001                                                                                                             
       using BenchmarkTools                                                                                                     
       const v = rand(100_000);                                                                                                 
                                                                                                                                
       const p = zeros(Int, 100_000);                                                                                           
       const s = similar(p)                                                                                                     
                                                                                                                                
       @btime sortperm!($p, $v; scratch=$s);                                                                                    
       # @show p                                                                                                                
       @btime begin $p .= 1:length($v); sort!($p, Base.Sort.DEFAULT_UNSTABLE, Base.Order.Perm(Base.Order.Forward, $v)); end     
       @btime begin $p .= 1:length($v);  end                                                                                    
       nothing                                                                                                                  
       end                                                                                                                      
WARNING: replacing module mt001.                                                                                                
  4.555 ms (0 allocations: 0 bytes)                                                                                             
  4.645 ms (2 allocations: 781.30 KiB)                                                                                          
  16.000 μs (0 allocations: 0 bytes)                                                                                            
Main.mt001  

Great. sortperm! should be the one to use. But you can also add the scratch parameter to sort! benchmark, and it should take care of allocations too.

No, apparently it doesn’t go into sort!

Yes it is a parameter of sort! (forget nonsense about being positional, hard to read semicolons):

function sort!(v::AbstractVector{T};
               alg::Algorithm=defalg(v),
               lt=isless,
               by=identity,
               rev::Union{Bool,Nothing}=nothing,
               order::Ordering=Forward,
               scratch::Union{Vector{T}, Nothing}=nothing) where T

I think mine is a different sort!.

Yeah, there is a bit of a mess in the function signatures. sortperm! should be the go-to function for this task. But try to add another positional parameter of a named-tuple as follows:

sort!($p,
  Base.Sort.DEFAULT_UNSTABLE,
  Base.Order.Perm(Base.Order.Forward, $v), (;scratch=s)

At risk of being annoying, a point of style:

If you wrap p[i]=i in parentheses you’ll get a syntax error. Similar weirdness happens if the statement body is a symbol, : quoted expression, or array. Example:

julia> if true [1:5;] end
ERROR: syntax: space before "[" not allowed in "true [" at REPL[1]:1

I’ve adopted a style of adding a semicolon ; after the condition for one-liner if, while, and for statements to avoid syntax surprises.

Thanks. Not annoying at all. The semicolon ; is something I looked for, because the statement interacted badly with @btime.

julia> @btime for i in 1:length($v); $p[i]=i end

is what was required.

1 Like

Ah, that’s a new one for me—interaction with @btime interpolation! The error is quite unintuitive :sweat_smile:

that’s also the consequence of radixsort which requires double the memory. for less allocation, you can try to implement the ideas in sorttwo! but using something like quicksort but i bet it will be slower since that’s O(nlog(n)) algorithm but radixsort is O(n)

And for my application the winners are:

sort!(prm; alg=Base.Sort.QuickSort, order=Base.Order.Perm(Base.Order.Forward, rows), scratch=scratcha);

on 1.9.0-rc1, and

sort!(prm, Base.Sort.QuickSort, Base.Order.Perm(Base.Order.Forward, rows))

on 1.8.5. (This version is marginally faster than 1.9.0-rc1.)

sortperm!(prm, rows; scratch=scratcha) was noticeably slower on 1.9.0-rc1 than the
code fragment above.