What does partialsortperm! do?

Judging by the docstring, I thought partialsortperm! would do the same as partialsortperm, just that it writes the permutation vector into the preallocated vector. Let’s see:

julia> x = rand(10)
10-element Array{Float64,1}:
 0.399221232628445  
 0.6631806879648021 
 0.593842927481014  
 0.20352678125886348
 0.5079294454209171 
 0.6997881492420726 
 0.6855002855967127 
 0.437378056707284  
 0.5171838156407624 
 0.6341051350791087 

julia> index = collect(200:202)
3-element Array{Int64,1}:
 200
 201
 202

julia> partialsortperm!(index, x, 1:3)
3-element view(::Array{Int64,1}, 1:3) with eltype Int64:
 1
 3
 2

julia> partialsortperm(x, 1:3)
3-element view(::Array{Int64,1}, 1:3) with eltype Int64:
 4
 1
 8

julia> x[index]
3-element Array{Float64,1}:
 0.399221232628445 
 0.593842927481014 
 0.6631806879648021

julia> x[partialsortperm(x, 1:3)]
3-element Array{Float64,1}:
 0.20352678125886348
 0.399221232628445  
 0.437378056707284  

I’m lost. What is the content of index after filling it with partialsortperm! anyway?

It’s only considering the first three elements of x since your index was only three elements long. The order (1, 3, 2) puts those first three elements in sorted order.

But then I find the docstring confusing.

 partialsortperm(v, k; by=<transform>, lt=<comparison>, rev=false)

  Return a partial permutation I of the vector v, so that v[I] returns
  values of a fully sorted version of v at index k. If k is a range, a
  vector of indices is returned; if k is an integer, a single index is
  returned. The order is specified using the same keywords as sort!. The
  permutation is stable, meaning that indices of equal elements appear in
  ascending order.

  Note that this function is equivalent to, but more efficient than,
  calling sortperm(...)[k].
 partialsortperm!(ix, v, k; by=<transform>, lt=<comparison>, rev=false, initialized=false)

  Like partialsortperm, but accepts a preallocated index vector ix. If
  initialized is false (the default), ix is initialized to contain the
  values 1:length(ix).

partialsortperm is said to return the indices of the k(th) elements in a fully sorted version of v. partialsortperm! says it’s “like partialsortperm”, as I interpret it, writing the result into ix.

Agreed, improvements here would be great.

1 Like

I can’t seem to find where v[1:k] is sorted only.

PartialQuickSort{T <: Union{Int,OrdinalRange}}

  Indicate that a sorting function should use the partial quick sort
  algorithm. Partial quick sort returns the smallest k elements sorted
  from smallest to largest, finding them and sorting them using QuickSort.

  Characteristics:

    •    not stable: does not preserve the ordering of elements which
        compare equal (e.g. "a" and "A" in a sort of letters which
        ignores case).

    •    in-place in memory.

    •    divide-and-conquer: sort strategy similar to MergeSort.

Within partialsortperm!, the whole v is passed to sort! with the PartialQuickSort{k} sorting method… whose docstring says it returns the k smallest values, not that is sorts a subset of what is passed. I’m starting to be concerned there is a bug here…

To be honest, my first gut reaction was that it was initializing out of bounds leading to strange behavior, but then I realized it actually was doing something semi-reasonable. I’m not sure what the intended behavior is when the index vector isn’t the same length as the value vector.

So apparently this function is very confusing for multiple reasons.

Just to be clear, the name sortperm!(perm, x) suggests: sort a permutation of the indices of a vector in-place. Roughly it’s equivalent to sort!(perm, by = i -> x[i]).

Now suppose you have a permutation perm of the indices of a vector x. By definition that means length(perm) == length(x) and isperm(perm) is true.

In this case, one source of confusion is that sortperm!(perm, x) and partialsortperm!(perm, x, range) will by default initialize perm as the trivial permutation 1:length(perm). You have to explicitly signal that perm is a valid permutation already using the initialized = true kwarg. This behavior is however convenient, because now you do not have to create a dummy permutation yourself, you can just pass in a buffer.

In your example setting initialized = true will get you into trouble, since perm was not a valid permutation and it will access x out of bounds.

Now, the second source of confusion is that because partialsortperm(x, 1:k) returns a subset of a permutation of length k that orders x in the range 1:k, one should therefore pass a buffer perm of size k to partialsortperm!(perm, x, 1:k). This is not true! partialsortperm!(perm, x, 1:k) partially sorts a full permutation in-place, meaning you have to give pass in a buffer of size length(x).

Finally, the third source of confusion is that partialsortperm!(perm, x, range, initialized = true) does not throw whenever length(perm) != length(x). The function still does something when the perm you pass is in fact not a permutation of the indices of x. This is maybe an oversight in Base. However, what it does is still sensible as long as you pass in a vector of valid indices. The invariant is then:

ix = [1,2,5,7,2,1,2,1,2,1] #notice length(x) != length(ix) and isperm(ix) == false
x = collect(10:-1:1)
k = 1:3
@test x[partialsortperm!(ix, x, k, initialized=true)] == sort(x[ix])[k]

So what is does is, it partially sorts a vector of indices.


Right now I think it would make sense to rename the functions to sortindices! and partialsortindices! and make initialized = true the default.

Or have both (partial)sortindices! and (partial)sortperm!, and make the latter more strict when it comes to the input. However, as far as I know, these function do not exploit the fact that a permutation is being passed to the function.

1 Like

I think my application was to find the k nearest neighbors of each data point among all data points (for some reasons, I couldn’t use the tree stuff from NearestNeighbors.jl). So I’ve got a huge distance matrix, and now I go through each row (or column), and want to know the indices of those k points with smallest distance. This is exactly what partialsortperm gives me, right? Now, because I save the results elsewhere, as I go through the rows, I would like to store the result in each step in a preallocated vector. Apparently, in contrast to the docstring, this is not what partialsortperm! does, right? Then do we have a function that does what the docstring promises?

It would be really great if someone who knows how it works could rectify the docstring, and I agree with https://github.com/JuliaLang/julia/issues/33184 that the !-vs-non-! correspondence seems broken here. At least compared to how I understand the !-convention.

This is exactly what partialsortperm gives me, right?

Well, in a sense, but in fact it returns something bigger, namely a view of a permutation vector. This permutation vector is the thing that is mutated in partialsortperm!. Suppose partialsortperm returned a full, partially sorted permutation, then the ! version would be natural, right?

So, it “sorts” the entire vector (or the corresponding indices) until it gets the 1:k indices right, and returns a partial view of the entire index vector? Oh man, how much I misunderstood that function…!

1 Like

Yes

So your code would look like

function example(n, k)
    dist = rand(n, n)
    topk = Matrix{Int}(undef, k, n)
    perm = Vector{Int}(undef, n)
    for i = 1 : n
        topk[:, i] .= partialsortperm!(perm, view(dist, :, i), 1:k)
    end
    dist, topk
end

That allocates O((k+1)n) whereas non-mutating partialsortperm version would allocate O(n^2)

1 Like