# What does partialsortperm! do?

Judging by the docstring, I thought `partialsortperm!` would do the same as `partialsortperm`, just that it writes the permutation vector into the preallocated vector. Let’s see:

``````julia> x = rand(10)
10-element Array{Float64,1}:
0.399221232628445
0.6631806879648021
0.593842927481014
0.20352678125886348
0.5079294454209171
0.6997881492420726
0.6855002855967127
0.437378056707284
0.5171838156407624
0.6341051350791087

julia> index = collect(200:202)
3-element Array{Int64,1}:
200
201
202

julia> partialsortperm!(index, x, 1:3)
3-element view(::Array{Int64,1}, 1:3) with eltype Int64:
1
3
2

julia> partialsortperm(x, 1:3)
3-element view(::Array{Int64,1}, 1:3) with eltype Int64:
4
1
8

julia> x[index]
3-element Array{Float64,1}:
0.399221232628445
0.593842927481014
0.6631806879648021

julia> x[partialsortperm(x, 1:3)]
3-element Array{Float64,1}:
0.20352678125886348
0.399221232628445
0.437378056707284
``````

I’m lost. What is the content of `index` after filling it with `partialsortperm!` anyway?

It’s only considering the first three elements of `x` since your `index` was only three elements long. The order `(1, 3, 2)` puts those first three elements in sorted order.

But then I find the docstring confusing.

`````` partialsortperm(v, k; by=<transform>, lt=<comparison>, rev=false)

Return a partial permutation I of the vector v, so that v[I] returns
values of a fully sorted version of v at index k. If k is a range, a
vector of indices is returned; if k is an integer, a single index is
returned. The order is specified using the same keywords as sort!. The
permutation is stable, meaning that indices of equal elements appear in
ascending order.

Note that this function is equivalent to, but more efficient than,
calling sortperm(...)[k].
``````
`````` partialsortperm!(ix, v, k; by=<transform>, lt=<comparison>, rev=false, initialized=false)

Like partialsortperm, but accepts a preallocated index vector ix. If
initialized is false (the default), ix is initialized to contain the
values 1:length(ix).
``````

`partialsortperm` is said to return the indices of the `k`(th) elements in a fully sorted version of `v`. `partialsortperm!` says it’s "like `partialsortperm`", as I interpret it, writing the result into `ix`.

Agreed, improvements here would be great.

1 Like

I can’t seem to find where `v[1:k]` is sorted only.

``````PartialQuickSort{T <: Union{Int,OrdinalRange}}

Indicate that a sorting function should use the partial quick sort
algorithm. Partial quick sort returns the smallest k elements sorted
from smallest to largest, finding them and sorting them using QuickSort.

Characteristics:

•    not stable: does not preserve the ordering of elements which
compare equal (e.g. "a" and "A" in a sort of letters which
ignores case).

•    in-place in memory.

•    divide-and-conquer: sort strategy similar to MergeSort.
``````

Within `partialsortperm!`, the whole `v` is passed to `sort!` with the `PartialQuickSort{k}` sorting method… whose docstring says it returns the `k` smallest values, not that is sorts a subset of what is passed. I’m starting to be concerned there is a bug here…

To be honest, my first gut reaction was that it was initializing out of bounds leading to strange behavior, but then I realized it actually was doing something semi-reasonable. I’m not sure what the intended behavior is when the index vector isn’t the same length as the value vector.

So apparently this function is very confusing for multiple reasons.

Just to be clear, the name `sortperm!(perm, x)` suggests: sort a permutation of the indices of a vector in-place. Roughly it’s equivalent to `sort!(perm, by = i -> x[i])`.

Now suppose you have a permutation `perm` of the indices of a vector `x`. By definition that means `length(perm) == length(x)` and `isperm(perm)` is true.

In this case, one source of confusion is that `sortperm!(perm, x)` and `partialsortperm!(perm, x, range)` will by default initialize `perm` as the trivial permutation `1:length(perm)`. You have to explicitly signal that `perm` is a valid permutation already using the `initialized = true` kwarg. This behavior is however convenient, because now you do not have to create a dummy permutation yourself, you can just pass in a buffer.

In your example setting `initialized = true` will get you into trouble, since `perm` was not a valid permutation and it will access `x` out of bounds.

Now, the second source of confusion is that because `partialsortperm(x, 1:k)` returns a subset of a permutation of length `k` that orders `x` in the range `1:k`, one should therefore pass a buffer `perm` of size `k` to `partialsortperm!(perm, x, 1:k)`. This is not true! `partialsortperm!(perm, x, 1:k)` partially sorts a full permutation in-place, meaning you have to give pass in a buffer of size `length(x)`.

Finally, the third source of confusion is that `partialsortperm!(perm, x, range, initialized = true)` does not throw whenever `length(perm) != length(x)`. The function still does something when the `perm` you pass is in fact not a permutation of the indices of x. This is maybe an oversight in Base. However, what it does is still sensible as long as you pass in a vector of valid indices. The invariant is then:

``````ix = [1,2,5,7,2,1,2,1,2,1] #notice length(x) != length(ix) and isperm(ix) == false
x = collect(10:-1:1)
k = 1:3
@test x[partialsortperm!(ix, x, k, initialized=true)] == sort(x[ix])[k]
``````

So what is does is, it partially sorts a vector of indices.

Right now I think it would make sense to rename the functions to `sortindices!` and `partialsortindices!` and make `initialized = true` the default.

Or have both `(partial)sortindices!` and `(partial)sortperm!`, and make the latter more strict when it comes to the input. However, as far as I know, these function do not exploit the fact that a permutation is being passed to the function.

1 Like

I think my application was to find the k nearest neighbors of each data point among all data points (for some reasons, I couldn’t use the tree stuff from NearestNeighbors.jl). So I’ve got a huge distance matrix, and now I go through each row (or column), and want to know the indices of those k points with smallest distance. This is exactly what `partialsortperm` gives me, right? Now, because I save the results elsewhere, as I go through the rows, I would like to store the result in each step in a preallocated vector. Apparently, in contrast to the docstring, this is not what `partialsortperm!` does, right? Then do we have a function that does what the docstring promises?

It would be really great if someone who knows how it works could rectify the docstring, and I agree with https://github.com/JuliaLang/julia/issues/33184 that the `!`-vs-`non-!` correspondence seems broken here. At least compared to how I understand the `!`-convention.

This is exactly what `partialsortperm` gives me, right?

Well, in a sense, but in fact it returns something bigger, namely a view of a permutation vector. This permutation vector is the thing that is mutated in `partialsortperm!`. Suppose `partialsortperm` returned a full, partially sorted permutation, then the `!` version would be natural, right?

So, it “sorts” the entire vector (or the corresponding indices) until it gets the `1:k` indices right, and returns a partial view of the entire index vector? Oh man, how much I misunderstood that function…!

1 Like

Yes

So your code would look like

``````function example(n, k)
dist = rand(n, n)
topk = Matrix{Int}(undef, k, n)
perm = Vector{Int}(undef, n)
for i = 1 : n
topk[:, i] .= partialsortperm!(perm, view(dist, :, i), 1:k)
end
dist, topk
end
``````

That allocates O((k+1)n) whereas non-mutating `partialsortperm` version would allocate O(n^2)

1 Like