# How to return k largest elements of a vector in descending order?

Let’s say I have a vector

``````a = rand(1:20, 10)
10-element Vector{Int64}:
2
13
10
12
10
8
10
14
1
19
``````

I want to define a function such as f() which takes two inputs including a vector `a` and `k` as the number of largest elements and return the k largest numbers in descending order.
For example:

``````f(a,3) = [19, 14 , 13]
``````

A similar issue is discussed here in 2018, but it looks like some functions like `partialsortperm()` do not work in the newer versions of Julia. I use Julia 1.6.5.

Not sure what you mean.

``````julia> a[partialsortperm(a, 1:3; rev = true)]
3-element Vector{Int64}:
20
18
17
``````

works just fine on Julia 1.6.

Hi,

does the following function do the trick?

``````f(a,k) = sort(a, rev=true)[1:k]
``````

Regards,

Thomas

1 Like

`partialsortperm` is made for these situations, so you might be leaving quite a bit of performance on the table if you sort the whole vector:

``````julia> f1(a, k) = sort(a; rev = true)[1:k];

julia> f2(a, k) = a[partialsortperm(a, 1:k; rev = true)];

julia> using BenchmarkTools

julia> x = rand(1_000_000);

julia> @btime f1(\$x, 3);
53.254 ms (3 allocations: 7.63 MiB)

julia> x = rand(1_000_000);

julia> @btime f2(\$x, 3);
5.330 ms (5 allocations: 7.63 MiB)
``````

an order of magnitude on my machine for 1e6 Float64 elements.

5 Likes

`partialsort(x, 1:k; rev=true)` is even faster, `perm` is not needed here.

5 Likes

the DataStructures.jl package already contains such a function , implemented using heaps.

Just call `nlargest(n, array)` from DataStructures.jl

from this discussion

3 Likes

To sum it up…

``````julia> using DataStructures, BenchmarkTools

julia> f1(a, k) = sort(a; rev = true)[1:k];

julia> f2(a, k) = a[partialsortperm(a, 1:k; rev = true)];

julia> f3(a, k) = partialsort(a, 1:k; rev=true)
f3 (generic function with 1 method)

julia> f4(a, k) = nlargest(k, a)
f4 (generic function with 1 method)

julia> x = rand(1_000_000);

julia> xb = copy(x);

julia> r1 = @btime f1(\$xb, 3);
74.432 ms (3 allocations: 7.63 MiB)

julia> xb = copy(x);

julia> r2 = @btime f2(\$xb, 3);
13.480 ms (5 allocations: 7.63 MiB)

julia> xb = copy(x);

julia> r3 = @btime f3(\$xb, 3);
9.408 ms (2 allocations: 7.63 MiB)

julia> xb = copy(x);

julia> r4 = @btime f4(\$xb, 3);
1.531 ms (2 allocations: 160 bytes)

julia> r1 == r2 == r3 == r4
true
``````
3 Likes

try also this, but for larger arrays

``````

function MaxN(cr,N)
maxn = heapify!(cr[1:N])
maxn1=maxn[1]
@inbounds for i in N+1:length(cr)
e=cr[i]
if maxn1 < e
heappop!(maxn)
heappush!(maxn,e)
maxn1=maxn[1]
end
end
sort!(maxn,rev=true)
end
``````

Why doesn’t `partialsort` use `nlargest` then? What’s the tradeoff?

Because `nlargest` discards the rest of the array? The two functions aren’t equivalent.

1 Like

`DataStructures.nlargest` already uses a heap (in a slightly more efficient way because it can combine the push and pop).