How to return k largest elements of a vector in descending order?

Let’s say I have a vector

a = rand(1:20, 10)
10-element Vector{Int64}:
  2
 13
 10
 12
 10
  8
 10
 14
  1
 19

I want to define a function such as f() which takes two inputs including a vector a and k as the number of largest elements and return the k largest numbers in descending order.
For example:

f(a,3) = [19, 14 , 13] 

A similar issue is discussed here in 2018, but it looks like some functions like partialsortperm() do not work in the newer versions of Julia. I use Julia 1.6.5.

Not sure what you mean.

julia> a[partialsortperm(a, 1:3; rev = true)]
3-element Vector{Int64}:
 20
 18
 17

works just fine on Julia 1.6.

Hi,

does the following function do the trick?

f(a,k) = sort(a, rev=true)[1:k]

Regards,

Thomas

1 Like

partialsortperm is made for these situations, so you might be leaving quite a bit of performance on the table if you sort the whole vector:

julia> f1(a, k) = sort(a; rev = true)[1:k];

julia> f2(a, k) = a[partialsortperm(a, 1:k; rev = true)];

julia> using BenchmarkTools

julia> x = rand(1_000_000);

julia> @btime f1($x, 3);
  53.254 ms (3 allocations: 7.63 MiB)

julia> x = rand(1_000_000);

julia> @btime f2($x, 3);
  5.330 ms (5 allocations: 7.63 MiB)

an order of magnitude on my machine for 1e6 Float64 elements.

5 Likes

partialsort(x, 1:k; rev=true) is even faster, perm is not needed here.

5 Likes

by @stevengj

the DataStructures.jl package already contains such a function , implemented using heaps.

Just call nlargest(n, array) from DataStructures.jl

from this discussion

3 Likes

To sum it up…

julia> using DataStructures, BenchmarkTools

julia> f1(a, k) = sort(a; rev = true)[1:k];

julia> f2(a, k) = a[partialsortperm(a, 1:k; rev = true)];

julia> f3(a, k) = partialsort(a, 1:k; rev=true)
f3 (generic function with 1 method)

julia> f4(a, k) = nlargest(k, a)
f4 (generic function with 1 method)

julia> x = rand(1_000_000);

julia> xb = copy(x);

julia> r1 = @btime f1($xb, 3);
  74.432 ms (3 allocations: 7.63 MiB)

julia> xb = copy(x);

julia> r2 = @btime f2($xb, 3);
  13.480 ms (5 allocations: 7.63 MiB)

julia> xb = copy(x);

julia> r3 = @btime f3($xb, 3);
  9.408 ms (2 allocations: 7.63 MiB)

julia> xb = copy(x);

julia> r4 = @btime f4($xb, 3);
  1.531 ms (2 allocations: 160 bytes)

julia> r1 == r2 == r3 == r4
true
3 Likes

try also this, but for larger arrays



function MaxN(cr,N)
    maxn = heapify!(cr[1:N])
    maxn1=maxn[1]
       @inbounds for i in N+1:length(cr)
        e=cr[i]    
        if maxn1 < e
            heappop!(maxn)
            heappush!(maxn,e)
            maxn1=maxn[1]
            end
        end
    sort!(maxn,rev=true)
end

Why doesn’t partialsort use nlargest then? What’s the tradeoff?

Because nlargest discards the rest of the array? The two functions aren’t equivalent.

1 Like

DataStructures.nlargest already uses a heap (in a slightly more efficient way because it can combine the push and pop).