Let’s say I have a vector
a = rand(1:20, 10)
10-element Vector{Int64}:
2
13
10
12
10
8
10
14
1
19
I want to define a function such as f() which takes two inputs including a vector a
and k
as the number of largest elements and return the k largest numbers in descending order.
For example:
f(a,3) = [19, 14 , 13]
A similar issue is discussed here in 2018, but it looks like some functions like partialsortperm()
do not work in the newer versions of Julia. I use Julia 1.6.5.
Not sure what you mean.
julia> a[partialsortperm(a, 1:3; rev = true)]
3-element Vector{Int64}:
20
18
17
works just fine on Julia 1.6.
Hi,
does the following function do the trick?
f(a,k) = sort(a, rev=true)[1:k]
Regards,
Thomas
1 Like
nilshg
October 31, 2022, 1:15pm
4
partialsortperm
is made for these situations, so you might be leaving quite a bit of performance on the table if you sort the whole vector:
julia> f1(a, k) = sort(a; rev = true)[1:k];
julia> f2(a, k) = a[partialsortperm(a, 1:k; rev = true)];
julia> using BenchmarkTools
julia> x = rand(1_000_000);
julia> @btime f1($x, 3);
53.254 ms (3 allocations: 7.63 MiB)
julia> x = rand(1_000_000);
julia> @btime f2($x, 3);
5.330 ms (5 allocations: 7.63 MiB)
an order of magnitude on my machine for 1e6 Float64 elements.
5 Likes
partialsort(x, 1:k; rev=true)
is even faster, perm
is not needed here.
5 Likes
To sum it up…
julia> using DataStructures, BenchmarkTools
julia> f1(a, k) = sort(a; rev = true)[1:k];
julia> f2(a, k) = a[partialsortperm(a, 1:k; rev = true)];
julia> f3(a, k) = partialsort(a, 1:k; rev=true)
f3 (generic function with 1 method)
julia> f4(a, k) = nlargest(k, a)
f4 (generic function with 1 method)
julia> x = rand(1_000_000);
julia> xb = copy(x);
julia> r1 = @btime f1($xb, 3);
74.432 ms (3 allocations: 7.63 MiB)
julia> xb = copy(x);
julia> r2 = @btime f2($xb, 3);
13.480 ms (5 allocations: 7.63 MiB)
julia> xb = copy(x);
julia> r3 = @btime f3($xb, 3);
9.408 ms (2 allocations: 7.63 MiB)
julia> xb = copy(x);
julia> r4 = @btime f4($xb, 3);
1.531 ms (2 allocations: 160 bytes)
julia> r1 == r2 == r3 == r4
true
3 Likes
try also this, but for larger arrays
function MaxN(cr,N)
maxn = heapify!(cr[1:N])
maxn1=maxn[1]
@inbounds for i in N+1:length(cr)
e=cr[i]
if maxn1 < e
heappop!(maxn)
heappush!(maxn,e)
maxn1=maxn[1]
end
end
sort!(maxn,rev=true)
end
DNF
October 31, 2022, 10:38pm
9
Why doesn’t partialsort
use nlargest
then? What’s the tradeoff?
Because nlargest
discards the rest of the array? The two functions aren’t equivalent.
1 Like
DataStructures.nlargest
already uses a heap (in a slightly more efficient way because it can combine the push and pop).