What is the Julia equivalent of Matlab's `maxk`, which returns the indices of the top K largest values?

I am currently converting my Matlab code into Julia. While trying to convert the off-the-shelf Matlab function `maxk`, which returns the values and indices of the top K largest/smallest values, I found it difficult to find such a function in Julia.


There may be something more efficient, but `sortperm` should do what you want:

``````julia> using Random

julia> a = shuffle(11:15)
5-element Array{Int64,1}:
15
12
14
13
11

julia> b = sortperm(a)
5-element Array{Int64,1}:
5
2
4
3
1

julia> collect(zip(b, a[b]))
5-element Array{Tuple{Int64,Int64},1}:
(5, 11)
(2, 12)
(4, 13)
(3, 14)
(1, 15)
``````

As above, you can also use `partialsortperm(a, 1:3)` to get only the first 3 indices, and so on. It should run a little faster. Here’s a link to the documentation, and an implementation of `maxk`. Note that I use `rev=true` to get the largest elements.

``````function maxk(a, k)
    b = partialsortperm(a, 1:k, rev=true)
    return collect(zip(b, a[b]))
end
``````
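As a quick sanity check of the `maxk` above, here is a small self-contained example (the expected output was worked out by hand; there are no ties, so the ordering is unambiguous):

```julia
# maxk as defined above: indices and values of the k largest entries.
function maxk(a, k)
    b = partialsortperm(a, 1:k, rev=true)
    return collect(zip(b, a[b]))
end

maxk([1, 3, 2, 5, 4], 3)  # → [(4, 5), (5, 4), (2, 3)]
```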

Thanks for the code. However, I found that `maxk` in Matlab (R2018a) is much faster (about 4.7 times) than this `maxk` in Julia.

``````QQ=randn(1000,1);
tic;
for i=1:10000
[~,ind]=maxk(QQ,10);
end
toc;
``````

and

``````tic();
for i=1:10000
ind1=maxk(QQ,10);
end
toc();
``````

Is it possible for the maxk in Julia to be as fast as that in Matlab?
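For reference, a roughly equivalent timing loop on the Julia side might look like this (a sketch using `@elapsed`; note that `tic`/`toc` were removed in Julia 1.0, and BenchmarkTools, used below, gives more reliable numbers):

```julia
# maxk as defined earlier in the thread.
function maxk(a, k)
    b = partialsortperm(a, 1:k, rev=true)
    return collect(zip(b, a[b]))
end

QQ = randn(1000)
maxk(QQ, 10)                 # warm-up call so compilation is not timed
t = @elapsed for i in 1:10000
    maxk(QQ, 10)
end
println(t)                   # total seconds for 10_000 calls
```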

Have a look at the performance tips. Here is how I would time:

``````julia> using BenchmarkTools

julia> a = randn(1000);

julia> @btime maxk($a, 10);
  11.064 μs (8 allocations: 8.47 KiB)
``````

I see a >30x improvement when I use `partialsortperm!` (an in-place version of `partialsortperm`) and pass in a pre-initialized vector `ix = collect(1:10)`.

``````julia> function maxk!(ix, a, k; initialized=false)
           partialsortperm!(ix, a, 1:k, rev=true, initialized=initialized)
           return collect(zip(ix, a[ix]))
       end
maxk! (generic function with 1 method)

julia> ix = collect(1:10);

julia> @btime maxk!($ix, $a, 10, initialized=true);
  345.111 ns (7 allocations: 544 bytes)
``````

Edit: a more general `maxk!` where `ix` may not necessarily be initialized.
Edit 2: see @Elrod below.

I don’t think `maxk!` is correct.

``````julia> function maxk!(ix, a, k; initialized=false)
           partialsortperm!(ix, a, 1:k, rev=true, initialized=initialized)
           return collect(zip(ix, a[ix]))
       end
maxk! (generic function with 1 method)

julia> q = randn(1000);

julia> idx = collect(1:10);

julia> maxk!(idx, q, 10, initialized = true)
10-element Array{Tuple{Int64,Float64},1}:
(4, 0.20337986959767532)
(6, -0.3077570887074856)
(1, -0.6107099712370375)
(5, -0.6993273979405977)
(10, -0.7794269061061618)
(2, -0.8531978408910773)
(7, -0.9704489858497608)
(8, -1.1237782702811618)
(9, -1.2057372641871114)
(3, -2.813777795383793)

julia> maximum(q)
4.032311634752638

julia> idx'
4  6  1  5  10  2  7  8  9  3

julia> partialsortperm(q, 1:10, rev=true)'
626  425  940  40  594  811  88  133  278  128

julia> q[ans]
1×10 Array{Float64,2}:
4.03231  3.60551  3.07742  2.75559  2.69597  2.45856  2.25731  2.2014  2.19445  2.18963
``````

EDIT:
You need:

``````function maxk!(ix, a, k; initialized=false)
    partialsortperm!(ix, a, 1:k, rev=true, initialized=initialized)
    @views collect(zip(ix[1:k], a[ix[1:k]]))
end

julia> idx = collect(1:length(q));

julia> maxk!(idx, q, 10, initialized = true)
10-element Array{Tuple{Int64,Float64},1}:
(626, 4.032311634752638)
(425, 3.6055071635748828)
(940, 3.07741625532675)
(40, 2.755589275585265)
(594, 2.695973149935006)
(811, 2.4585577302387667)
(88, 2.2573122866282973)
(133, 2.2014028244112733)
(278, 2.1944512712785187)
(128, 2.1896301246037284)
``````

EDIT:
Because this was marked the solution, I want to point out that it isn’t notably faster than the other answers on the computer I tried this on.

``````julia> @benchmark maxk!($idx, $q, 10) # setup=(copyto!($idx, 1:length($q)))
BenchmarkTools.Trial:
memory estimate:  544 bytes
allocs estimate:  10
--------------
minimum time:     9.098 μs (0.00% GC)
median time:      9.257 μs (0.00% GC)
mean time:        9.671 μs (0.00% GC)
maximum time:     26.930 μs (0.00% GC)
--------------
samples:          10000
evals/sample:     1

julia> b = @benchmarkable maxk!($idx, $q, 10, initialized=true) setup=(copyto!($idx, 1:length($q)))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
memory estimate:  544 bytes
allocs estimate:  10
--------------
minimum time:     8.966 μs (0.00% GC)
median time:      9.127 μs (0.00% GC)
mean time:        9.440 μs (0.00% GC)
maximum time:     21.891 μs (0.00% GC)
--------------
samples:          10000
evals/sample:     1

julia> function maxk(a, k)
           b = partialsortperm(a, 1:k, rev=true)
           return collect(zip(b, a[b]))
       end
maxk (generic function with 1 method)

julia> @benchmark maxk($q, 10)
BenchmarkTools.Trial:
memory estimate:  8.47 KiB
allocs estimate:  8
--------------
minimum time:     9.749 μs (0.00% GC)
median time:      11.382 μs (0.00% GC)
mean time:        16.790 μs (31.13% GC)
maximum time:     42.964 ms (99.93% GC)
--------------
samples:          10000
evals/sample:     1
``````

You should have used

`collect(1:1000)`

as a fair comparison
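In other words, the scratch index vector has to cover all of `q` and be reset before each timed call, since `partialsortperm!` permutes it. A sketch of that setup (using the non-keyword form, which lets `partialsortperm!` fill `ix` itself):

```julia
q = randn(1000)
idx = collect(1:length(q))   # full length, i.e. collect(1:1000), not collect(1:10)

function maxk!(ix, a, k)
    partialsortperm!(ix, a, 1:k, rev=true)  # initializes and partially sorts ix
    @views collect(zip(ix[1:k], a[ix[1:k]]))
end

res = maxk!(idx, q, 10)
res[1][2] == maximum(q)      # true: the first entry is the largest value
```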

A comparison between `maxk` (the not-in-place version) in Julia 1.0 and `maxk` in Matlab R2018a shows that Julia is ~~several times faster~~ two times slower for this particular test on my laptop.

Julia 1.0.0:

``````using BenchmarkTools

function maxk(a, k)
    b = partialsortperm(a, 1:k, rev=true)
    return collect(zip(b, a[b]))
end

a = randn(1000)
k = 10

@benchmark maxk($a, $k)
``````
``````BenchmarkTools.Trial:
memory estimate:  8.47 KiB
allocs estimate:  8
--------------
minimum time:     9.079 μs (0.00% GC)
median time:      9.475 μs (0.00% GC)
mean time:        16.670 μs (38.25% GC)
maximum time:     50.874 ms (99.91% GC)
--------------
samples:          10000
evals/sample:     1
``````

Matlab R2018a:

``````a = randn(1000,1);
k = 10;
f = @() maxk(a,k);

tmedian = timeit(f)
``````
``````tmedian =
4.3039e-06
``````

Edit: Fixed wrong comparison (it was too late at night…).

Looks like the Matlab version is >2x faster to me.

Yes, looking at previous discussions it seems like Matlab’s sort functions are multi-threaded and heavily optimized, so they will be tough to beat. I did a naive threaded implementation here, which starts to have some gains over single-threaded for large `a`:

``````julia> using Base.Threads

julia> nthreads()
2

julia> function maxk_threaded(a, k)
           # Reconstructed sketch (the code in this post was garbled):
           # each thread finds the top-k of its chunk of `a`, then we
           # take the top-k of the combined candidates.
           nt = nthreads()
           chunks = collect(Iterators.partition(eachindex(a), cld(length(a), nt)))
           cand = Vector{Vector{Int}}(undef, length(chunks))
           @threads for i in eachindex(chunks)
               c = chunks[i]
               cand[i] = c[partialsortperm(view(a, c), 1:min(k, length(c)), rev=true)]
           end
           ix = reduce(vcat, cand)
           partialsortperm!(ix, a, 1:k, rev=true, initialized=true)
           @views collect(zip(ix[1:k], a[ix[1:k]]))
       end
maxk_threaded (generic function with 1 method)

julia> a = randn(10000);

julia> @btime maxk($a, 10)
70.201 μs (9 allocations: 78.73 KiB)
10-element Array{Tuple{Int64,Float64},1}:
(3840, 3.7524106800393)
(1359, 3.667162944745407)
(4738, 3.46128657912246)
(8532, 3.349067643815953)
(8314, 3.3363898988561234)
(3542, 3.3297030468239965)
(1159, 3.2795246783768923)
(9436, 3.259918244413647)
(9418, 3.254388944717796)
(2198, 3.155524296051535)

julia> @btime maxk_threaded($a, 10)
  53.894 μs (23 allocations: 1.58 KiB)
10-element Array{Tuple{Int64,Float64},1}:
(3840, 3.7524106800393)
(1359, 3.667162944745407)
(4738, 3.46128657912246)
(8532, 3.349067643815953)
(8314, 3.3363898988561234)
(3542, 3.3297030468239965)
(1159, 3.2795246783768923)
(9436, 3.259918244413647)
(9418, 3.254388944717796)
(2198, 3.155524296051535)
``````

Got a reasonably fast solution using SortingLab.jl.

I am not sure how fast it is compared to Matlab. Perhaps @complexfilter can do a test and share the results.