Any sorting function in Julia that is as fast as Matlab's maxk?

I found that maxk in Matlab is way faster(4.7 times in Matlab 2018a) than the maxk in Julia.

QQ=randn(1000,1);
tic;
for i=1:10000
[~,ind]=maxk(QQ,10);
end
toc;

and

tic();
for i=1:10000
ind1=partialsortperm(Q, 1:10, rev=true);
end
toc();

Is it possible for the maxk in Julia to be as fast as that in Matlab?

How large is your array? My SortingLab.jl has a fast sortperm (fsortperm) for sorting integers but not floats. It’s not hard to adapt the algorithm for floats as follows. The issue with fsortperm is that it doesn’t know to stop once it has found the top k values but instead returns everything first so it may not be as efficient.

# uint_mapping stolen from SortingAlgorithms.jl
uint_mapping(x::Float64)  = (y = reinterpret(Int64, x); reinterpret(UInt64, ifelse(y < 0, ~y, xor(y, typemin(Int64)))))

maxk(Q,k) = begin
    Qunit = uint_mapping.(Q)
    res = SortingLab.fsortperm(Qunit)
    Q[res[end-(k-1):end]]
end

Q = randn(10000)
@benchmark maxk($Q, 10)

those are the results from Juliabox (v0.6.2), so I think it’s pretty good for such a small size Q (length = 1000)

BenchmarkTools.Trial: 
  memory estimate:  133.78 KiB
  allocs estimate:  34
  --------------
  minimum time:     97.704 μs (0.00% GC)
  median time:      115.204 μs (0.00% GC)
  mean time:        139.580 μs (5.07% GC)
  maximum time:     100.319 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Also the below is a better way to benchmark

using BenchmarkTools
@benchmark ind1=partialsortperm($Q, 1:10, rev=true);

would be a better way

Please do not post the same topic in two different threads. You’ve already opened this exact issue in your other thread here: What is Julia’s maxk(Matlab) that returns the indice of top K largest values?

Duplicating a topic makes it harder for us to help you: people end up duplicating each other’s responses and the correct answer gets divided up into multiple places.

2 Likes

From a person who’s new to this forum it might be helpful explaining how they are the exact same issue. As from that perspective the earlier post is about “how”? And this is post is about “best”, as fast as Matlabs. But I think the OP is asking the same thing because the connotation is that the original questions answers the “how” but isn’t fast enough, and he could’ve just changed his original title to a “how and best” question.

I guess @rdeits isn’t trying to sound harsher than he should but beginners like @complexfilter should learn the rules to make this forum the most productive that it can be.

Indeed, and I apologize for being harsh. I just meant that this post: What is Julia’s maxk(Matlab) that returns the indice of top K largest values? which asks “Is it possible for the maxk in Julia to be as fast as that in Matlab?” and the current topic, which asks “Is it possible for the maxk in Julia to be as fast as that in Matlab?” are asking precisely the same question.

@complexfilter I don’t mean to suggest that your questions are not welcome here. Quite the opposite–the whole point of this forum is for you to be able to ask questions. But by keeping the discussion of a given topic in a single Discourse thread you can make it easier for us to give better answers and be more helpful.

3 Likes

The benchmarks are impressive. Do you have any sortperm function for two or more vectors? This would be necessary for breaking ties in the first vector, etc.

Sorry about asking the same question. I will keep your suggestion in mind.

I believe I do but it’s not very generalised so it only works on certain combinations, and I don’t have time to work on it so no help here. The issue is that when sorting two vectors (or sorting n vectors) every combination of vector types should result in specialized code from Julia’s compiler e.g. suppose v1 and v1 are two vectors so fsortperm(v1, v2) should generate different code for v1 and v2 both Int32 vs v1 being String and v2 being Float64 and so on. I think they can be achieve for 2 vectors but seems hard for a generalised n, potentially because I don’t know Julia well enough and I have feeling that generated functions might help.