# Performance tips for a function which finds n maximums in a matrix

Hi everyone,

I have been trying to squeeze out as much performance as possible out from the following function, which finds the n largest (or maximum) values of a matrix and returns those values and their positions in the matrix, however I do have to note that in the specific case in which I am calling the function I really don’t need the values, just the indices:

``````function maximums(matrix, n)::Tuple{Vector{Int64}, Vector{CartesianIndex{2}}}
type = eltype(matrix)
vals = zeros(type, n)
indices = Array{Int64}(undef,n)
arr = Array{Tuple{type,CartesianIndex}}(undef, n)
@inbounds for i ∈ axes(matrix, 1), j ∈ axes(matrix, 2)
smallest, index = findmin(vals)
if matrix[i, j] > smallest
arr[index] = matrix[i, j], CartesianIndex(i, j)
vals[index] = matrix[i, j]
end
end
arr = sort(arr, by=x -> x, rev=true)
vals = [x for x in arr]
indices = [x for x in arr]
return vals, indices
end
``````

I have already profiled it using @code_warntype and pprof and I can’t seem to find any possible room for improvement. What can I do to make it run faster? Or does my logic need a rewrite from the ground up?

I call this function thousands of times in my main function so I would really appreciate any speed up.

Thank you very much!

You can make this thousands of times faster (look at `partialsort` and `partialsortperm`)

3 Likes

Specifically:

``````function maximums2(M, n)
v = vec(M)
l = length(v)
perm = partialsortperm(v, (l-n+1):l)
vals = v[perm]
indices = CartesianIndices(M)[perm]
return vals, indices
end

M = rand(Int, 1000,1000);

@time maximums(M, 10000);
8.694805 seconds (421.12 k allocations: 7.938 MiB)

@time maximums2(M, 10000);
0.010756 seconds (13 allocations: 15.488 MiB)
``````

Specifically if `l` is the number of elements in the matrix, `maximums` is `O(l*n)` while `maximums2` is `O(l*log(n) + n)`

10 Likes

Or you adapt this one to get the maxima instead of minima, which is very efficient for small n.

1 Like

We can make this even faster for multiple calls by using the in-place version `partialsortperm!`

2 Likes

Using a better algorithm is of course the best choice, but if you just wanted to tweak your original code, then simply moving the line

``````if matrix[i, j] > smallest
smallest, index = findmin(vals)  # <-- move it here
arr[index] = matrix[i, j], CartesianIndex(i, j)
vals[index] = matrix[i, j]
end
``````

is already a significant speedup. You need to initialize `smallest` and `index` outside the main loop, btw.

2 Likes
``````using DataStructures

function Nlargest(v,N)
maxn = heapify!(tuple.(v[1:N],1:N))
maxn1=maxn
for i in N+1:length(v)
e=(v[i],i)
if maxn1 < e
heappop!(maxn)
heappush!(maxn,e)
maxn1=maxn
end
end
#sort!(maxn,rev=true)
maxn
end

``````
2 Likes

Thank you everyone, here is my final working code:

``````function maximums2(M, n)
v = vec(M)
l = length(v)
ix = [1:l;]
partialsortperm!(ix, v, (l-n+1):l, initialized=true)
vals = v[ix[(l-n+1):l]]
indices = CartesianIndices(M)[ix[(l-n+1):l]]
return vals, indices
end
``````
1 Like

A micro-optimization: don’t slice `ix` twice, thst creates and allocates two arrays. Just do it once. (Maybe even use a view?)