Performance tips for a function which finds n maximums in a matrix

Hi everyone,

I have been trying to squeeze out as much performance as possible out from the following function, which finds the n largest (or maximum) values of a matrix and returns those values and their positions in the matrix, however I do have to note that in the specific case in which I am calling the function I really don’t need the values, just the indices:

function maximums(matrix, n)::Tuple{Vector{Int64}, Vector{CartesianIndex{2}}}
    type = eltype(matrix)
    vals = zeros(type, n)
    indices = Array{Int64}(undef,n)
    arr = Array{Tuple{type,CartesianIndex}}(undef, n)
    @inbounds for i ∈ axes(matrix, 1), j ∈ axes(matrix, 2)
        smallest, index = findmin(vals)
        if matrix[i, j] > smallest
            arr[index] = matrix[i, j], CartesianIndex(i, j)
            vals[index] = matrix[i, j]
        end
    end
    arr = sort(arr, by=x -> x[1], rev=true)
    vals = [x[1] for x in arr]
    indices = [x[2] for x in arr]
    return vals, indices
end

I have already profiled it using @code_warntype and pprof and I can’t seem to find any possible room for improvement. What can I do to make it run faster? Or does my logic need a rewrite from the ground up?

I call this function thousands of times in my main function so I would really appreciate any speed up.

Thank you very much!

You can make this thousands of times faster (look at partialsort and partialsortperm)

Specifically:

function maximums2(M, n)
    v = vec(M)
    l = length(v)
    perm = partialsortperm(v, (l-n+1):l)
    vals = v[perm]
    indices = CartesianIndices(M)[perm]
    return vals, indices
end

M = rand(Int, 1000,1000);

@time maximums(M, 10000);
  8.694805 seconds (421.12 k allocations: 7.938 MiB)

@time maximums2(M, 10000);
  0.010756 seconds (13 allocations: 15.488 MiB)

Specifically if l is the number of elements in the matrix, maximums is O(l*n) while maximums2 is O(l*log(n) + n)

Or you adapt this one to get the maxima instead of minima, which is very efficient for small n.

We can make this even faster for multiple calls by using the in-place version partialsortperm!

Using a better algorithm is of course the best choice, but if you just wanted to tweak your original code, then simply moving the line

if matrix[i, j] > smallest
    smallest, index = findmin(vals)  # <-- move it here
    arr[index] = matrix[i, j], CartesianIndex(i, j)
    vals[index] = matrix[i, j]
end

is already a significant speedup. You need to initialize smallest and index outside the main loop, btw.

using DataStructures

function Nlargest(v,N)
    maxn = heapify!(tuple.(v[1:N],1:N))
    maxn1=maxn[1]
    for i in N+1:length(v)
        e=(v[i],i)    
        if maxn1[1] < e[1]
            heappop!(maxn)
            heappush!(maxn,e)
            maxn1=maxn[1]
        end
    end
    #sort!(maxn,rev=true)
    maxn
end
  
 

Thank you everyone, here is my final working code:

function maximums2(M, n)
    v = vec(M)
    l = length(v)
    ix = [1:l;]
    partialsortperm!(ix, v, (l-n+1):l, initialized=true)
    vals = v[ix[(l-n+1):l]]
    indices = CartesianIndices(M)[ix[(l-n+1):l]]
    return vals, indices
end

A micro-optimization: don’t slice ix twice, thst creates and allocates two arrays. Just do it once. (Maybe even use a view?)