Performance tips for a function which finds n maximums in a matrix

Hi everyone,

I have been trying to squeeze out as much performance as possible out from the following function, which finds the n largest (or maximum) values of a matrix and returns those values and their positions in the matrix, however I do have to note that in the specific case in which I am calling the function I really don’t need the values, just the indices:

function maximums(matrix, n)::Tuple{Vector{Int64}, Vector{CartesianIndex{2}}}
    type = eltype(matrix)
    vals = zeros(type, n)
    indices = Array{Int64}(undef,n)
    arr = Array{Tuple{type,CartesianIndex}}(undef, n)
    @inbounds for i ∈ axes(matrix, 1), j ∈ axes(matrix, 2)
        smallest, index = findmin(vals)
        if matrix[i, j] > smallest
            arr[index] = matrix[i, j], CartesianIndex(i, j)
            vals[index] = matrix[i, j]
        end
    end
    arr = sort(arr, by=x -> x[1], rev=true)
    vals = [x[1] for x in arr]
    indices = [x[2] for x in arr]
    return vals, indices
end

I have already profiled it using @code_warntype and pprof and I can’t seem to find any possible room for improvement. What can I do to make it run faster? Or does my logic need a rewrite from the ground up?

I call this function thousands of times in my main function so I would really appreciate any speed up.

Thank you very much!

1 Like

You can make this thousands of times faster (look at partialsort and partialsortperm)

4 Likes

Specifically:

function maximums2(M, n)
    v = vec(M)
    l = length(v)
    perm = partialsortperm(v, (l-n+1):l)
    vals = v[perm]
    indices = CartesianIndices(M)[perm]
    return vals, indices
end

M = rand(Int, 1000,1000);

@time maximums(M, 10000);
  8.694805 seconds (421.12 k allocations: 7.938 MiB)

@time maximums2(M, 10000);
  0.010756 seconds (13 allocations: 15.488 MiB)

Specifically if l is the number of elements in the matrix, maximums is O(l*n) while maximums2 is O(l*log(n) + n)

11 Likes

Or you adapt this one to get the maxima instead of minima, which is very efficient for small n.

1 Like

We can make this even faster for multiple calls by using the in-place version partialsortperm!

2 Likes

Using a better algorithm is of course the best choice, but if you just wanted to tweak your original code, then simply moving the line

if matrix[i, j] > smallest
    smallest, index = findmin(vals)  # <-- move it here
    arr[index] = matrix[i, j], CartesianIndex(i, j)
    vals[index] = matrix[i, j]
end

is already a significant speedup. You need to initialize smallest and index outside the main loop, btw.

2 Likes
using DataStructures

function Nlargest(v,N)
    maxn = heapify!(tuple.(v[1:N],1:N))
    maxn1=maxn[1]
    for i in N+1:length(v)
        e=(v[i],i)    
        if maxn1[1] < e[1]
            heappop!(maxn)
            heappush!(maxn,e)
            maxn1=maxn[1]
        end
    end
    #sort!(maxn,rev=true)
    maxn
end
  
 
2 Likes

Thank you everyone, here is my final working code:

function maximums2(M, n)
    v = vec(M)
    l = length(v)
    ix = [1:l;]
    partialsortperm!(ix, v, (l-n+1):l, initialized=true)
    vals = v[ix[(l-n+1):l]]
    indices = CartesianIndices(M)[ix[(l-n+1):l]]
    return vals, indices
end
1 Like

A micro-optimization: don’t slice ix twice, thst creates and allocates two arrays. Just do it once. (Maybe even use a view?)