Find n smallest values in an n dims array

marianoarnaiz · May 15, 2022, 12:07pm

Hi Every one!
I have the bad tendency of working with N dims arrays.
Right now I have a 10x20x6 arrays and I need to find the index of the 5 smallest values. I have tried PartialQuickSort and argmin but I can’t get them to work in an N dims array.

Any idea?

Thanks in advance!

mkitti · May 15, 2022, 2:58pm

If you use vec on the ND arrays, you will get a 1D array to which you can apply the other methods. This should not involve copying.

https://docs.julialang.org/en/v1/base/arrays/#Base.vec

marianoarnaiz · May 15, 2022, 4:19pm

But it will give me the coordinates in the vector 1D form, right?

jling · May 15, 2022, 4:34pm

M = rand(10,10,20)
argmin(M)

marianoarnaiz · May 15, 2022, 5:26pm

THis gives me the minimum, but not the 5 smallest ones.

mkitti · May 15, 2022, 5:37pm

If you want to transform from the linear index to the cartesian index, get the linear indicies and then use them to index into CartesianIndices.

lmiq · May 15, 2022, 5:38pm

I think this is a reasonable alternative:

function _up_nmin!(a, n, min_vals, min_idxs)
    @inbounds for (idx, val) in pairs(a)
        i = searchsortedfirst(min_vals, val)
        if i <= n
            for j = n:-1:i+1
                min_vals[j] = min_vals[j-1]
                min_idxs[j] = min_idxs[j-1]
            end
            min_vals[i] = val
            min_idxs[i] = idx
        end
    end
    return nothing
end

function nmin(a::AbstractArray{T,N}, n::Integer) where {T,N}
    min_vals = [ typemax(T) for _ in 1:n ]
    min_idxs = Vector{CartesianIndex{N}}(undef, n)
    _up_nmin!(a, n, min_vals, min_idxs)
    return min_idxs, min_vals
end

It is faster than sorting the array, at least:

julia> x = rand(10,20,6);

julia> @btime nmin($x, 5);
  7.758 μs (4 allocations: 336 bytes)

julia> @btime sort!(vec(y)) setup=(y=copy(x)) evals=1;
  19.193 μs (2 allocations: 80 bytes)

edit: some code simplifications

JeffreySarnoff · May 15, 2022, 5:46pm

To demonstrate the index mapping (ignoring allocations, performance):

julia> M = rand(10,10,20);

julia> indexmins(M, n) = 
       CartesianIndices(M)[
                 sort(findall(x->(x <= n), 
                      sortperm(vec(M))))]
indexmins (generic function with 1 method)

julia> indexmins(M, 5)
5-element Vector{CartesianIndex{3}}:
 CartesianIndex(10, 8, 1)
 CartesianIndex(7, 9, 12)
 CartesianIndex(3, 10, 15)
 CartesianIndex(1, 4, 16)
 CartesianIndex(2, 10, 20)

julia> M[indexmins(M,5)]
5-element Vector{Float64}:
 0.8925279351802109
 0.2971133330497415
 0.7161841701653623
 0.35505256038732036
 0.27325166482366414

mkitti · May 15, 2022, 5:57pm

@JeffreySarnoff that seems to do something different than how I interpreted the question.

julia> A = rand(0:128, 4, 3, 2)
4×3×2 Array{Int64, 3}:
[:, :, 1] =
 33   83    7
 41  128  107
 86    3   17
 64  115   25

[:, :, 2] =
 61   46  81
 23   75  92
 58   58  40
 24  103  20

julia> indexmins(M, n) =
              CartesianIndices(M)[
                        sort(findall(x->(x <= n),
                             sortperm(vec(M))))]
indexmins (generic function with 1 method)

julia> ind = indexmins(A, 5)
5-element Vector{CartesianIndex{3}}:
 CartesianIndex(4, 2, 1)
 CartesianIndex(2, 3, 1)
 CartesianIndex(3, 1, 2)
 CartesianIndex(2, 2, 2)
 CartesianIndex(3, 2, 2)

julia> A[ind]
5-element Vector{Int64}:
 115
 107
  58
  75
  58

I thought he wanted the indices of the 5 smallest values in the original array:

julia> function arg_n_smallest_values(A::AbstractArray{T,N}, n::Integer) where {T,N}
           perm = sortperm(vec(A))
           ci = CartesianIndices(A)
           return ci[perm[1:n]]
       end
arg_n_smallest_values (generic function with 2 methods)

julia> ind = arg_n_smallest_values(A, 5)
5-element Vector{CartesianIndex{3}}:
 CartesianIndex(3, 2, 1)
 CartesianIndex(1, 3, 1)
 CartesianIndex(3, 3, 1)
 CartesianIndex(4, 3, 2)
 CartesianIndex(2, 1, 2)

julia> A[ind]
5-element Vector{Int64}:
  3
  7
 17
 20
 23

JeffreySarnoff · May 15, 2022, 6:53pm

either way, the question is covered now

marianoarnaiz · May 15, 2022, 7:11pm

mkitti:

function arg_n_smallest_values(A::AbstractArray{T,N}, n::Integer) where {T,N}
           perm = sortperm(vec(A))
           ci = CartesianIndices(A)
           return ci[perm[1:n]]
       end

Thanks guys. This should be included somewhere in Julia

mkitti · May 15, 2022, 7:26pm

Perhaps we should have a clearer tutorial on how to do this in the manual, but including this small function in Base Julia is debatable.

While I would lean towards including functions like this in Base, the general philosophy is to be minimalist. It is much harder to add and maintain a function like this inside Base Julia than it is to provide this in a package due to the release cycles. It would be much easier to provide a package with utilities like this that could be updated frequently.

Either way, if you would like to pursue a pull request to Julia or create a package containing this function I would be happy to help.

JM_Beckers · May 15, 2022, 8:03pm

I think this version is not optimal if n is very small compared to the size of the array since you make a full sort and just retain a few smallest values. At the limit, imagine to get the lowest value of an array you make a sort first. That does not sound right. A dirty write-up like the following

using BenchmarkTools
A=rand(100,100,10,40,10);
function arg_n_smallest_values(A::AbstractArray{T,N}, n::Integer) where {T,N}
           perm = sortperm(vec(A))
           ci = CartesianIndices(A)
           return ci[perm[1:n]]
       end
function argsmallest(A::AbstractArray{T,N}, n::Integer) where {T,N}
           # should someone ask more elements than array size, just sort array
       
           if n>= length(vec(A))
             ind=collect(1:length(vec(A)))
             ind=sortperm(A[ind])
             return CartesianIndices(A)[ind]
           end
           # otherwise 
           ind=collect(1:n)
           mymax=maximum(A[ind])
           for j=n+1:length(vec(A))
           if A[j]<mymax
            getout=findmax(A[ind])[2]
            ind[getout]=j
            mymax=maximum(A[ind])
           end
           end
           ind=ind[sortperm(A[ind])]
           
           return CartesianIndices(A)[ind]
       end

@btime arg_n_smallest_values(A,5)  # 16.61 s
@btime argsmallest(A,5)  # 42 ms

already greatly improves the timing, but I’m sure there are even better ways ?

rafael.guerra · May 15, 2022, 8:28pm

@JM_Beckers, your code is blazzing fast.

Could you tell why it is an order of magnitude faster than the simplest call:

nargmin(M, n) = CartesianIndices(M)[partialsortperm(view(M,:), 1:n)]

Thank you.

lmiq · May 15, 2022, 8:51pm

Cool, that is twice as fast as this:

function _up_nmin!(a, n, min_vals, min_idxs)
    @inbounds for (idx, val) in pairs(a)
        if val < min_vals[n]
            i = searchsortedfirst(min_vals, val)
            for j = n:-1:i+1
                min_vals[j] = min_vals[j-1]
                min_idxs[j] = min_idxs[j-1]
            end
            min_vals[i] = val
            min_idxs[i] = idx
        end
    end
    return nothing
end

function nmin(a::AbstractArray{T,N}, n::Integer) where {T,N}
    min_vals = [ typemax(T) for _ in 1:n ]
    min_idxs = Vector{CartesianIndex{N}}(undef, n)
    _up_nmin!(a, n, min_vals, min_idxs)
    return min_idxs, min_vals
end

I guess because here I’m tracking the values as well and keeping them in order?

julia> @btime nmin($A,5)
  85.279 ms (2 allocations: 352 bytes)
(CartesianIndex{5}[CartesianIndex(96, 7, 6, 18, 1), CartesianIndex(45, 31, 8, 38, 8), CartesianIndex(84, 23, 3, 12, 8), CartesianIndex(89, 25, 7, 39, 1), CartesianIndex(1, 4, 5, 10, 6)], [5.279682058212387e-8, 5.647766321281722e-8, 1.2348968225772694e-7, 1.2398590831796952e-7, 1.354943585107904e-7])

julia> @btime argsmallest($A,5)
  41.264 ms (161 allocations: 14.95 KiB)
5-element Vector{CartesianIndex{5}}:
 CartesianIndex(96, 7, 6, 18, 1)
 CartesianIndex(45, 31, 8, 38, 8)
 CartesianIndex(84, 23, 3, 12, 8)
 CartesianIndex(89, 25, 7, 39, 1)
 CartesianIndex(1, 4, 5, 10, 6)

JM_Beckers · May 15, 2022, 8:54pm

I don’t know how partialsortperm works internally, but I think my version is probably fast when n is very small because then there is only a single (small) sort at the end, while the rest of the algorithm cost is at most n*N where N is the array size; and it just needs to find the maximum in a small array whenever a new candidate for inclusion is encountered.

rafael.guerra · May 15, 2022, 9:04pm

Thank you.
For your M input the cross-over seems to be around n = 2_500.

JM_Beckers · May 15, 2022, 9:59pm

I’m surprised that it is still fast for relatively large values of n, like 2_000 in the 40_000_000 element case.
But what is even more surprising is the comparison of

@btime findmin($A) # 324.853 ms
@btime argmin($A) # 335.082 ms (0 allocations: 0 bytes)
@btime argsmallest($A,1) # 52.824 ms (37 allocations: 2.20 KiB)

???

Seif_Shebl · May 16, 2022, 11:34am

You may check the source code for findmin and argmin and possibly submit a pull request. However, I discovered that many Julia algorithms are optimized for huge inputs; for instance, at around 10^8, they are almost unbeatable.

JM_Beckers · May 16, 2022, 3:43pm

Actually for 10^8 elements the gain is even larger (9 times faster, even retaining all the code pieces to deal with n values). I’m not an expert in generic approaches and fear that my little code is too specific for arrays while the base of Julia is kept generic? But hey, if you know you work with arrays, doing it with a straightforward for loop as in my code could save you quite some time …

Topic		Replies	Views
Find n smallest values in an array New to Julia question	5	2945	June 27, 2020
Performance tips for a function which finds n maximums in a matrix Performance performance , sortperm , speed-optimization , matrix	8	679	March 23, 2023
Feature request: generalization of findmax/findmin to n values Internals & Design	5	2608	September 17, 2022
Sort huge array get smallest 1000 Performance	5	831	December 31, 2017
Argmax, but n biggest elements General Usage arrays	3	1766	January 13, 2022

Find n smallest values in an n dims array

Related topics