Find n smallest values in an n dims array

Hi Every one!
I have the bad tendency of working with N dims arrays.
Right now I have a 10x20x6 arrays and I need to find the index of the 5 smallest values. I have tried PartialQuickSort and argmin but I can’t get them to work in an N dims array.

Any idea?

Thanks in advance!

If you use vec on the ND arrays, you will get a 1D array to which you can apply the other methods. This should not involve copying.

https://docs.julialang.org/en/v1/base/arrays/#Base.vec

2 Likes

But it will give me the coordinates in the vector 1D form, right?

M = rand(10,10,20)
argmin(M)
1 Like

THis gives me the minimum, but not the 5 smallest ones.

If you want to transform from the linear index to the cartesian index, get the linear indicies and then use them to index into CartesianIndices.

2 Likes

I think this is a reasonable alternative:

function _up_nmin!(a, n, min_vals, min_idxs)
    @inbounds for (idx, val) in pairs(a)
        i = searchsortedfirst(min_vals, val)
        if i <= n
            for j = n:-1:i+1
                min_vals[j] = min_vals[j-1]
                min_idxs[j] = min_idxs[j-1]
            end
            min_vals[i] = val
            min_idxs[i] = idx
        end
    end
    return nothing
end

function nmin(a::AbstractArray{T,N}, n::Integer) where {T,N}
    min_vals = [ typemax(T) for _ in 1:n ]
    min_idxs = Vector{CartesianIndex{N}}(undef, n)
    _up_nmin!(a, n, min_vals, min_idxs)
    return min_idxs, min_vals
end

It is faster than sorting the array, at least:

julia> x = rand(10,20,6);

julia> @btime nmin($x, 5);
  7.758 μs (4 allocations: 336 bytes)

julia> @btime sort!(vec(y)) setup=(y=copy(x)) evals=1;
  19.193 μs (2 allocations: 80 bytes)

edit: some code simplifications

1 Like

To demonstrate the index mapping (ignoring allocations, performance):

julia> M = rand(10,10,20);

julia> indexmins(M, n) = 
       CartesianIndices(M)[
                 sort(findall(x->(x <= n), 
                      sortperm(vec(M))))]
indexmins (generic function with 1 method)

julia> indexmins(M, 5)
5-element Vector{CartesianIndex{3}}:
 CartesianIndex(10, 8, 1)
 CartesianIndex(7, 9, 12)
 CartesianIndex(3, 10, 15)
 CartesianIndex(1, 4, 16)
 CartesianIndex(2, 10, 20)

julia> M[indexmins(M,5)]
5-element Vector{Float64}:
 0.8925279351802109
 0.2971133330497415
 0.7161841701653623
 0.35505256038732036
 0.27325166482366414
1 Like

@JeffreySarnoff that seems to do something different than how I interpreted the question.

julia> A = rand(0:128, 4, 3, 2)
4×3×2 Array{Int64, 3}:
[:, :, 1] =
 33   83    7
 41  128  107
 86    3   17
 64  115   25

[:, :, 2] =
 61   46  81
 23   75  92
 58   58  40
 24  103  20

julia> indexmins(M, n) =
              CartesianIndices(M)[
                        sort(findall(x->(x <= n),
                             sortperm(vec(M))))]
indexmins (generic function with 1 method)

julia> ind = indexmins(A, 5)
5-element Vector{CartesianIndex{3}}:
 CartesianIndex(4, 2, 1)
 CartesianIndex(2, 3, 1)
 CartesianIndex(3, 1, 2)
 CartesianIndex(2, 2, 2)
 CartesianIndex(3, 2, 2)

julia> A[ind]
5-element Vector{Int64}:
 115
 107
  58
  75
  58

I thought he wanted the indices of the 5 smallest values in the original array:

julia> function arg_n_smallest_values(A::AbstractArray{T,N}, n::Integer) where {T,N}
           perm = sortperm(vec(A))
           ci = CartesianIndices(A)
           return ci[perm[1:n]]
       end
arg_n_smallest_values (generic function with 2 methods)

julia> ind = arg_n_smallest_values(A, 5)
5-element Vector{CartesianIndex{3}}:
 CartesianIndex(3, 2, 1)
 CartesianIndex(1, 3, 1)
 CartesianIndex(3, 3, 1)
 CartesianIndex(4, 3, 2)
 CartesianIndex(2, 1, 2)

julia> A[ind]
5-element Vector{Int64}:
  3
  7
 17
 20
 23

3 Likes

either way, the question is covered now :slight_smile:

1 Like

Thanks guys. This should be included somewhere in Julia

Perhaps we should have a clearer tutorial on how to do this in the manual, but including this small function in Base Julia is debatable.

While I would lean towards including functions like this in Base, the general philosophy is to be minimalist. It is much harder to add and maintain a function like this inside Base Julia than it is to provide this in a package due to the release cycles. It would be much easier to provide a package with utilities like this that could be updated frequently.

Either way, if you would like to pursue a pull request to Julia or create a package containing this function I would be happy to help.

I think this version is not optimal if n is very small compared to the size of the array since you make a full sort and just retain a few smallest values. At the limit, imagine to get the lowest value of an array you make a sort first. That does not sound right. A dirty write-up like the following

using BenchmarkTools
A=rand(100,100,10,40,10);
function arg_n_smallest_values(A::AbstractArray{T,N}, n::Integer) where {T,N}
           perm = sortperm(vec(A))
           ci = CartesianIndices(A)
           return ci[perm[1:n]]
       end
function argsmallest(A::AbstractArray{T,N}, n::Integer) where {T,N}
           # should someone ask more elements than array size, just sort array
       
           if n>= length(vec(A))
             ind=collect(1:length(vec(A)))
             ind=sortperm(A[ind])
             return CartesianIndices(A)[ind]
           end
           # otherwise 
           ind=collect(1:n)
           mymax=maximum(A[ind])
           for j=n+1:length(vec(A))
           if A[j]<mymax
            getout=findmax(A[ind])[2]
            ind[getout]=j
            mymax=maximum(A[ind])
           end
           end
           ind=ind[sortperm(A[ind])]
           
           return CartesianIndices(A)[ind]
       end

@btime arg_n_smallest_values(A,5)  # 16.61 s
@btime argsmallest(A,5)  # 42 ms

already greatly improves the timing, but I’m sure there are even better ways ?

6 Likes

@JM_Beckers, your code is blazzing fast.

Could you tell why it is an order of magnitude faster than the simplest call:

nargmin(M, n) = CartesianIndices(M)[partialsortperm(view(M,:), 1:n)]

Thank you.

1 Like

Cool, that is twice as fast as this:

function _up_nmin!(a, n, min_vals, min_idxs)
    @inbounds for (idx, val) in pairs(a)
        if val < min_vals[n]
            i = searchsortedfirst(min_vals, val)
            for j = n:-1:i+1
                min_vals[j] = min_vals[j-1]
                min_idxs[j] = min_idxs[j-1]
            end
            min_vals[i] = val
            min_idxs[i] = idx
        end
    end
    return nothing
end

function nmin(a::AbstractArray{T,N}, n::Integer) where {T,N}
    min_vals = [ typemax(T) for _ in 1:n ]
    min_idxs = Vector{CartesianIndex{N}}(undef, n)
    _up_nmin!(a, n, min_vals, min_idxs)
    return min_idxs, min_vals
end

I guess because here I’m tracking the values as well and keeping them in order?

julia> @btime nmin($A,5)
  85.279 ms (2 allocations: 352 bytes)
(CartesianIndex{5}[CartesianIndex(96, 7, 6, 18, 1), CartesianIndex(45, 31, 8, 38, 8), CartesianIndex(84, 23, 3, 12, 8), CartesianIndex(89, 25, 7, 39, 1), CartesianIndex(1, 4, 5, 10, 6)], [5.279682058212387e-8, 5.647766321281722e-8, 1.2348968225772694e-7, 1.2398590831796952e-7, 1.354943585107904e-7])

julia> @btime argsmallest($A,5)
  41.264 ms (161 allocations: 14.95 KiB)
5-element Vector{CartesianIndex{5}}:
 CartesianIndex(96, 7, 6, 18, 1)
 CartesianIndex(45, 31, 8, 38, 8)
 CartesianIndex(84, 23, 3, 12, 8)
 CartesianIndex(89, 25, 7, 39, 1)
 CartesianIndex(1, 4, 5, 10, 6)

2 Likes

I don’t know how partialsortperm works internally, but I think my version is probably fast when n is very small because then there is only a single (small) sort at the end, while the rest of the algorithm cost is at most n*N where N is the array size; and it just needs to find the maximum in a small array whenever a new candidate for inclusion is encountered.

2 Likes

Thank you.
For your M input the cross-over seems to be around n = 2_500.

I’m surprised that it is still fast for relatively large values of n, like 2_000 in the 40_000_000 element case.
But what is even more surprising is the comparison of

@btime findmin($A) # 324.853 ms
@btime argmin($A) # 335.082 ms (0 allocations: 0 bytes)
@btime argsmallest($A,1) # 52.824 ms (37 allocations: 2.20 KiB)

???

2 Likes

You may check the source code for findmin and argmin and possibly submit a pull request. However, I discovered that many Julia algorithms are optimized for huge inputs; for instance, at around 10^8, they are almost unbeatable.

Actually for 10^8 elements the gain is even larger (9 times faster, even retaining all the code pieces to deal with n values). I’m not an expert in generic approaches and fear that my little code is too specific for arrays while the base of Julia is kept generic? But hey, if you know you work with arrays, doing it with a straightforward for loop as in my code could save you quite some time …