Finding all rows that are the same as a "focus" row

I have a matrix, M. Every row of M contains only 1’s and NaN’s. For row 20, for example, I can find the ids of all columns that do not contain NaN:

idfocus = findall(x -> !isnan(x), M[20, :])

I am now interested in finding all rows (if any) above row 20 (again, just as an example) whose non-NaN columns are exactly the ids in idfocus.

What is the most efficient way to solve this problem? Is it to loop over rows 1 to 19, say, apply the code above to each row, and check whether the intersection of that row's ids with idfocus has the same length? Can that approach be improved upon?

For each of the other rows, check whether the elements at the indices in idfocus have the same NaN status as in the focus row, without allocating any new array. If they do, run over the remaining elements of that row to make sure none of them has that status as well.

Store only the row indexes that match.
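A minimal sketch of that idea, in case it helps. The name `matching_rows` is made up for illustration, and for simplicity it merges the two checks into a single pass that compares the NaN status of every column against the focus row and stops at the first mismatch, so it is a variation on the steps above rather than a literal transcription:

```julia
# Hypothetical helper: indices of rows 1:N-1 of `a` whose NaN pattern
# is identical to that of the focus row N. No temporary arrays are
# created apart from the result vector.
function matching_rows(a::AbstractMatrix, N::Integer)
    matches = Int[]
    for i in 1:N-1
        same = true
        for j in axes(a, 2)
            # Compare NaN status column by column; stop at the first mismatch.
            if isnan(a[i, j]) != isnan(a[N, j])
                same = false
                break
            end
        end
        same && push!(matches, i)   # store only the row index
    end
    return matches
end

# Tiny example: rows 1 and 3 have the same pattern as row 4.
M = [1.0 NaN 1.0;
     NaN 1.0 1.0;
     1.0 NaN 1.0;
     1.0 NaN 1.0]
matching_rows(M, 4)   # returns [1, 3]
```

Julia stores matrices column-major, so walking along rows like this is not the most cache-friendly order. If that turns out to matter for large matrices, working on the transposed data might be worth benchmarking.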

Perhaps provide an MWE that generates typical data, so that people can benchmark.
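Something along these lines might do as a starting point. The sizes, the seed, and the planted duplicate row are all assumptions chosen for illustration, not taken from the original problem:

```julia
using Random

Random.seed!(1)                      # arbitrary seed, only for reproducibility
nrows, ncols = 200, 50               # assumed sizes
M = rand([1.0, NaN], nrows, ncols)   # entries are 1's and NaN's, as in the question
M[7, :] .= M[20, :]                  # plant one row that is guaranteed to match row 20

# Straightforward allocating reference answer, useful for validating
# faster implementations before benchmarking them.
focus = isnan.(M[20, :])
expected = [i for i in 1:19 if isnan.(M[i, :]) == focus]
```

Any candidate function can then be compared against `expected` and timed, for instance with `@btime` from BenchmarkTools.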


Based on the context, two main optimizations are possible. First, assuming NaNs are much rarer than numbers, it makes more sense to search for the NaNs. Second, using a loop avoids allocations, especially if we break early at the first difference.

Here is a contrived MWE with a 200×200 matrix:

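function testnan(a, N)
    indices = Int[]
    # Columns where the focus row N is NaN
    idfocus = findall(isnan, a[N, :])
    lastidx = last(idfocus)
    for i in 1:N-1
        colindx = 0  # last focus column confirmed NaN in row i (0 = none yet)
        for j in idfocus
            # Break at the first focus column where row i is not NaN
            isnan(a[i, j]) || break
            colindx = j
        end
        # Row i is NaN in every focus column only if the loop reached lastidx.
        # Note: this does not check that the remaining columns of row i are non-NaN.
        colindx == lastidx && push!(indices, i)
    end
    indices
end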

a = rand([0:99;fill(NaN,10)], 200, 200)

using BenchmarkTools
@btime testnan($a, 200) 
  1.070 μs (7 allocations: 2.20 KiB)