Finding all rows that are the same as a "focus" row

Nash · May 8, 2021, 7:10pm

I have a matrix, M. Every row of M contains only 1s and NaNs. For row 20, for example, I can find the ids of all columns that do not contain NaN:

idfocus = findall(!isnan, M[20,:])

I am now interested in finding all rows (if any) above row 20 (again, just as an example) whose non-NaN columns are exactly the ids in idfocus.

What is the most efficient way to solve that problem? Is it looping over rows 1 to 19, say, applying the code above to each row, and checking whether the intersection of that row's ids with idfocus has the same length as both? Can that approach be improved upon?
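
For reference, the baseline loop described above might look roughly like the sketch below. The function name and the small test matrix are made up for illustration. It compares the index vectors directly with == instead of going through intersect, which works because findall returns indices in increasing order.

```julia
# Baseline sketch: `rows_like_focus_naive` is a hypothetical name, not from the thread.
function rows_like_focus_naive(M::AbstractMatrix, focus::Integer)
    # columns where the focus row is not NaN
    idfocus = findall(!isnan, view(M, focus, :))
    matches = Int[]
    for i in 1:focus-1
        # allocates one index vector per row; findall returns sorted indices,
        # so == is enough to tell whether the two index sets are identical
        ids = findall(!isnan, view(M, i, :))
        ids == idfocus && push!(matches, i)
    end
    return matches
end

# Small made-up matrix: row 1 has the same non-NaN columns as row 3
M = [1.0 NaN 1.0;
     1.0 1.0 NaN;
     1.0 NaN 1.0]
rows_like_focus_naive(M, 3)
```

This allocates a fresh index vector for every row, which is the main cost the replies try to avoid.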

lmiq · May 9, 2021, 12:22am

For each of the other rows, check whether the elements at the idfocus indices are also non-NaN, without storing any new array. If they are, scan the remaining elements of that row to make sure none of them is non-NaN as well.

Store only the row indexes that match.
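
A minimal sketch of that idea, assuming an exact match means the two rows have NaN in exactly the same columns. The name matching_rows and the example matrix are hypothetical. It folds both checks into one pass per row and stops at the first column where the rows disagree.

```julia
# Hypothetical helper; the name `matching_rows` is not from the thread.
function matching_rows(M::AbstractMatrix, focus::Integer)
    matches = Int[]              # only the matching row indices are stored
    for i in 1:focus-1
        same = true
        for j in axes(M, 2)
            # rows agree at column j only if both are NaN or both are non-NaN
            if isnan(M[i, j]) != isnan(M[focus, j])
                same = false
                break            # first difference found, skip the rest of the row
            end
        end
        same && push!(matches, i)
    end
    return matches
end

# Small made-up matrix: rows 1 and 3 share the NaN pattern of row 5
M = [1.0 NaN 1.0;
     NaN 1.0 1.0;
     1.0 NaN 1.0;
     1.0 NaN NaN;
     1.0 NaN 1.0]
matching_rows(M, 5)
```

Apart from the result vector, nothing is allocated inside the loop.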

Tamas_Papp · May 9, 2021, 8:04am

Perhaps provide an MWE that generates typical data, so that people can benchmark.

Seif_Shebl · May 9, 2021, 10:43pm

Based on the context, two main optimizations are possible. First, assuming NaNs are much rarer than numbers, it makes more sense to search for the NaNs. Second, using a loop to avoid allocations can help, especially if we break early at the first difference.

Here is a contrived MWE with size 200:

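function testnan(a, N)
    indices = Int[]                      # rows that match the focus row
    idfocus = findall(isnan, a[N, :])    # columns where the focus row is NaN
    for i in 1:N-1
        matched = true
        for j in idfocus
            # break early at the first focus column where row i is not NaN
            if !isnan(a[i, j])
                matched = false
                break
            end
        end
        # only the focus columns are checked; row i may hold extra NaNs elsewhere
        matched && push!(indices, i)
    end
    indices
end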

a = rand([0:99;fill(NaN,10)], 200, 200)

using BenchmarkTools
@btime testnan($a, 200) 
  1.070 μs (7 allocations: 2.20 KiB)
