I have a matrix, M. Every row of M contains only 1s and NaNs. For row 20, for example, I can find the ids of all columns that do not contain NaN:
idfocus = findall(x -> !isnan(x), M[20, :])
I am now interested in finding all rows (if any) above row 20 (again, just as an example) that contain exactly the same ids as idfocus.
What is the most efficient way to solve this problem? Is it to loop from row 1 to row 19, say, applying the code above to each row and checking whether the intersection of that row's ids with idfocus has the same length as idfocus? Can that approach not be improved upon?
Based on the context, two main optimizations are possible. First, assuming NaNs are much rarer than numbers, it makes more sense to search for the NaNs than for the non-NaN entries. Second, an explicit loop avoids allocations, which helps especially if we break at the first difference.
Here is a contrived MWE with size 200:
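function testnan(a, N)
    indices = Int[]                      # rows that match row N
    idfocus = findall(isnan, a[N, :])    # NaN columns of row N (assumed non-empty)
    lastidx = last(idfocus)
    for i in 1:N-1
        colindx = 0                      # last column of idfocus confirmed as NaN in row i
        for j in idfocus
            isnan(a[i, j]) || break      # stop at the first difference
            colindx = j
        end
        # Row i has NaN in every column of idfocus (it may also have NaNs elsewhere).
        colindx == lastidx && push!(indices, i)
    end
    return indices
end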
using BenchmarkTools
a = rand([0:99; fill(NaN, 10)], 200, 200)
@btime testnan($a, 200)
1.070 μs (7 allocations: 2.20 KiB)
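The question asks for rows with exactly the same ids, which is a slightly stricter condition than checking only the NaN columns of the reference row: a row with additional NaNs elsewhere would differ. A minimal sketch of a strict comparison, assuming the same loop-and-break idea, could look like the following. The name `same_nan_rows` and the small matrix `M` are hypothetical, introduced here only for illustration.

```julia
# Sketch: rows above row N whose NaN pattern is identical to row N's.
# `same_nan_rows` is a hypothetical helper name, not from the original post.
function same_nan_rows(a::AbstractMatrix, N::Integer)
    indices = Int[]
    for i in 1:N-1
        same = true
        for j in axes(a, 2)
            # Compare the NaN masks column by column; stop at the first mismatch.
            if isnan(a[i, j]) != isnan(a[N, j])
                same = false
                break
            end
        end
        same && push!(indices, i)
    end
    return indices
end

# Small example in the spirit of the question: entries are either 1 or NaN.
M = [1.0 NaN 1.0;
     NaN 1.0 1.0;
     1.0 NaN 1.0;
     NaN NaN 1.0;
     1.0 NaN 1.0]

same_nan_rows(M, 5)  # rows 1 and 3 share row 5's pattern; row 4 has an extra NaN
```

This version scans every column instead of only the NaN columns of the reference row, so it may do more work per row, but it still allocates nothing beyond the result vector and stops at the first mismatch.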