I have a matrix, M. Each row of M contains 1’s and NaN’s. For row 20, for example, I can find the ids of all columns that do not contain NaN:
idfocus = findall(x -> !isnan(x), M[20, :])
I am now interested in finding all (if any) rows above 20 (again, just for example) that contain exactly the same id’s as in idfocus.
What is the most efficient way to solve that problem? Is it looping through, from 1 to 19, say, using the code above, and checking if the intersect of ids found in each row and idfocus has the same length? Can that approach not be improved upon?
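For reference, here is a minimal sketch of the looped approach described above. It assumes the rows are compared by their full id lists rather than by intersection length alone: an intersection as long as idfocus only shows that idfocus is contained in the other row's ids, so the other row's own id count would have to match as well. The function name `rows_with_same_ids` and the small matrix are illustrative and not part of the original post.

```julia
# Baseline sketch: allocate the id list of every earlier row and compare it
# with the id list of the reference row. `rows_with_same_ids` is a hypothetical name.
function rows_with_same_ids(M::AbstractMatrix, ref::Integer)
    idfocus = findall(!isnan, M[ref, :])   # non-NaN column ids of the reference row
    matches = Int[]
    for i in 1:ref-1
        # findall returns the ids in increasing order, so == tests for identical id sets
        findall(!isnan, M[i, :]) == idfocus && push!(matches, i)
    end
    return matches
end

M = [1.0 NaN 1.0;
     NaN 1.0 1.0;
     1.0 NaN 1.0;
     1.0 NaN 1.0]

rows_with_same_ids(M, 4)   # rows 1 and 3 share the ids of row 4
```

Each iteration allocates a row copy and an index vector, which is the cost the question asks about.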
Check whether the elements at the indices idfocus in the other rows are also NaNs, without storing any new array. If they are, run over the remaining elements of that row to see whether any other element is a NaN.
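One way to sketch this idea is a single pass that compares the NaN masks of two rows element by element. That is equivalent to the two-step check described above, though it is not literally the same procedure. The names `same_nan_pattern` and `rows_matching` and the example matrix are assumptions made for illustration.

```julia
# Hypothetical helper: true if rows i and ref of M have NaNs in exactly the same columns.
# No intermediate arrays are created, and the loop exits at the first mismatch.
function same_nan_pattern(M::AbstractMatrix, i::Integer, ref::Integer)
    for j in axes(M, 2)
        isnan(M[i, j]) == isnan(M[ref, j]) || return false
    end
    return true
end

# Collect every row above `ref` whose NaN pattern matches that of row `ref`.
rows_matching(M, ref) = [i for i in 1:ref-1 if same_nan_pattern(M, i, ref)]

M = [1.0 NaN 1.0;
     NaN NaN 1.0;   # extra NaN in column 1, so not an exact match
     1.0 NaN 1.0;
     1.0 1.0 1.0;   # no NaN at all, so not a match
     1.0 NaN 1.0]

rows_matching(M, 5)   # rows 1 and 3
```

Because the NaN and non-NaN positions are complementary, matching the NaN pattern is the same as matching the non-NaN ids.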
Based on the context, two main optimizations are possible. First, assuming that NaNs are much fewer than numbers, searching for the NaNs makes more sense. Second, using a loop to avoid allocations can help, especially if we break early at the first difference.
Here is a contrived MWE with size 200:
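function testnan(a, N)
    indices = Int[]
    # Column indices of the NaNs in the reference row N
    idfocus = findall(isnan, a[N, :])
    lastidx = last(idfocus)
    for i in 1:N-1
        colindx = 0  # last column of idfocus confirmed as NaN in row i (0 = none yet)
        for j in idfocus
            # Stop at the first column where row i has no NaN
            isnan(a[i, j]) || break
            colindx = j
        end
        # The inner loop reached the end only if row i has a NaN in every column of idfocus.
        # Note: this does not rule out additional NaNs elsewhere in row i.
        colindx == lastidx && push!(indices, i)
    end
    indices
end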
a = rand([0:99;fill(NaN,10)], 200, 200)
using BenchmarkTools
@btime testnan($a, 200)
1.070 μs (7 allocations: 2.20 KiB)