How do I get the indexes for matrix columns whose elements are all the same?

Suppose you have a matrix m, how do you get the indexes of all the columns whose values are all the same.

For example, if m is such as below, the answer should be 1 because only the first column has all entries equal.

m = [1 2 3; 1 4 5; 1 6 7]

I did find some questions here and at StackOverflow that sort of pointed me in the direction I wanted, but I couldn’t make them work. For instance, in the StackOverflow example of the function allequal_1(x), I couldn’t understand the set up: all(y->y==x[1],x). For instance, why the y?

Thank you.

this is the notation for an anonymous function. It can be read as "given y return the result of y == x[1], and there it is mapped to all elements of x, comparing each of them to x[1].

It would be exactly the same if it was written element_of_x -> element_of_x == x[1] or any other variable name instead of y.

2 Likes

That stackoverflow post is for a much older version, there’s been some convenient methods since then. EDIT: the next answer in fact uses a v1.8 method allequal I didn’t know about that serves the same purpose as points 1-3, though its equality test uses isequal and is different from == here in edge cases.

result = findall(col -> all(el -> el == col[1], col), eachcol(m))
                          1.^----------------^
                      2.^--------------------------^
               3.^---------------------------------^
                                                    4.^--------^
       5.^------------------------------------------------------^
1. anonymous function that captures col[1] from surrounding
   anonymous function (3), compares its input to col[1]
2. <all> checks each element of col and evaluates the function (1)
   if it is false anywhere, it stops and returns false. So <all>
   returns true only if the function at every element returns true
3. anonymous function that takes a column <col> and checks if
   every element is equal to the first element by doing (2)
4. iterable whose elements are columns of input matrix <m>
5. <findall> checks each column in eachcol(m) and returns all
   indices where (3) is true. If your matrix is 1-based like how
   eachcol is 1-based, then these would be the row indices
3 Likes
findall(allequal, eachcol(m))
4 Likes

Thank you so much, @rocco_sprmnt21

Hello, @Benny

Thanks a lot for such a thoughtful and detailed answer. This is super helpful.

Thank you for calling my attention to the use of an anonymous function, @lmiq

For some reason, this less elegant alternative seems to be much more effective:

axes(m,2)[map(allequal, eachcol(m))]  

For example:

m = rand(0:9, 5, 10_000)
@btime findall(allequal, eachcol($m));           # 100 μs (10004 allocs: 782 KiB)
@btime axes($m,2)[map(allequal, eachcol($m))];   #  14 μs (2 allocs: 10 KiB)

findall(allequal, eachcol(m)) == axes(m,2)[map(allequal, eachcol(m))]   # true
3 Likes
findall(i->allequal(@views m[:,i]), axes(m,1))

even faster.

There is a 100x between ‘pretty’ solution and this last one. This feels like something should be corrected somewhere…

EDIT:

What needed correcting, was this statement. Which should have been:

findall(i->allequal(@views m[:,i]), axes(m,2))

and now timing reasonable.

You seem to have made a typo here: axes(m,1), if you correct it to axes(m,2), you get a slowdown.

1 Like

Sorry, yeah… caught it now.

I think the >length(itr) allocations has to do with findall doing a collect on a generator expression with an if statement. When I pulled the method out and omitted the if statement, the allocations went to single digits, though 78.250 KiB. That probably should be optimized one day, the generator does have access to its maximum length from the start.

1 Like
julia> m = rand(0:9, 5, 10_000)
5×10000 Matrix{Int64}:
 6  8  1  9  4  4  9  9  6  8  …  2  5  8  8  9  8  5  5  8  2
 4  1  1  9  6  6  4  4  5  9     2  9  7  2  3  3  9  8  2  0    
 2  3  0  3  6  0  6  2  5  4     9  1  2  0  5  1  6  1  6  5    
 6  0  7  4  9  0  5  0  3  0     4  2  1  6  4  8  9  4  9  9    
 5  7  1  3  3  8  2  4  2  1     6  7  7  6  9  1  5  0  4  1    

julia> @btime axes(m,2)[map(allequal, eachcol(m))]
  15.600 μs (6 allocations: 10.11 KiB)
1-element Vector{Int64}:
 7405

julia> @btime axes(m,2)[[allequal(c) for c in eachcol(m)]]        
  15.400 μs (5 allocations: 10.08 KiB)
1-element Vector{Int64}:
 7405

julia> @btime findall(allequal(c) for c in eachcol(m))  
  90.300 μs (5 allocations: 272 bytes)    # This seems to be the most sparing on resources, though not the fastest
1-element Vector{Int64}:
 7405

julia> @btime findall(allequal, eachcol(m))
  91.300 μs (10005 allocations: 781.56 KiB)
1-element Vector{Int64}:
 7405

julia> @btime findall(i->allequal(@views m[:,i]), axes(m,2))      
  1.841 ms (67968 allocations: 1.34 MiB)
1-element Vector{Int64}:
 7405

a previous experiment with different m (see below) , led me to make the following reflection regarding a scheme used in making comparisons between different algorithms that are claimed to be equivalent.
probably this consideration, which has no burden for practical purposes but is only a logical whim, has already been made by others, but I would like to propose it again here.
The scheme used by @rafael.guerra here (and by all of us elsewhere) sets forth ONE example where two different algorithms give the same result, implying that this holds as equivalence in general.
obviously this is not logically true(the only thing that a single example can logically imply is NOT GENERAL EQUIVALENCE).

julia> m = rand(0:9, 5, 10_000)
5×10000 Matrix{Int64}:
 1  8  0  6  2  7  3  7  7  8  …  7  5  1  7  1  3  4  7  6  7    
 1  9  4  8  5  9  3  3  8  3     4  3  9  5  7  0  6  8  0  2    
 4  7  8  3  6  3  8  1  7  8     8  6  0  9  9  7  7  4  5  6
 4  2  8  9  1  6  4  1  5  8     9  6  0  6  7  8  0  8  1  0    
 1  9  0  8  5  0  3  1  1  3     5  0  8  4  8  0  3  9  5  5    

julia> @btime findall(allequal(c) for c in eachcol(m))
  88.400 μs (3 allocations: 128 bytes)
Int64[]
...
julia> @btime findall(i->allequal(@views m[:,i]), axes(m,1))      
  1.410 μs (27 allocations: 912 bytes)
Int64[]

julia> @btime findall(i->allequal(@views m[:,i]), axes(m,2))      
  1.836 ms (67968 allocations: 1.34 MiB)
Int64[]