How do I get the indexes for matrix columns whose elements are all the same?

Hugo · April 12, 2023, 6:55pm

Suppose you have a matrix m, how do you get the indexes of all the columns whose values are all the same.

For example, if m is such as below, the answer should be 1 because only the first column has all entries equal.

m = [1 2 3; 1 4 5; 1 6 7]

I did find some questions here and at StackOverflow that sort of pointed me in the direction I wanted, but I couldn’t make them work. For instance, in the StackOverflow example of the function allequal_1(x), I couldn’t understand the set up: all(y->y==x[1],x). For instance, why the y?

Thank you.

lmiq · April 12, 2023, 7:18pm

this is the notation for an anonymous function. It can be read as "given y return the result of y == x[1], and there it is mapped to all elements of x, comparing each of them to x[1].

It would be exactly the same if it was written element_of_x -> element_of_x == x[1] or any other variable name instead of y.

Benny · April 12, 2023, 7:31pm

That stackoverflow post is for a much older version, there’s been some convenient methods since then. EDIT: the next answer in fact uses a v1.8 method allequal I didn’t know about that serves the same purpose as points 1-3, though its equality test uses isequal and is different from == here in edge cases.

result = findall(col -> all(el -> el == col[1], col), eachcol(m))
                          1.^----------------^
                      2.^--------------------------^
               3.^---------------------------------^
                                                    4.^--------^
       5.^------------------------------------------------------^
1. anonymous function that captures col[1] from surrounding
   anonymous function (3), compares its input to col[1]
2. <all> checks each element of col and evaluates the function (1)
   if it is false anywhere, it stops and returns false. So <all>
   returns true only if the function at every element returns true
3. anonymous function that takes a column <col> and checks if
   every element is equal to the first element by doing (2)
4. iterable whose elements are columns of input matrix <m>
5. <findall> checks each column in eachcol(m) and returns all
   indices where (3) is true. If your matrix is 1-based like how
   eachcol is 1-based, then these would be the row indices

rocco_sprmnt21 · April 12, 2023, 7:46pm

findall(allequal, eachcol(m))

Hugo · April 12, 2023, 8:56pm

Thank you so much, @rocco_sprmnt21

Hugo · April 12, 2023, 8:57pm

Hello, @Benny

Thanks a lot for such a thoughtful and detailed answer. This is super helpful.

Hugo · April 12, 2023, 8:57pm

Thank you for calling my attention to the use of an anonymous function, @lmiq

rafael.guerra · April 12, 2023, 9:54pm

For some reason, this less elegant alternative seems to be much more effective:

axes(m,2)[map(allequal, eachcol(m))]

For example:

m = rand(0:9, 5, 10_000)
@btime findall(allequal, eachcol($m));           # 100 μs (10004 allocs: 782 KiB)
@btime axes($m,2)[map(allequal, eachcol($m))];   #  14 μs (2 allocs: 10 KiB)

findall(allequal, eachcol(m)) == axes(m,2)[map(allequal, eachcol(m))]   # true

Dan · April 12, 2023, 10:55pm

findall(i->allequal(@views m[:,i]), axes(m,1))

even faster.

There is a 100x between ‘pretty’ solution and this last one. This feels like something should be corrected somewhere…

EDIT:

What needed correcting, was this statement. Which should have been:

findall(i->allequal(@views m[:,i]), axes(m,2))

and now timing reasonable.

rafael.guerra · April 12, 2023, 10:58pm

You seem to have made a typo here: axes(m,1), if you correct it to axes(m,2), you get a slowdown.

Dan · April 12, 2023, 11:00pm

Sorry, yeah… caught it now.

Benny · April 13, 2023, 7:44am

I think the >length(itr) allocations has to do with findall doing a collect on a generator expression with an if statement. When I pulled the method out and omitted the if statement, the allocations went to single digits, though 78.250 KiB. That probably should be optimized one day, the generator does have access to its maximum length from the start.

rocco_sprmnt21 · April 13, 2023, 8:41am

julia> m = rand(0:9, 5, 10_000)
5×10000 Matrix{Int64}:
 6  8  1  9  4  4  9  9  6  8  …  2  5  8  8  9  8  5  5  8  2
 4  1  1  9  6  6  4  4  5  9     2  9  7  2  3  3  9  8  2  0    
 2  3  0  3  6  0  6  2  5  4     9  1  2  0  5  1  6  1  6  5    
 6  0  7  4  9  0  5  0  3  0     4  2  1  6  4  8  9  4  9  9    
 5  7  1  3  3  8  2  4  2  1     6  7  7  6  9  1  5  0  4  1    

julia> @btime axes(m,2)[map(allequal, eachcol(m))]
  15.600 μs (6 allocations: 10.11 KiB)
1-element Vector{Int64}:
 7405

julia> @btime axes(m,2)[[allequal(c) for c in eachcol(m)]]        
  15.400 μs (5 allocations: 10.08 KiB)
1-element Vector{Int64}:
 7405

julia> @btime findall(allequal(c) for c in eachcol(m))  
  90.300 μs (5 allocations: 272 bytes)    # This seems to be the most sparing on resources, though not the fastest
1-element Vector{Int64}:
 7405

julia> @btime findall(allequal, eachcol(m))
  91.300 μs (10005 allocations: 781.56 KiB)
1-element Vector{Int64}:
 7405

julia> @btime findall(i->allequal(@views m[:,i]), axes(m,2))      
  1.841 ms (67968 allocations: 1.34 MiB)
1-element Vector{Int64}:
 7405

a previous experiment with different m (see below) , led me to make the following reflection regarding a scheme used in making comparisons between different algorithms that are claimed to be equivalent.
probably this consideration, which has no burden for practical purposes but is only a logical whim, has already been made by others, but I would like to propose it again here.
The scheme used by @rafael.guerra here (and by all of us elsewhere) sets forth ONE example where two different algorithms give the same result, implying that this holds as equivalence in general.
obviously this is not logically true(the only thing that a single example can logically imply is NOT GENERAL EQUIVALENCE).

julia> m = rand(0:9, 5, 10_000)
5×10000 Matrix{Int64}:
 1  8  0  6  2  7  3  7  7  8  …  7  5  1  7  1  3  4  7  6  7    
 1  9  4  8  5  9  3  3  8  3     4  3  9  5  7  0  6  8  0  2    
 4  7  8  3  6  3  8  1  7  8     8  6  0  9  9  7  7  4  5  6
 4  2  8  9  1  6  4  1  5  8     9  6  0  6  7  8  0  8  1  0    
 1  9  0  8  5  0  3  1  1  3     5  0  8  4  8  0  3  9  5  5    

julia> @btime findall(allequal(c) for c in eachcol(m))
  88.400 μs (3 allocations: 128 bytes)
Int64[]
...
julia> @btime findall(i->allequal(@views m[:,i]), axes(m,1))      
  1.410 μs (27 allocations: 912 bytes)
Int64[]

julia> @btime findall(i->allequal(@views m[:,i]), axes(m,2))      
  1.836 ms (67968 allocations: 1.34 MiB)
Int64[]

Topic		Replies	Views
Indexing vector elements of a matrix General Usage question	2	778	May 6, 2021
How to efficiently find columns of the matrix which are the same? New to Julia question , optimization	10	1160	November 7, 2023
Find indexes of a String in Multi -Type Matrix General Usage indexing , arrays , undef	2	37	April 14, 2025
How to find indices like Matlab General Usage question	3	1304	September 23, 2021
Accessing values in a matrix for a column before the first? General Usage question , array	4	1196	April 20, 2021

How do I get the indexes for matrix columns whose elements are all the same?

Related topics