Remove identical columns from matrix

Suppose I have a matrix with several columns. What’s an easy way to remove identical columns? To provide an example, I want to get matrix x below if I start from matrix xx.

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]

hcat(unique(eachcol(xx))...)
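As a variation on the one-liner above, `unique` in Base also accepts a `dims` keyword, which avoids the splat into `hcat`. A minimal sketch, assuming the duplicated columns are exactly equal (the variable name `y` is just for illustration):

```julia
x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]

# With dims=2, unique treats each column as one item and keeps
# the first occurrence of each, in the original order.
y = unique(xx, dims=2)

# Both approaches compare columns for exact equality,
# so they should agree on this input.
y == hcat(unique(eachcol(xx))...)
```

Columns that differ only by floating-point rounding count as distinct in both versions.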

I was expecting some kind of algorithm. There is always a better way! :)

Challenge question: what if I wanted to eliminate collinear columns, rather than identical columns?

hcat(unique(normalize.(eachcol(xx)))...) should do the trick

normalize?

It’s provided by the LinearAlgebra standard library, so you need using LinearAlgebra first.

That returns a transform of matrix x, which I don’t want, and still returns a 10x6 matrix rather than x.

using LinearAlgebra

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]
hcat(unique(normalize.(eachcol(xx)))...)

That’s because x[:, 1] .+ 2 .* x[:, 3] is a linear combination of the columns of x but not collinear with any column in x.
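If the goal is to drop only columns that are collinear with an earlier column, while keeping the original (unnormalized) values, something along these lines might work. This is a rough sketch, not a tested library routine: `drop_collinear_columns` and its `rtol` default are made up for illustration, and it tests collinearity through the equality case of the Cauchy-Schwarz inequality, |a⋅b| = ‖a‖‖b‖.

```julia
using LinearAlgebra

# Hypothetical helper: keep the first column of every group of mutually
# collinear columns and return the kept columns unchanged.
# rtol is an assumed tolerance for the floating-point comparison.
function drop_collinear_columns(A::AbstractMatrix; rtol = 1e-10)
    keep = Int[]
    for j in axes(A, 2)
        a = view(A, :, j)
        na = norm(a)
        na == 0 && continue  # assumption: zero columns are dropped
        # a and b are collinear when |a⋅b| equals ‖a‖‖b‖ (up to rtol)
        is_dup = any(keep) do k
            b = view(A, :, k)
            abs(dot(a, b)) >= (1 - rtol) * na * norm(b)
        end
        is_dup || push!(keep, j)
    end
    return A[:, keep]
end

x = rand(10, 5)
# A scaled copy, a sign-flipped copy, and a genuine linear combination
xx = hcat(x, 3 .* x[:, 2], -x[:, 5], x[:, 1] .+ 2 .* x[:, 3])
y = drop_collinear_columns(xx)
```

Here the scaled and sign-flipped copies are removed, but the last column stays, because it is not collinear with any single column of x.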

If you want to get a maximal linearly independent subset of the original columns (in other words, find columns of A which build a basis of the column space) you can do:

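using RowEchelon

# pivots holds the indices of the pivot columns of the reduced row echelon form of A
_, pivots = rref_with_pivots(A)
A[:, pivots]  # the pivot columns of A form a basis of its column space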

Or you can use svd(A).U[:, 1:rank(A)] if you just want a basis of the column space, and don’t care if the basis vectors are columns of A. (Note that rank does its own SVD, so if performance is an issue, get the singular values from the first svd call and check yourself which are close to 0, which is what rank does.)
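The single-SVD variant mentioned in the parenthesis could look roughly like this. It is only a sketch: `colspace_basis` is a made-up name, the tolerance is modeled on the default that `rank` uses as far as I know (treat it as an assumption and adjust to your data), and A is assumed to be non-empty.

```julia
using LinearAlgebra

# Hypothetical helper: orthonormal basis of the column space of A,
# computed from one SVD instead of calling svd and rank separately.
function colspace_basis(A::AbstractMatrix)
    F = svd(A)
    S = F.S  # singular values, sorted in decreasing order
    # Assumed tolerance: singular values below tol are treated as zero
    tol = minimum(size(A)) * eps(eltype(S)) * maximum(S)
    r = count(>(tol), S)  # numerical rank
    return F.U[:, 1:r]
end

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]
B = colspace_basis(xx)
```

The columns of B are orthonormal and span the same space as the columns of xx, but in general none of them is a column of xx.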
