Remove identical columns from matrix

amrods · June 4, 2021, 6:47am

Suppose I have a matrix with several columns. What’s an easy way to remove identical columns? To provide an example, I want to get matrix x below if I start from matrix xx.

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]

yakir12 · June 4, 2021, 6:56am

hcat(unique(eachcol(xx))...)
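For reference, here is a small sketch applying the one-liner above to the example from the question, next to two equivalent spellings. The names `a`, `b`, and `c` are only illustrative. `unique` keeps the first occurrence of each column, so the original column order is preserved, and the comparison is exact (bitwise-equal columns only).

```julia
x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]

# Splatting version from the reply above
a = hcat(unique(eachcol(xx))...)

# Same idea without splatting, which can be kinder to the compiler for many columns
b = reduce(hcat, unique(eachcol(xx)))

# Built-in alternative: unique slices along dimension 2 (columns)
c = unique(xx; dims=2)

a == x && b == x && c == x   # all three should recover x
```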

amrods · June 4, 2021, 7:13am

I was expecting some kind of algorithm. There is always a better way!

amrods · June 4, 2021, 7:21am

Challenge question: what if I wanted to eliminate collinear columns, rather than identical columns?

ettersi · June 4, 2021, 7:28am

hcat(unique(normalize.(eachcol(xx)))...) should do the trick

amrods · June 4, 2021, 7:51am

normalize?

ettersi · June 4, 2021, 7:52am

It’s provided by the LinearAlgebra standard library; load it with using LinearAlgebra.

amrods · June 4, 2021, 7:56am

That returns rescaled (normalized) columns rather than the original columns of x, which I don’t want, and it still returns a 10×6 matrix rather than x.

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]
hcat(unique(normalize.(eachcol(xx)))...)

ettersi · June 4, 2021, 8:04am

That’s because x[:, 1] .+ 2 .* x[:, 3] is a linear combination of the columns of x but not collinear with any column in x.
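If the goal is to drop columns that are collinear with an earlier column while keeping the original (unnormalized) values, one possible sketch is below. The helper names `iscollinear` and `drop_collinear_columns` and the tolerance `tol` are hypothetical, not from any package. It assumes that columns pointing in opposite directions count as collinear, and that zero columns should be dropped outright. Unlike `unique` on normalized columns, it compares with a tolerance rather than requiring bitwise equality.

```julia
using LinearAlgebra

# Hypothetical helper: true if nonzero vectors a and b are parallel
# (same or opposite direction) up to the tolerance tol.
function iscollinear(a, b; tol=sqrt(eps()))
    s = dot(a, b) < 0 ? -1 : 1
    return norm(a ./ norm(a) .- s .* b ./ norm(b)) <= tol
end

# Keep the first column of every group of mutually collinear columns.
# Assumption: zero columns are dropped, since they are collinear with everything.
function drop_collinear_columns(A; tol=sqrt(eps()))
    keep = Int[]
    for j in axes(A, 2)
        col = view(A, :, j)
        iszero(norm(col)) && continue
        if !any(k -> iscollinear(col, view(A, :, k); tol=tol), keep)
            push!(keep, j)
        end
    end
    return A[:, keep]
end

x = rand(10, 5)
xx = hcat(x, x[:, 2], 3 .* x[:, 5], x[:, 1] .+ 2 .* x[:, 3])
y = drop_collinear_columns(xx)
```

Here the copy of column 2 and the scaled copy of column 5 are removed, but the last column stays, because a linear combination of two columns is not collinear with either of them.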

sijo · June 4, 2021, 8:16am

If you want to get a maximal linearly independent subset of the original columns (in other words, find columns of A that form a basis of the column space), you can do:

using RowEchelon

_, pivots = rref_with_pivots(A)
A[:, pivots]

Or you can use svd(A).U[:, 1:rank(A)] if you just want a basis of the column space and don’t care whether the basis vectors are columns of A. (Note that rank does its own SVD, so if performance is an issue, take the singular values from the first svd call and check for yourself which are close to 0, which is what rank does.)
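The single-SVD variant mentioned in the parenthetical could look roughly like this. The function name `column_space_basis` is hypothetical, and the tolerance is an assumption meant to mirror the usual default of `rank` (smallest matrix dimension times machine epsilon times the largest singular value); adjust it if your data needs a looser threshold.

```julia
using LinearAlgebra

# Orthonormal basis of the column space of A from one SVD call.
# Assumption: singular values at or below tol are treated as zero.
function column_space_basis(A::AbstractMatrix)
    F = svd(A)
    isempty(F.S) && return F.U[:, 1:0]
    tol = minimum(size(A)) * eps(eltype(F.S)) * F.S[1]
    r = count(>(tol), F.S)        # numerical rank
    return F.U[:, 1:r]
end

x = rand(10, 5)
A = hcat(x, x[:, 2], x[:, 1] .+ 2 .* x[:, 3])
B = column_space_basis(A)         # columns of B span the same space as the columns of A
```

The columns of `B` are orthonormal and generally are not columns of `A`, so this fits only when any basis of the column space will do.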