 # Remove identical columns from matrix

Suppose I have a matrix with several columns. What’s an easy way to remove identical columns? To provide an example, I want to get matrix `x` below if I start from matrix `xx`.

```julia
x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]
```

```julia
hcat(unique(eachcol(xx))...)
```
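
For what it's worth, the same result can probably be had without splatting by passing `dims` to `unique`. This is only a small sketch using the `x` and `xx` from the question; the name `y` is just for illustration.

```julia
x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]

# unique along dimension 2 treats each column as one element and
# keeps the first occurrence of each distinct column, in order.
y = unique(xx, dims = 2)
```

Both versions compare columns with exact equality, so columns that differ only by rounding error are kept as distinct.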

I was expecting some kind of algorithm. There is always a better way! Challenge question: what if I wanted to eliminate collinear columns, rather than identical columns?

`hcat(unique(normalize.(eachcol(xx)))...)` should do the trick.

`normalize`?

It’s provided by the `LinearAlgebra` standard library, so you need `using LinearAlgebra` first.

That returns a transform of matrix `x`, which I don’t want, and still returns a 10x6 matrix rather than `x`.

```julia
using LinearAlgebra  # provides normalize

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]
hcat(unique(normalize.(eachcol(xx)))...)
```

That’s because `x[:, 1] .+ 2 .* x[:, 3]` is a linear combination of the columns of `x` but not collinear with any column in `x`.
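
If the goal is to drop columns that are scalar multiples of an earlier column while keeping the surviving columns untransformed, something along these lines might work. It is only a rough sketch: the function name `drop_collinear` and the tolerance `rtol` are made up for illustration, and it assumes the columns are nonzero.

```julia
using LinearAlgebra

# Hypothetical helper: keep the first column of each group of
# (numerically) collinear columns and return the original columns unchanged.
function drop_collinear(A::AbstractMatrix; rtol = 1e-10)
    keep = Int[]
    for j in axes(A, 2)
        v = view(A, :, j)
        # u and v are collinear exactly when |u⋅v| == ‖u‖‖v‖
        # (the equality case of Cauchy-Schwarz); compare with a tolerance.
        dup = any(keep) do k
            u = view(A, :, k)
            isapprox(abs(dot(u, v)), norm(u) * norm(v); rtol = rtol)
        end
        dup || push!(keep, j)
    end
    return A[:, keep]
end

x = rand(10, 5)
xx = [x x[:, 2] (3 .* x[:, 5]) (x[:, 1] .+ 2 .* x[:, 3])]
drop_collinear(xx)  # drops the copy of column 2 and the multiple of column 5
```

The linear combination in the last column is still kept, since it is not a multiple of any single column.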

If you want to get a maximal linearly independent subset of the original columns (in other words, find columns of A which build a basis of the column space) you can do:

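```julia
using RowEchelon

# A is your matrix; the pivot columns of its reduced row echelon form
# index a maximal linearly independent subset of the columns of A.
_, pivots = rref_with_pivots(A)
A[:, pivots]
```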

Or you can use `svd(A).U[:, 1:rank(A)]` (with `using LinearAlgebra`) if you just want a basis of the column space and don’t care whether the basis vectors are columns of `A`. (Note that `rank` computes its own SVD, so if performance is an issue, take the singular values from the first `svd` call and check for yourself which are close to 0, which is what `rank` does.)
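
A single-SVD version could look roughly like this. It is a sketch, not a drop-in replacement: the function name `colspace_basis` is made up, the tolerance is an assumption meant to resemble the default used by `rank`, and it assumes `A` is non-empty.

```julia
using LinearAlgebra

# Hypothetical helper: orthonormal basis of the column space of A,
# computing the SVD only once.
function colspace_basis(A::AbstractMatrix)
    F = svd(A)
    # Assumed tolerance: singular values below this (relative to the
    # largest one) are treated as zero.
    tol = min(size(A)...) * eps(eltype(F.S)) * first(F.S)
    r = count(>(tol), F.S)   # numerical rank
    return F.U[:, 1:r]
end

x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]
B = colspace_basis(xx)   # columns of B span the column space of xx
```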
