Suppose I have a matrix with several columns. What’s an easy way to remove identical columns? For example, starting from matrix `xx` below, I want to get back matrix `x`.

```
x = rand(10, 5)
# xx repeats columns 2 and 5 of x at the end
xx = [x x[:, 2] x[:, 5]]
```

```
# keep the first occurrence of each distinct column, then glue them back into a matrix
hcat(unique(eachcol(xx))...)
```
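Base Julia also has a built-in for this: `unique` accepts a `dims` keyword, and `unique(xx; dims=2)` keeps the first occurrence of each distinct column. A minimal sketch that rebuilds the matrices from the question:

```julia
# Rebuild the example from the question
x = rand(10, 5)
xx = [x x[:, 2] x[:, 5]]

# unique along dims=2 compares whole columns and keeps first occurrences,
# so the duplicated columns 6 and 7 are dropped
y = unique(xx; dims=2)

println(size(y))   # (10, 5)
println(y == x)    # true
```

For matrices with very many columns, splatting into `hcat` can be slow; `reduce(hcat, unique(eachcol(xx)))` avoids the splat.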

I was expecting some kind of algorithm. There is always a better way!

Challenge question: what if I wanted to eliminate collinear columns, rather than identical columns?

`hcat(unique(normalize.(eachcol(xx)))...)` should do the trick.

`normalize`?

It’s provided by the `LinearAlgebra` standard library (`using LinearAlgebra`).

That returns a normalized version of matrix `x`, which I don’t want, and it still returns a 10x6 matrix rather than `x`.

```
using LinearAlgebra  # for normalize
x = rand(10, 5)
# the last column is a linear combination of columns 1 and 3
xx = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]
hcat(unique(normalize.(eachcol(xx)))...)
```

That’s because `x[:, 1] .+ 2 .* x[:, 3]` is a linear combination of the columns of `x` but is not collinear with any single column of `x`.
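One caveat with the `normalize` trick: it only merges columns whose normalized versions are bitwise identical, so it misses negative multiples (`normalize(-v) == -normalize(v)`) and may miss positive multiples because of floating-point rounding. A minimal sketch of a tolerance-based check instead, comparing the absolute cosine between columns; `remove_collinear` and its `rtol` default are hypothetical choices for illustration, not from any package:

```julia
using LinearAlgebra

# Hypothetical helper, not from any package: keep column j only if it is not
# parallel or antiparallel (within a relative tolerance) to a column already kept.
function remove_collinear(A::AbstractMatrix; rtol = sqrt(eps(float(eltype(A)))))
    kept = Int[]
    for j in axes(A, 2)
        v = view(A, :, j)
        nv = norm(v)
        nv == 0 && continue  # an all-zero column is collinear with everything; drop it
        collinear = any(kept) do k
            u = view(A, :, k)
            # |cos θ| close to 1 means u and v lie on the same line through the origin
            abs(dot(u, v)) >= (1 - rtol) * norm(u) * nv
        end
        collinear || push!(kept, j)
    end
    return A[:, kept]
end

x = rand(10, 5)
# a duplicate, a positive multiple, a negative multiple, and a genuine combination
xx = hcat(x, x[:, 2], 3 .* x[:, 5], -x[:, 1], x[:, 1] .+ 2 .* x[:, 3])
y = remove_collinear(xx)
```

With this input the result should still be 10x6: the scaled and negated copies are dropped, but the combination column survives because it lies in the span of `x` without lying on any single column’s line.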

If you want to get a maximal linearly independent subset of the original columns (in other words, find columns of `A` that form a basis of the column space), you can do:

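```
using RowEchelon
# rref_with_pivots returns the reduced row echelon form and the pivot column indices
_, pivots = rref_with_pivots(A)
# the pivot columns of A form a basis of its column space
A[:, pivots]
```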
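If you would rather not add a dependency, a column-pivoted QR factorization from the `LinearAlgebra` standard library is a common alternative for picking independent columns (a different technique from row reduction). A minimal sketch, assuming Julia 1.7 or later for `ColumnNorm()` (older versions spell it `Val(true)`):

```julia
using LinearAlgebra

x = rand(10, 5)
A = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]

# Column-pivoted QR greedily moves the most independent columns to the front
F = qr(A, ColumnNorm())

# Numerical rank of A; the first r pivot columns span the column space
r = rank(A)

# Indices of the selected original columns, restored to their original order
pivots = sort(F.p[1:r])
B = A[:, pivots]
```

Like the row-echelon version, `B` consists of actual columns of `A`; pivoted QR is usually more robust to rounding than row reduction on floating-point data.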

Or you can use `svd(A).U[:, 1:rank(A)]` if you just want a basis of the column space and don’t care whether the basis vectors are columns of `A`. (Note that `rank` does its own SVD, so if performance is an issue, get the singular values from the first `svd` call and check yourself which are close to 0, which is what `rank` does.)
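Following that note, here is a minimal sketch that factors once and counts the singular values above a threshold. The tolerance is an assumption meant to mirror the default relative tolerance of `rank` (smallest dimension times machine epsilon, scaled by the largest singular value); adjust it for your data.

```julia
using LinearAlgebra

x = rand(10, 5)
A = [x x[:, 2] x[:, 5] x[:, 1] .+ 2 .* x[:, 3]]

F = svd(A)  # one factorization; F.S holds the singular values in decreasing order

# Assumed tolerance, modeled on rank's default relative tolerance
tol = minimum(size(A)) * eps(eltype(F.S)) * F.S[1]
r = count(>(tol), F.S)

# Orthonormal basis of the column space (not columns of A itself)
basis = F.U[:, 1:r]
```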