A method for detecting and removing coolinear inner points

Does anyone here know of a “simple” method for detecting collinear points (in this case, 2D coordinates) from n > 2 points, and deleting the ones that are most “inner”?

So from (0, 0) (0, 2) (0, 1) (1, 1) the first three are collinear and I want to delete the third point because the first two are at the edges of the line they build.

1 Like

Or maybe this

using EuclideanDistanceMatrices
data = [1:10 [zeros(9); 1]]
dim  = 1 # Assume data is 1D even though it's 2D
D    = [sum(abs2, x-y) for x in eachrow(data), y in eachrow(data)] # Distance matrix
D2   = denoise_distmat(D, dim, 1)
outlier_index = argmax(sum(D-D2, dims=2))
# CartesianIndex(10, 1)

EDIT: disregard this suggestion, it’s not actually working that well. I’ll try to understand why not :slight_smile:

EDIT2: I was missing a call to abs:

data = [1:10 [zeros(4); 1; zeros(5)]]
dim  = 1 # Assume data is 1D even though it's 2D
D    = [sum(abs2, x-y) for x in eachrow(data), y in eachrow(data)] # Distance matrix
D2   = denoise_distmat(D, dim, 1)
outlier_index = argmax(abs.(diag(D-D2)))

This method constructs an Euclidean (squared) distance matrix and performs a robust matrix factorization of it. Theory says that if the data is one dimensional, the rank of the distance matrix will be 1+2 = 3. denoise_distmat forms the robust low-rank approximation of rank 3, and by looking at the entry of the matrix that was perturbed the most by the lowrank approximation we have found the outlier. This is only expected to work well for a small number of outliers, if the number of colinear points is small in comparison to the number of other points, RANSAC is probably a better strategy.

1 Like

I should add that the collinearity I’m interested in is absolute, so for the first 3 points from above:

rank([0 0 0; 0 2 1]) == 1

Thanks! I quickly discovered that I needed to

] add https://github.com/baggepinnen/Turing2MonteCarloMeasurements.jl

as a dependency to EuclideanDistanceMatrices, but then I ran into:

(jl_YHkeAE) pkg> add https://github.com/baggepinnen/EuclideanDistanceMatrices.jl

   Updating git-repo `https://github.com/baggepinnen/EuclideanDistanceMatrices.jl`
  Resolving package versions...
  Installed CodecBzip2 ───────── v0.7.2
  Installed ZipFile ──────────── v0.9.3
  Installed JSONSchema ───────── v0.3.2
  Installed SCS ──────────────── v0.6.6
  Installed TotalLeastSquares ── v1.6.0
  Installed MutableArithmetics ─ v0.2.10
  Installed MathOptInterface ─── v0.9.17
  Installed MathProgBase ─────── v0.7.8
  Installed Convex ───────────── v0.13.7
Updating `/tmp/jl_YHkeAE/Project.toml`
  [c8f64d0f] + EuclideanDistanceMatrices v0.1.0 `https://github.com/baggepinnen/EuclideanDistanceMatrices.jl#master`
Updating `/tmp/jl_YHkeAE/Manifest.toml`
  [523fee87] + CodecBzip2 v0.7.2
  [944b1d66] + CodecZlib v0.7.0
  [f65535da] + Convex v0.13.7
  [b4f34e82] + Distances v0.9.2
  [c8f64d0f] + EuclideanDistanceMatrices v0.1.0 `https://github.com/baggepinnen/EuclideanDistanceMatrices.jl#master`
  [7d188eb4] + JSONSchema v0.3.2
  [b8f27783] + MathOptInterface v0.9.17
  [fdba3010] + MathProgBase v0.7.8
  [d8a4904e] + MutableArithmetics v0.2.10
  [c946c3f1] + SCS v0.6.6
  [028f657a] + TotalLeastSquares v1.6.0
  [3bb67fe8] + TranscodingStreams v0.9.5
  [a5390f91] + ZipFile v0.9.3
   Building SCS → `~/.julia/packages/SCS/28Svy/deps/build.log`
┌ Error: Error building `SCS`: 
│ [ Info: Downloading https://github.com/JuliaOpt/SCSBuilder/releases/download/v2.1.1/SCSBuilder.v2.1.1.x86_64-linux-gnu-gcc8.tar.gz to /home/yakir/.julia/packages/SCS/28Svy/deps/usr/downloads/SCSBuilder.v2.1.1.x86_64-linux-gnu-gcc8.tar.gz...
│ ERROR: LoadError: LibraryProduct(nothing, ["libscsindir"], :indirect, "Prefix(/home/yakir/.julia/packages/SCS/28Svy/deps/usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /home/yakir/.julia/packages/BinaryProvider/U2dKK/src/Products.jl:419
│  [2] top-level scope at /home/yakir/.julia/packages/SCS/28Svy/deps/build.jl:94
│  [3] include(::String) at ./client.jl:457
│  [4] top-level scope at none:5
│ in expression starting at /home/yakir/.julia/packages/SCS/28Svy/deps/build.jl:94
│ [17:36:27] #=#=#                                                              [17:36:28] ##O#-#                                                               [17:36:28] ######################################################################## 100,0%##O=#  #                                                              [17:36:28] #-#O=#  #                                                            [17:36:28]  #=#=-#   #                                                          [17:36:29] ####                                                                 [17:36:29] ####                                                                 [17:36:29] #############                                                        [17:36:29] ##############################                                       [17:36:29] ##################################################################   [17:36:29] ######################################################################## 100,0%
└ @ Pkg.Operations /build/julia/src/julia-1.5.2/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:949

Ah, sorry about that, the package is not registered and seem to have become hard to install. Try the following equivalent thing

using TotalLeastSquares
data = [1:10 [zeros(4); 1; zeros(5)]]
dim  = 1 # Assume data is 1D even though it's 2D
D    = [sum(abs2, x-y) for x in eachrow(data), y in eachrow(data)] # Distance matrix
A,E,s,sv = rpca(D, nonnegA=true)
function lowrankapprox(s::SVD, r)
    @views s.U[:,1:r] * Diagonal(s.S[1:r]) * s.Vt[1:r,:]
end
D2   = lowrankapprox(svd(A), 3)
outlier_index = argmax(abs.(diag(D-D2)))

The function rpca has a tuning parameter which can be tuned so that the estimated rank sv becomes 3. If I call

A,E,s,sv = rpca(D, nonnegA=true, λ=1.5 / √(maximum(size(D)))) # The default is 1 rather than 1.5
plot(diag(D-D2))

I get a clean identification of the outlier.

1 Like

Thank you @baggepinnen, but I must have explained my self poorly. I wanted to remove all the points that were between the edge-points of a collinear collection.

I ended up with the following method:

First a function that detects if 3 points lie on a line, and if so, return the point in the middle:

function find_collinear(xy)
    inds = ((1, 2), (1, 3), (2, 3))
    Δ = [norm(xy[i1] - xy[i2]) for (i1, i2) in inds]
    M, i = findmax(Δ)
    j, l = setdiff(1:3, i)
    Δ[j] + Δ[l] ≈ M && return only(setdiff(1:3, inds[i]))
    nothing
end

Then, if I use your MWE data, I can iterate over all combinations of 3 points, test to see if they are collinear and return the index to middle point (that I want to later remove):

data = [1:10 [zeros(4); 1; zeros(5)]]
xy = SVector{2, Float64}.(eachrow(data))
k = Int[]
for j in combinations(1:length(xy), 3)
    if !any(∈(k), j)
        i = find_collinear(xy[j])
        updatek!(k, j, i)
    end
end

where updatek! is just a helper function to push! the culprit into k:

updatek!(k, j, ::Nothing) = nothing
updatek!(k, j, i) = push!(k, j[i])

and a final plot:

scatter(xy)
scatter!(xy[k], color = :red)

As you can see, all the points in red lie on a line between two edge points, I wanna keep those edge points as well as all the non-collinear points in the collection:

julia> deleteat!(xy, k)
3-element Array{SArray{Tuple{2},Float64,1,2},1}:
 [1.0, 0.0]
 [5.0, 1.0]
 [10.0, 0.0]