Checking if a list of vectors are all identical

Gus_Hart · February 14, 2024, 5:33pm

(Looking for the most idiomatic way to solve this problem. It’s inside several nested loops.)

I have a situation where I have two lists of 3-vectors. Often these two lists are identical except for displacement. (One set of points is merely a shifted copy of the other.) To check for identity, I just subtract them and then check to see that the component-wise difference is the same for all vectors. So the difference between two sets of vectors might be something like this (just a MWE, not realistic)

julia> t
4-element Vector{Vector{Float64}}:
 [0.1, 0.2, 3.3]
 [0.2, -0.2, 1.0]
 [0.3, 0.25, -1.0]
 [-0.1, -0.2, 0.3]

julia> s
4-element Vector{Vector{Float64}}:
 [-0.1, 0.5, 4.6]
 [0.0, 0.1, 2.3]
 [0.1, 0.55, 0.3]
 [-0.3, 0.1, 1.6]

julia> diffs=t-s
4-element Vector{Vector{Float64}}:
 [0.2, -0.3, -1.2999999999999998]
 [0.2, -0.30000000000000004, -1.2999999999999998]
 [0.19999999999999998, -0.30000000000000004, -1.3]
 [0.19999999999999998, -0.30000000000000004, -1.3]

So then I check to see that each difference vector matches the others by:

julia> all([diffs[1]≈i for i in diffs[2:end]])

Not sure if this is the best way to do this. Is there a more idiomatic way? It seems verbose with the list comprehension…

(I could perhaps store the points as columns in a matrix instead of as a vector of vectors…)

abraemer · February 14, 2024, 5:53pm

If allocations/performance don’t matter, I propose

diffs = t-s
sum(sum.(diff(diffs))) ≈ 0

but not sure whether this achieves the maximal clarity

jling · February 14, 2024, 6:44pm

only slightly better:

julia> diffs = t-s;

julia> all(≈(first(diffs)), diffs)
true

DNF · February 14, 2024, 7:04pm

First of all, remove the brackets:

all(diffs[1]≈i for i in diffs[2:end])

to avoid (one of) the redundant allocation(s).

In general, though I would try to avoid subtracting all those vectors, don’t create diffs in the first place, instead iterate, comparing subtractions on the fly. Ideally, there should be zero allocations for this.

rafael.guerra · February 14, 2024, 8:15pm

For example:

all(tᵢ - sᵢ ≈ t[1] - s[1] for (tᵢ, sᵢ) in zip(t,s))

rocco_sprmnt21 · February 14, 2024, 8:19pm

t .- [t[1]] ≈ s .- [s[1]]

rafael.guerra · February 14, 2024, 8:28pm

@rocco_sprmnt21, are you sure it is computing the same thing?
From the docs, it looks more like it uses the norm:
gif

stevengj · February 14, 2024, 8:39pm

Using ≈ elementwise like this may be unreliable. Approximate comparison ≈ is not something you can use blindly — you are asking that the difference should be small, but you should always think “small compared to what”?

The problem with doing approximate comparison elementwise is that a ≈ b checks by default that |a - b| is small compared to |a| or |b|, and by default to about 8 digits. But what if t[1] - s[1] happens to be zero (or nearly zero)? Then tᵢ - sᵢ ≈ t[1] - s[1] will almost always be false, because nothing is small compared to zero (except for zero itself).

In your case, it’s a tricky question — what is the right “yardstick” to compare to? If you know something from your problem (e.g. some overall scale of your coordinate system), then you can use it to pass an absolute tolerance atol to isapprox (the same function as ≈). Otherwise, you would want to compute some relevant scale, e.g. the diameter of your point cloud.

(There’s also the delicate question of choosing an appropriate tolerance. atol = lengthscale * small, but what is small? √eps() ≈ 1.5e-8 is the default relative tolerance rtol for ≈, corresponding to half the significant digits, because this is a reasonable rule of thumb for unit tests: small enough to catch most bugs, but large enough to be resilient to roundoff errors. But if your application is not a unit test, your needed tolerance may be very different.)

I have a situation where I have two lists of 3-vectors

Working with this data structure will probably be much, much faster if you use arrays of SVector{3,Float64} from StaticArrays.jl. Then you can use natural vector operations, like subtracting two vectors, and it will be (a) allocation-free and (b) unrolled by the compiler. See the performance tips.

rafael.guerra · February 14, 2024, 8:50pm

Thanks Steve. Would something like this be better? (*edited using: atol)

all(isapprox(tᵢ - sᵢ, t[1] - s[1], atol=1e-3) for (tᵢ, sᵢ) in zip(t,s))

Dan · February 14, 2024, 8:56pm

How does:

all(s .+ Ref(mean(t-s)) .≈ t)

look?

stevengj · February 14, 2024, 8:58pm

No, because you are passing only a relative tolerance rtol. That means it will still probably fail if t[1] - s[1] happens to be nearly zero.

Yes, using atol is safer, but requires you to pick a reasonable atol, and that depends on the problem because atol is dimensionful.

For example, suppose your vectors represent positions of stars in the galaxy, with coordinates measured in meters. Then atol = 1e-3 (1mm) is probably a ridiculously small tolerance. On the other hand, if you are doing atomic physics with distances in meters, where the typical scale is angstroms, then atol = 1e-3 is a ridiculously large tolerance.

This is still likely to fail if any element of t happens to be zero or nearly zero.

Note, by the way, that ≈ works perfectly well on arrays, or arrays of arrays (anything with a norm), so using ≈ with all is generally unnecessary (and more susceptible to scaling problems). You can just do s .+ Ref(mean(t-s)) ≈ t (no dot on the ≈), which is more robust although it will still be problematic if norm(t) == 0 (maybe that is less likely … but ultimately you should still be thinking about an appropriate problem-dependent atol).

rocco_sprmnt21 · February 14, 2024, 9:38pm

I’m not sure I understand the question/observation. I definitely don’t understand the reference to the norm.
What I wrote (which is ultimately equivalent to the other proposed forms. I don’t know if from a numerical point of view it is more robust to Stevenj’s observations) is based on the fact that if the two vectors are equivalent under a translation, then translating them into so that the first element of both goes to the origin, the two vectors must “overlap” element by element

Dan · February 14, 2024, 9:53pm

Yes, I didn’t fix the problem.
But:

using LinearAlgebra

all(s - t .≈ Ref(mean(s - t)))

might fix it: isapprox defaults to use norm when comparing vectors, and mean(s-t)'s norm is a good measuring yardstick for how small the dispersion of s - t should be (limited through rtol).

But I’m still not sure enough. Doing a more explicit and simpler test would avoid bugs.

stevengj · February 14, 2024, 10:24pm

Still seems wrong. The mean could easily be zero, no?

stevengj · February 14, 2024, 10:30pm

Your code is better than elementwise — as I wrote above, ≈ is generally better to do on whole arrays — the more values the better to gauge the correct overall scale. But the problem with your code is still the implicit relative tolerance on the ≈, which can make the approximate comparison fail unexpectedly.

For example, your code returns false in this case:

julia> s = [1, 1, 1]; t = [1, 1, 1 + 1e-15]
3-element Vector{Float64}:
 1.0
 1.0
 1.000000000000001

julia> t .- [t[1]] ≈ s .- [s[1]]
false

even though s ≈ t to 15 digits. (It is giving a correct answer to the question you asked! If you compare [0,0,1e-15] to [0,0,0], they don’t match in the third component to any significant digits. You have given isapprox no other yardstick by which to say that 1e-15 is small in this problem.)

To get a more reliable comparison, you really need to think about a problem-specific atol.

rafael.guerra · February 14, 2024, 11:11pm

Assuming that 1e-3 is a good yardstick for the OP’s data, then your preference would be to modify @rocco_sprmnt21’s snippet as follows?

isapprox(t .- [t[1]], s .- [s[1]], atol=1e-3)

Dan · February 15, 2024, 1:53am

Oh this looks pretty sweet and works even in some edge cases:

julia> diff(s) ≈ diff(t)
true

mbauman · February 15, 2024, 2:09am

If they’re all three vectors, then you might want to use a static length structure for them all… and then that lends itself nicely to being reinterpreted as a matrix… and then there’s Linear Algebra fun to be had with linear dependencies.

rocco_sprmnt21 · February 15, 2024, 7:33am

I’m afraid not. The counterexample proposed by Stevengj also applies to this nice solution.
This leads me to advance the following conjecture: for every proposed solution, in the “numerical context” there is always a counterexample that makes it fail.

rocco_sprmnt21 · February 15, 2024, 7:37am

Now I think I understand your previous observation. I was looking for an answer to the OP’s request for an “idiomatic solution”, but I didn’t take the numerical aspects into consideration at all (not voluntarily: just out of ignorance. I don’t have any experience with software for “real” uses).

Topic		Replies	Views
Unit test equality for arrays? General Usage	14	7047	April 22, 2021
Test: absolute tolerance for vectors General Usage testing	1	616	February 14, 2019
Usage of arrays of static arrays New to Julia performance , staticarrays	16	1097	February 22, 2023
Best way to handle a list of arrays (Asking for recommendations) General Usage	3	454	December 21, 2018
Vector notation, intermediate value and allocations New to Julia	8	456	June 16, 2022

Checking if a list of vectors are all identical

Related topics