Checking if a list of vectors are all identical

(Looking for the most idiomatic way to solve this problem. It’s inside several nested loops.)

I have a situation where I have two lists of 3-vectors. Often these two lists are identical except for displacement. (One set of points is merely a shifted copy of the other.) To check for identity, I just subtract them and then check to see that the component-wise difference is the same for all vectors. So the difference between two sets of vectors might be something like this (just a MWE, not realistic)

julia> t
4-element Vector{Vector{Float64}}:
 [0.1, 0.2, 3.3]
 [0.2, -0.2, 1.0]
 [0.3, 0.25, -1.0]
 [-0.1, -0.2, 0.3]

julia> s
4-element Vector{Vector{Float64}}:
 [-0.1, 0.5, 4.6]
 [0.0, 0.1, 2.3]
 [0.1, 0.55, 0.3]
 [-0.3, 0.1, 1.6]

julia> diffs=t-s
4-element Vector{Vector{Float64}}:
 [0.2, -0.3, -1.2999999999999998]
 [0.2, -0.30000000000000004, -1.2999999999999998]
 [0.19999999999999998, -0.30000000000000004, -1.3]
 [0.19999999999999998, -0.30000000000000004, -1.3]

So then I check to see that each difference vector matches the others by:

julia> all([diffs[1]≈i for i in diffs[2:end]])

Not sure if this is the best way to do this. Is there a more idiomatic way? It seems verbose with the list comprehension…

(I could perhaps store the points as columns in a matrix instead of as a vector of vectors…)

If allocations/performance don’t matter, I propose

diffs = t-s
sum(sum.(diff(diffs))) ≈ 0

but not sure whether this achieves the maximal clarity :laughing:

only slightly better:

julia> diffs = t-s;

julia> all(≈(first(diffs)), diffs)
true
3 Likes

First of all, remove the brackets:

all(diffs[1]≈i for i in diffs[2:end])

to avoid (one of) the redundant allocation(s).

In general, though I would try to avoid subtracting all those vectors, don’t create diffs in the first place, instead iterate, comparing subtractions on the fly. Ideally, there should be zero allocations for this.

3 Likes

For example:

all(tᵢ - sᵢ ≈ t[1] - s[1] for (tᵢ, sᵢ) in zip(t,s))
t .- [t[1]] ≈ s .- [s[1]] 
1 Like

@rocco_sprmnt21, are you sure it is computing the same thing?
From the docs, it looks more like it uses the norm:
gif

1 Like

Using elementwise like this may be unreliable. Approximate comparison is not something you can use blindly — you are asking that the difference should be small, but you should always think “small compared to what”?

The problem with doing approximate comparison elementwise is that a ≈ b checks by default that |a - b| is small compared to |a| or |b|, and by default to about 8 digits. But what if t[1] - s[1] happens to be zero (or nearly zero)? Then tᵢ - sᵢ ≈ t[1] - s[1] will almost always be false, because nothing is small compared to zero (except for zero itself).

In your case, it’s a tricky question — what is the right “yardstick” to compare to? If you know something from your problem (e.g. some overall scale of your coordinate system), then you can use it to pass an absolute tolerance atol to isapprox (the same function as ). Otherwise, you would want to compute some relevant scale, e.g. the diameter of your point cloud.

(There’s also the delicate question of choosing an appropriate tolerance. atol = lengthscale * small, but what is small? √eps() ≈ 1.5e-8 is the default relative tolerance rtol for , corresponding to half the significant digits, because this is a reasonable rule of thumb for unit tests: small enough to catch most bugs, but large enough to be resilient to roundoff errors. But if your application is not a unit test, your needed tolerance may be very different.)

I have a situation where I have two lists of 3-vectors

Working with this data structure will probably be much, much faster if you use arrays of SVector{3,Float64} from StaticArrays.jl. Then you can use natural vector operations, like subtracting two vectors, and it will be (a) allocation-free and (b) unrolled by the compiler. See the performance tips.

5 Likes

Thanks Steve. Would something like this be better? (*edited using: atol)

all(isapprox(tᵢ - sᵢ, t[1] - s[1], atol=1e-3) for (tᵢ, sᵢ) in zip(t,s))

How does:

all(s .+ Ref(mean(t-s)) .≈ t)

look?

No, because you are passing only a relative tolerance rtol. That means it will still probably fail if t[1] - s[1] happens to be nearly zero.

Yes, using atol is safer, but requires you to pick a reasonable atol, and that depends on the problem because atol is dimensionful.

For example, suppose your vectors represent positions of stars in the galaxy, with coordinates measured in meters. Then atol = 1e-3 (1mm) is probably a ridiculously small tolerance. On the other hand, if you are doing atomic physics with distances in meters, where the typical scale is angstroms, then atol = 1e-3 is a ridiculously large tolerance.

This is still likely to fail if any element of t happens to be zero or nearly zero.

Note, by the way, that works perfectly well on arrays, or arrays of arrays (anything with a norm), so using with all is generally unnecessary (and more susceptible to scaling problems). You can just do s .+ Ref(mean(t-s)) ≈ t (no dot on the ), which is more robust although it will still be problematic if norm(t) == 0 (maybe that is less likely … but ultimately you should still be thinking about an appropriate problem-dependent atol).

1 Like

I’m not sure I understand the question/observation. I definitely don’t understand the reference to the norm.
What I wrote (which is ultimately equivalent to the other proposed forms. I don’t know if from a numerical point of view it is more robust to Stevenj’s observations) is based on the fact that if the two vectors are equivalent under a translation, then translating them into so that the first element of both goes to the origin, the two vectors must “overlap” element by element

:person_facepalming: Yes, I didn’t fix the problem.
But:

using LinearAlgebra

all(s - t .≈ Ref(mean(s - t)))

might fix it: isapprox defaults to use norm when comparing vectors, and mean(s-t)'s norm is a good measuring yardstick for how small the dispersion of s - t should be (limited through rtol).

But I’m still not sure enough. Doing a more explicit and simpler test would avoid bugs.

Still seems wrong. The mean could easily be zero, no?

Your code is better than elementwise — as I wrote above, is generally better to do on whole arrays — the more values the better to gauge the correct overall scale. But the problem with your code is still the implicit relative tolerance on the , which can make the approximate comparison fail unexpectedly.

For example, your code returns false in this case:

julia> s = [1, 1, 1]; t = [1, 1, 1 + 1e-15]
3-element Vector{Float64}:
 1.0
 1.0
 1.000000000000001

julia> t .- [t[1]] ≈ s .- [s[1]]
false

even though s ≈ t to 15 digits. (It is giving a correct answer to the question you asked! If you compare [0,0,1e-15] to [0,0,0], they don’t match in the third component to any significant digits. You have given isapprox no other yardstick by which to say that 1e-15 is small in this problem.)

To get a more reliable comparison, you really need to think about a problem-specific atol.

1 Like

Assuming that 1e-3 is a good yardstick for the OP’s data, then your preference would be to modify @rocco_sprmnt21’s snippet as follows?

isapprox(t .- [t[1]], s .- [s[1]], atol=1e-3)

Oh this looks pretty sweet and works even in some edge cases:

julia> diff(s) ≈ diff(t)
true
1 Like

If they’re all three vectors, then you might want to use a static length structure for them all… and then that lends itself nicely to being reinterpreted as a matrix… and then there’s Linear Algebra fun to be had with linear dependencies. :slight_smile:

I’m afraid not. The counterexample proposed by Stevengj also applies to this nice solution.
This leads me to advance the following conjecture: for every proposed solution, in the “numerical context” there is always a counterexample that makes it fail.

1 Like

Now I think I understand your previous observation. I was looking for an answer to the OP’s request for an “idiomatic solution”, but I didn’t take the numerical aspects into consideration at all (not voluntarily: just out of ignorance. I don’t have any experience with software for “real” uses).