History of `isapprox` in languages

Does anyone know why the de facto standard definition of isapprox intentionally fails when one of the arguments is zero?

isapprox(0.0, 0.0 + 1e-10) # false
isapprox(1.0, 1.0 + 1e-10) # true

It is clear from the formula why this is happening:

||x - y|| \le max(atol, rtol*max(||x||, ||y||))

If we set y=0, we get:

||x|| \le max(atol, rtol*||x||)

The default atol = 0 then makes it:

||x|| \le rtol * ||x||

and the default rtol = \sqrt\epsilon leads to:

||x|| \le \sqrt\epsilon ||x||

which is false for any nonzero x, because \sqrt\epsilon < 1.
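
For concreteness, the same reduction can be checked in the REPL (the variables x and rtol are just for illustration):

julia> x, rtol = 1e-10, sqrt(eps(Float64));  # default rtol, default atol = 0

julia> abs(x - 0.0) <= max(0.0, rtol * max(abs(x), abs(0.0)))
false

julia> isapprox(0.0, x)  # same outcome with the built-in defaults
false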

This formula is also used by Python’s math standard library isclose, as explained in their docs. The NumPy library, on the other hand, uses an asymmetric definition:

||x - y|| \le atol + rtol*||y||

which, combined with NumPy’s nonzero default atol = 1e-8, is “more intuitive” when one of the arguments is zero:

>>> np.isclose(0.0 + 1e-10, 0.0)
np.True_
>>> np.isclose(1.0 + 1e-10, 1.0)
np.True_

However, the asymmetry introduces new nuances as explained in their docs.
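
To see one of those nuances concretely, here is a small Julia sketch of the asymmetric rule (the name isclose_np and the exaggerated rtol are just for illustration): swapping the arguments can change the answer, because the tolerance is scaled by the second argument only.

julia> isclose_np(a, b; rtol=0.1, atol=0.0) = abs(a - b) <= atol + rtol * abs(b);

julia> isclose_np(1.0, 0.9)  # tolerance scaled by |0.9|
false

julia> isclose_np(0.9, 1.0)  # tolerance scaled by |1.0|
true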

I would appreciate it if someone with a background in numerics could justify these formulas with practical examples. Can we achieve something better that works “intuitively” in all cases, including zero arguments?

6 Likes

I guess I’m not the profile you are expecting, but there is one simple reason: the order of magnitude of a quantity depends on its units, thus:

julia> using Unitful

julia> 1u"nm" ≈ 0.0u"nm"
false

julia> 1e-9u"m" ≈ 0.0u"m"
false

there is no reason for one to be true and the other not. Comparing to something that is zero is always context dependent.

14 Likes

Thank you @lmiq, wouldn’t it make sense to rely on the atol definition when one of the arguments is 0? The issue seems to be that it is not possible to find a default atol for 0. We found a Python PEP discussing this.

From what I gather, the general recommendation is to always pass atol explicitly if the values given to isapprox can be zero.

2 Likes

Well, if I’m doing a calculation where I couldn’t avoid a scaling such that the coordinates are in meters but the values are actually of the order of micrometers, I wouldn’t want isapprox to return true for something that would literally mean (without units):

julia> 1e-6 ≈ 0.0
false

julia> 1u"μm" ≈ 0.0u"μm"
false

julia> 1e-6u"m" ≈ 0.0u"μm"
false

anything not returning false in all these cases would become very complicated to handle.

2 Likes

But we do — you can specify the atol yourself:

julia> isapprox(0.0, 1e-10, atol=1e-10)
true

Somewhat tellingly, NumPy explicitly warns against using isclose if you might be close to zero, because its behavior is likely to be inappropriate in such cases.

11 Likes

From the examples with Unitful.jl, it seems that we could work around the limitation of a unitless, fixed default atol by deriving it from the units of the quantity?

I am just brainstorming to see if there is something better to do when we have a rich type system with units as types. Maybe Unitful.jl could do something “more intuitive” to compare near zero?

I think part of the problem is that finding defaults that cover all possible ranges and simultaneously account for units is not simple. The current defaults are nice, but they do require at least some thought if you want to compare near-zero numbers, which is probably a good thing to be cautious about anyway.

I also don’t think having the defaults silently change because one of the arguments is zero would be that nice to work with.

3 Likes

I don’t know. My impression is that multiple people struggled with this behavior in the past, leading to the above Python PEP, alternative definitions, etc.

It would be really nice if Unitful.jl or a similar package could handle this more “intuitively” so that we could always rely on “good” defaults in the presence or absence of 0. I am not sure that it would be better in general, but I can imagine that one could fix a “fallback” atol for unitless numbers, and then use dispatch to reduce to this base case from unitful quantities. This “fallback” atol could then be set at runtime with something like ScopedValues.jl, as you suggested elsewhere.
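
Just to make that idea concrete, a rough sketch of the dispatch part (the function name and the fallback value are made up, and, as discussed further down, picking that value is exactly the hard part):

using Unitful

# Hypothetical fixed fallback atol for unitless numbers.
const FALLBACK_ATOL = 1e-12

# Base case: plain numbers, keeping the usual relative tolerance alongside the
# fallback atol (passing only a nonzero atol would otherwise drop the default rtol).
approx_with_fallback(x::Real, y::Real) =
    isapprox(x, y; atol=FALLBACK_ATOL, rtol=sqrt(eps(Float64)))

# Unitful quantities: convert to preferred (SI base) units, strip them, and reuse the base case.
approx_with_fallback(x::Unitful.Quantity, y::Unitful.Quantity) =
    approx_with_fallback(ustrip(upreferred(x)), ustrip(upreferred(y)))

approx_with_fallback(1e-15, 0.0)          # true: the fallback atol kicks in
approx_with_fallback(1.0u"nm", 0.0u"nm")  # false: 1 nm is 1e-9 m in base units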

This certainly isn’t conclusive, but here is how I personally made peace with approximate comparisons with zero not really making sense without specifying some absolute tolerance: if one thinks that some small number, say 1e-20, is small enough to be approximately zero, then this is essentially the same as thinking that 1e20 should be approximately equal to infinity. Personally, I would be quite surprised by 1e20 ≈ Inf evaluating to true.
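
Both indeed come out false with the defaults:

julia> 1e-20 ≈ 0.0
false

julia> 1e20 ≈ Inf
false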

2 Likes

The main difference in the case of 0 is that we are considering a mental model where we are bounded below. If we take x_n \to 0 from the right as in x_n = 1/n, we are certain that we will never cross zero, and will converge to it.

The example with Inf doesn’t fall in this mental model.

Agree it is a very delicate issue though.

How? You’re asking if something is approximately zero — and to do it “intuitively,” in the way that you mean. But how do I know what you mean, even if you’re explicitly using units? A high energy physicist might care about distances down to 10^{-35} m, but an architect would be okay with a millimeter or two and a GPS app happy to be within a meter.

Float64 can meaningfully represent over 600 orders of magnitude — and every single one of those orders of magnitude has ~16 digits of precision. You can always ask if ~half the digits match — that’s the default rtol. But how could I ever guess how many of those orders of magnitude should be considered “insignificant” to you? That’s the atol.
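
Concretely, for Float64 the default rtol is sqrt(eps), i.e. roughly half of the ~16 significant digits have to agree:

julia> sqrt(eps(Float64))  # the default rtol when atol is not given
1.4901161193847656e-8

julia> 1.0 + 1e-8 ≈ 1.0    # about half the digits match
true

julia> 1.0 + 1e-7 ≈ 1.0
false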

You can always define your own comparator — unicode combining characters or sub-/super-scripts are great for this:

julia> ≈ₐ(x,y) = isapprox(x, y, atol=1e-10)
≈ₐ (generic function with 1 method)

julia> 1e-10 ≈ₐ 0.0
true

16 Likes

Yes, that is what we have been doing internally. I was wondering if this approach with ScopedValues.jl could be adopted by Julia Base, where a “default” atol near 0 is fixed and users can adjust it when necessary, as in:

using ScopedValues

with(Base.ATOL64 => 1e-9, Base.ATOL32 => 1f-4) do
  # do your computations with custom tolerances
  isapprox(x, y)
end

Would that improve the status quo or be more confusing?
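
For concreteness, a rough sketch of how such a scoped default could look in user code today, without touching Base (all names here are made up):

using ScopedValues  # provides ScopedValue and with (Base.ScopedValues on Julia ≥ 1.11)

# Hypothetical scoped default absolute tolerance.
const DEFAULT_ATOL = ScopedValue(0.0)

# A wrapper that reads the scoped default instead of hard-coding atol = 0,
# while still allowing an explicit override.
approx_scoped(x, y; atol=DEFAULT_ATOL[], kws...) = isapprox(x, y; atol=atol, kws...)

approx_scoped(0.0, 1e-10)          # false: the scoped default is still 0
with(DEFAULT_ATOL => 1e-9) do
    approx_scoped(0.0, 1e-10)      # true inside this dynamic scope
end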

No, this still doesn’t make sense. It’s perfectly valid to do atomic physics with distances in meters, but then 10^{-9} is a large number for a distance!

The absolute scale depends on your problem, not just on the units you chose. There is no reasonable way to pick it without having more information.

I think many people’s mental model for floating-point arithmetic and comparisons is just wrong. They think 10^{-16} is a small number in an absolute sense, as if it were fixed-point arithmetic. And I think numpy’s isclose function made the wrong choice in setting a default absolute tolerance, and a further wrong choice in comparing vectors elementwise (rather than using the length of the vector to set an overall scale). Not something we want to emulate.

See also previous discussions like Approximate equality

The whole point is that there is no reasonable default atol other than zero.

24 Likes

Speaking personally, I can still remember being very confused about the term machine epsilon because — to past-Matt — machine felt very absolute and global, almost like a “speed of light” sort of limit to the accuracy on a given computer. It’s such a bad name for an inherently relative value that has nothing to do with computer hardware at all. Even worse, I think it (and probably other things) led me to the misconception that floating point numbers are themselves fuzzy and approximate.
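
For instance, eps in Julia makes that relativity explicit: it gives the spacing to the next representable float at the value you ask about, which grows and shrinks with the magnitude:

julia> eps(1.0)
2.220446049250313e-16

julia> eps(1e10)
1.9073486328125e-6

julia> eps(1e-10)
1.2924697071141057e-26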

I have concepts of a plan for a blogpost someday entitled “floating point numbers are perfectly exact, they just might not be exactly the number you think.”

15 Likes

What they don’t tell you is that their default atol is also inappropriate if you are working with large magnitudes. 10^{-8} is an absurdly small tolerance if you are doing astrophysics in meters. So instead of false positives from isclose you will get false negatives in that case.

It’s basically only appropriate if you are working with quantities scaled to have typical magnitudes around 1.

6 Likes

Yes, but I’d like to point out that in most problems one of the first steps should be to create a dimensionless version of the problem, which usually means scaling the “typical” value of measurements to be around 1.

Since not everyone may understand what that means, here’s a simple example:

I start with a ball on the end of a spring. The ball has mass m in mass units, the spring has spring constant k in some force units per distance unit, and there is an initial displacement in some distance units. We measure distance as positive upwards from the end of the unstretched spring, and we release the ball from rest at the initial displacement.

We have the following differential equation describing the oscillating motion:

m d^2y/dt^2 = -m*g - k*y

There are three dimensions involved in this problem (length, mass, and time) and three constants (m, g, k); Buckingham’s Pi theorem says we can eliminate all the constants by a choice of dimensionless scales. Let’s start by dividing the whole thing by m*g to get a dimensionless equation:

1/g d^2y/dt^2 = -1 -(k/(m*g))*y

Now, let’s measure y as a fraction of the distance (m*g/k), so substitute:

(m*g/k) * y' = y
1/g * (m*g/k) d^2 y'/dt^2 = -1 -(m*g/k) * y' * k/(m*g) = -1 - y'
m/k * d^2y'/dt^2 = -1 - y'

Now, since everything was dimensionless after dividing by m*g and the left-hand side has dt^2 in the denominator, the quantity m/k must have dimensions of time^2, which it does. Let’s measure time as:

sqrt(m/k) * t' = t

we now have:

d^2 y' / dt'^2 = -1 - y'

Literally every mass-and-spring oscillator can be described by this one equation, in which the units are irrelevant. In this conception, an oscillation takes about one “unit” of t’, and the distance it displaces is about one unit of y’.
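
As a quick sanity check of those scales with Unitful (the numeric values are just illustrative), m*g/k indeed comes out as a length and sqrt(m/k) as a time:

julia> using Unitful

julia> m, g, k = 2.0u"kg", 10.0u"m/s^2", 5.0u"N/m";

julia> upreferred(m * g / k)    # the length scale used for y
4.0 m

julia> sqrt(upreferred(m / k))  # the time scale used for t
0.6324555320336759 s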

6 Likes

I don’t know what form it would take, but being able to specify something like this for a block of code, or for all operations in a module, would reduce the pain, even without touching the defaults.

A common pattern is likely writing a bunch of isapprox calls or using the unicode operator, then later realizing something isn’t working right because there’s a zero (and zeros are common), and then you need to rewrite it all to deal with the zero.

1 Like

If you use x ≈ 0 then you are using it wrong. If you use x ≈ y where these are arrays, some of whose components may be zero, then it will do the right thing (because the overall length of a vector sets a scale).
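
For example, with the default tolerances:

julia> [1.0, 1e-10] ≈ [1.0, 0.0]            # the norm of the whole vector sets the scale
true

julia> isapprox.([1.0, 1e-10], [1.0, 0.0])  # elementwise, as numpy does by default
2-element BitVector:
 1
 0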

You can already do this:

let ≈(x,y; kws...) = isapprox(x,y; atol=1e-3, kws...)
    # code
end

or redefine ≈(x,y) = isapprox(x, y, atol=1e-3) in a module, which sets it for the whole module. (Though this seems weird to me — how would you know the absolute tolerance for a whole module? Also, it’s probably clearer to define your own operator as @mbauman suggested above.)

I’m all for non-dimensionalizing and using natural units for a problem. But assuming that all users of floating-point arithmetic have done this, all the time, is not really an option. (And sometimes a problem has more than one scale.)

4 Likes

You elided the important part of my post to nitpick about the rationale. I know it is “using it wrong” as it is defined. It’s still a problem.

1 Like