History of `isapprox` in languages

I agree that it’s a problem that people don’t understand floating-point comparisons. But the fact remains that (a) there is no reasonable nonzero default atol, and (b) it’s easy to set an atol if you want to (including for a block of code as noted above, either by defining your own ≈ᵃ operator or by locally redefining ≈).
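For example (the ≈ᵃ spelling follows the usage above, and the 1e-8 value is just an illustration):

1e-10 ≈ 0                               # false: the default atol is zero
isapprox(1e-10, 0; atol = 1e-8)         # true: an explicit absolute tolerance
≈ᵃ(x, y) = isapprox(x, y; atol = 1e-8)  # one way to spell the ≈ᵃ operator mentioned above
1e-10 ≈ᵃ 0                              # true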

The problem here is that x ≈ 0 is not just using ≈ incorrectly “as it is defined”; it fundamentally makes no sense in the context of floating-point arithmetic. It assumes that the language can magically infer an absolute scale for x in a globally defined ≈ operator.

5 Likes

But it also means that approximate equality does not obey simple equational rewrites, i.e.,

julia> let x = 1e-8; x ≈ 0 end
false

julia> let x = 1e-8; x + one(x) ≈ 1 end
true

Since approximate equality is also not an equivalence relation, it is notoriously difficult to work with and often surprises me.

Well, approximate equality surely isn’t equality. Also, the same statement is already true for exact floating-point equality (just consider 1e-8 + 1e9 == 1e9). So this is not really an issue with ≈ per se.
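To spell that out in the REPL:

julia> 1e-8 + 1e9 == 1e9    # the small term is below the rounding granularity at 1e9
true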

2 Likes

Throwing an error if either of the arguments is exactly zero would be an option (but that is of course a breaking API change).

Numerical computing is full of formulas that intend to deal with this problem, another popular one is something like

\| x - y \| \le (a + \| x \| + \| y \|) \cdot \varepsilon

with a=1 as a default.
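As a minimal sketch (hypothetical name; the default ε is just a placeholder, and this is not what isapprox does), that criterion could be written as:

# Mixed absolute/relative test from the formula above; a and ε are up to the caller.
approx_mixed(x, y; a = 1, ε = sqrt(eps(Float64))) = abs(x - y) ≤ (a + abs(x) + abs(y)) * ε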

Unfortunately, as others have explained multiple times above, there is no generic fix, because each of these formulas has corner cases for which it fails. In other words, you cannot avoid thinking about your units; Julia won’t magically figure it out, because it is a programming language, not an oracle.

4 Likes

That’s back to assuming that all quantities are scaled to around unit magnitude. It’s the usual confusion of floating point with fixed point — it’s not a “corner case” that fails, it’s just completely wrong.

3 Likes

You are trying to grapple with the question of how to compare floating-point numbers, hoping that there is a cure in “isapprox”. Lots of responses.
Here’s mine.
Don’t use it.
That is, Google will find admonitions: don’t compare floating-point numbers for equality. “Near equality” doesn’t make it easier.

Consider whether what you intend by a particular use of isapprox(a,b) is
-eps + a < b < eps + a, or
equivalently abs(a-b) < eps,
where eps can be scaled, say, by max(abs(a), abs(b), machine-epsilon) … or some other factor, depending on what you actually need. Defining isapprox() to do this
“for all cases” is implausible.
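A literal transcription of that recipe might look like the following sketch; the name and the sqrt(eps()) scaling factor are only placeholders:

# Hand-rolled nearness test in the spirit of the above; tune the scaling to your problem.
near(a, b; factor = sqrt(eps())) = abs(a - b) < factor * max(abs(a), abs(b), eps())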

There are situations in which particular aspects of floating-point representation are examined by bit-twiddling subroutine libraries involving machine epsilon, Not-a-Number, extended precision, etc. Testing for exact or nearly corresponding bit patterns might have a use in such software, but that is typically done by experts in floating-point error analysis. There aren’t too many of those around in the community of language designers.
Richard Fateman, UC Berkeley

11 Likes

The justifications that I find for the design of functions like isapprox go something like this: choose a tolerance corresponding to roughly half (or maybe two-thirds) of the available significant digits. This lets you ignore the accumulation of errors from imprecise arithmetic operations while remaining sensitive to small systematic errors introduced by, for example, a bug.

I can imagine some might say that an algorithm that is this sensitive to deviations of floating point arithmetic from its mathematical ideal is poorly designed. In any case, Julia’s isapprox seems to follow this justification and chooses exactly half the digits of precision.
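For Float64, that default relative tolerance works out to:

julia> sqrt(eps(Float64))   # the default rtol: about half of the ~16 significant digits
1.4901161193847656e-8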

It turns out that this design can be completely at odds with your intuition. The closest quasar is about 600 million light years away. This is 5.7 ⋅ 10^34 Angstroms. Does this distance change significantly if I add one more Angstrom, the radius of a Hydrogen atom?

julia> quasar_dist = BigFloat(57) * big(10)^34
5.7e+35

julia> quasar_plus_atom_dist = quasar_dist + big(1)
5.70000000000000000000000000000000001e+35

julia> isapprox(quasar_dist, quasar_plus_atom_dist)
false

According to isapprox, the answer is “yes”.

A comment on choosing a non-zero atol by default. It seems like the most common choice assumes that the numbers you are comparing are not too different from one. Normalized vectors, for example, satisfy this assumption (but not each of their components). Then how many significant digits should you choose? I don’t see any reason to deviate from the choice made for the relative tolerance, but in that case there is essentially no difference between the relative and absolute tolerance, so you’ve gained nothing. If the numbers you are comparing are very different from one, then the default absolute tolerance is meaningless for your problem and only confuses your test. According to this analysis, any non-zero default absolute tolerance is strictly worse than zero.

1 Like

I don’t agree with a blanket statement like this. The most common application of approximate equality, by far, seems to be in software testing, where it is crucially important and appropriate. For example, if I’ve implemented a function to compute a Bessel function, I might want a regression test like:

@test mybesselj(3, some_x) ≈ known_result_for_x rtol=1e-13

to ensure that the code stays accurate to at least 13 significant digits in future revisions. Or if you write an eigensolver, you might want to check the backward accuracy A ≈ X * Λ / X to some tolerance for some test matrix A (note that the use of vector norms in isapprox is critical to do array comparisons like this reliably).
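A rough sketch of that kind of eigensolver check (random symmetric test matrix, default tolerance, arbitrary 5×5 size) might look like:

julia> using LinearAlgebra

julia> A = randn(5, 5); A = A + A';   # symmetric, so the eigendecomposition is real

julia> λ, X = eigen(A);               # eigenvalues (vector) and eigenvectors

julia> A ≈ X * Diagonal(λ) / X        # norm-based comparison with the default rtol
true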

That’s pretty much what isapprox does, except that your pseudo-code max(abs(a), abs(b), machine-epsilon) seems to be falling into the classic mistake of treating the machine epsilon as an absolute tolerance rather than a relative one.
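To see why that matters: machine epsilon is a relative quantity, and the spacing between adjacent floats grows with magnitude:

julia> eps(1.0)     # ulp spacing near 1.0
2.220446049250313e-16

julia> eps(1e10)    # ulp spacing near 1e10 is about ten orders of magnitude larger
1.9073486328125e-6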

(You seem to be advising people to roll their own approximate comparisons — which experience shows many people do badly — rather than just call isapprox and pass context-appropriate tolerances as needed? This advice mystifies me.)

But I absolutely agree that it’s best to think carefully about what tolerances you want to use in a given application, and to pass them explicitly to isapprox. The default rtol in isapprox is reasonable for a lot of simple @test a ≈ b checks where you just want to catch crude bugs (and aren’t comparing to zero): checking whether half the significant digits match will catch severe bugs while ignoring most roundoff issues. But it is definitely better to put some thought into it.

I don’t understand why you think the result is non-intuitive here. Why would you be doing 77-digit arithmetic if you don’t want the computer to consider the 35th digit to be significant by default (without a manually specified tolerance)?

But yes, isapprox cannot be a magical DWIM function that always knows the tolerance you intend.

10 Likes

If you are using isapprox to test for accuracy of software, that might be fairly misleading. Why? Your program might be tolerably accurate at randomly chosen points, and therefore pass, but it might be really bad in edge cases, for instance near zeros of Bessel functions, where the calculations may be more difficult and possibly more important.

If you are testing for nearness of floats as a termination condition for an iterative process, you should have some notion of what kind of convergence is desired. If isapprox(), whatever it does, floats your boat, so to speak, it seems to me that nevertheless a subsequent reader of the code may find it more illuminating to see something explicit like abs(a-b) < some_epsilon.

The IEEE 754 spec also has denormalized numbers / gradual underflow, which affects how small an abs(a-b) can be computed, presumably smaller than machine epsilon. In the past, higher-level languages tended to lack support for such notions, making life more difficult for error analysts. Perhaps newer languages are better.

1 Like

When I said “you” I wasn’t thinking of the most experienced users. I was thinking of a user whose first thought is that ≈ has the everyday meaning “about the same”. Based on experience, I strongly suspect that there are a lot of these users. But that’s really a question for pollsters, I suppose.

2 Likes

When you test special function implementations, you don’t just pick random places to compare. You have to specially test corner cases like zeros, or places where your internal approximation changes. (Indeed, for any function you want to ensure your tests are designed to cover all branches and corner/extreme cases.) But the test is still going to look like isapprox with carefully chosen tolerances. And for most test cases what you want is a relative tolerance.

And, as I said, if you roll your own approximate comparison from scratch in every case (as you seem to be advocating), many people get key things wrong, e.g. treating machine precision as an absolute tolerance, or comparing arrays elementwise rather than in an overall norm. I don’t understand why you seem to think that every testing code should rewrite this basic function over and over again.

For example, for tests of array functions like FFTs where you can often use carefully chosen randomized tests, it’s still important to test things with vector norms (as is done in isapprox) so that you use a tolerance based on the overall scale of the vector. (Elementwise checks, in contrast, tend to randomly fail if one element of the array is much smaller than the others.)
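A contrived illustration of the difference, using a vector with one tiny component and a tiny absolute perturbation:

julia> using LinearAlgebra

julia> x = [1.0, 1e-20]; y = x .+ 1e-18;

julia> x ≈ y          # norm-based: the perturbation is negligible relative to norm(x)
true

julia> all(x .≈ y)    # elementwise: the tiny component fails its own relative test
false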

If isapprox(), whatever it does, floats your boat, so to speak,

You’re writing as though isapprox does something mysterious and unknowable? How it does the comparison is precisely documented, with user-controllable tolerances.

I agree that if you have an iterative solver whose convergence criterion is as simple as norm(Δx) ≤ tol*scale, it’s perfectly reasonable to write that explicitly, especially if performance is critical (especially on vectors, where isapprox might compute a norm or allocate a temporary array that you can avoid), though this is not a concern in software testing.

I’m not sure what higher-level languages “lack support” for subnormal values these days? It is possible to flush them to zero, but do any languages do this by default? Or are you talking about symbolic/computer-algebra systems that default to arbitrary-precision arithmetic rather than hardware IEEE arithmetic? In any case, Julia certainly supports them fine (and does not flush them by default), as long as you don’t want to trap the underflow exceptions.

But I’m not sure what your point is. Subnormals kick in nearly 300 orders of magnitude below machine epsilon in double precision, so for most calculations (i.e. calculations using most of the 600 orders of magnitude supported in double precision) the threshold for subnormal numbers is irrelevant for approximate comparisons.
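For concreteness, in Float64:

julia> eps()                # machine epsilon (relative)
2.220446049250313e-16

julia> floatmin(Float64)    # smallest normal magnitude
2.2250738585072014e-308

julia> nextfloat(0.0)       # smallest positive subnormal
5.0e-324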

(And yes, in many mathematical functions you might want to specially test the corner case of behaving gracefully in underflow/overflow scenarios. It is even more common to want to make sure you don’t accidentally underflow/overflow, a classic example being the naive algorithm sqrt(x^2+y^2) for hypot(x,y). But such tests still involve isapprox or equivalent, with due care about the tolerances.)
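A quick sketch of that hypot example, with values chosen so that the squares overflow:

x = y = 1e200
sqrt(x^2 + y^2)   # Inf: x^2 overflows, even though the true result is ≈ 1.414e200
hypot(x, y)       # finite, ≈ 1.414e200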

9 Likes

Slightly off topic, but welcome Richard! For those who don’t know, he’s a major author of portions of the Maxima computer algebra system and professor emeritus at Berkeley specializing in computing and applied math. Glad to see you contributing to the Julia ecosystem.

13 Likes

No one in this conversation is suggesting people can’t pass context-appropriate tolerances as needed. It’s a question of the best default tolerances. You keep asserting that any other default is wrong. I urge you to consider that this is a matter of perspective and usage, and that the people using this function with its default arguments probably aren’t the people who are working with extremely high and extremely low magnitude numbers.

And with that, I’m out.

1 Like

I should also mention another scenario where I use ≈ (i.e. isapprox with default tolerances) all the time, both for my own purposes and for pedagogy: quick checks of formulas and conjectures.

For example, suppose I’ve worked out a nontrivial formula, like \mathrm{vec}(BCA^T) = (A \otimes B) \mathrm{vec}(C), and I want to make sure I didn’t make some simple algebra mistake. The first thing I do is something like B = randn(10,20); C = randn(20,30); A = randn(40,30); vec(B*C*A') ≈ kron(A, B) * vec(C) and look for a true result.

I find it incredibly convenient (and clearer) that I don’t have to write norm(vec(B*C*A') - kron(A, B) * vec(C)) < 1e-8 * norm(kron(A, B) * vec(C)) over and over again for these kinds of things. And manually inspecting the output is a pain, especially for arrays. I don’t need a custom tolerance since I’m not trying to check for floating-point issues, only algebra goofs, and an 8-digit match is usually enough to rule out the latter with high probability, while being insensitive to the former. But I want a relative tolerance because my function might not be scaled to values near 1 (though in this example it is).

PS. ≈ is especially quick to type since the Mac keyboard has the shortcut option-x for it, though you can also tab-complete \approx.

2 Likes

How crazy would it be for an x ≈ 0 comparison to print a warning on first use, if called with the default tolerance? With an explanation of why this is a bad idea (or a link to one). It’s correct once you think about it, but it certainly catches people out.

TIL, thanks!

4 Likes

I’m specifically responding to @Richard_Fateman’s suggestion that people shouldn’t use isapprox (with or without custom tolerances) and should rather just write ad-hoc comparisons from scratch in most cases.

It is precisely because atol depends on context/usage/perspective that we can’t assume a nonzero default.

Float64 has about 600 orders of magnitude available, but even scaling by 1e±6 (which is not very extreme) will render an atol=1e-8 useless or worse. Yes, I think that having a basic mathematical function assume that all numbers are scaled to unit magnitude is wrong. Floating-point numbers are not fixed-point numbers, though many people seem to confuse the two.
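For example, at a scale of 1e-6, an atol of 1e-8 happily hides a genuine 0.1% discrepancy that the default relative tolerance correctly flags:

julia> 1.0e-6 ≈ 1.001e-6                        # default (relative) tolerance: correctly false
false

julia> isapprox(1.0e-6, 1.001e-6; atol = 1e-8)  # a 0.1% error sails through this atol
true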

(Even if your numbers start out around 1, e.g. if you generate them with randn, they can quickly become much larger or much smaller depending on what functions you pass them through.)

3 Likes

I think it could be reasonable for the @test macro to print a suggestion on an x ≈ 0 test that fails, since it already prints a custom message for failed comparisons: more informative @test x ≈ y error message · Issue #55812 · JuliaLang/julia · GitHub

Probably not the isapprox function itself (which would be a breaking change, and in any case it’s not usual behavior for mathematical functions to print warnings like this).

3 Likes

The first thing I would do is
B = rand(Int,10,20); C = rand(Int,20,30); A = rand(Int,40,30);

1 Like

Aside: one function that I do find myself writing over and over is:

using LinearAlgebra: norm   # norm comes from the LinearAlgebra stdlib
relerr(a, b) = norm(a - b) / max(norm(a), norm(b))

in order to inspect the relative error of some calculation. I wish this were built-in. I suppose I could put it in my startup.jl, but then it will bite me every time I go to a new machine or try to share code snippets with someone else.

2 Likes

Sure, in this particular example the function maps integers to integers. (I think you mean rand(Int, m, n). Luckily, this particular function is also insensitive to integer overflow.)

But that isn’t true of most functions.