I think your “rule” is basically that the compiler cannot increase the error significantly, so by definition any counter-example would violate your “rule”; your argument is basically a tautology. My point is that it is incredibly difficult to automate such a “rule” when analyzing arbitrary algebraic transformations, so it cannot be used to implement a working compiler optimization.
(There are academic papers on automating error analysis for compiler optimizations, but they only handle a very restricted class of problems/optimizations.)
In the first case, I would then like to propose putting the -ffast-math Fortran benchmark as a separate example on the Julia benchmarks page.
-ffast-math is unrelated to your “Fortran rules”. No compiler specification provides the guarantees that you are claiming. So, your whole document is irrelevant to that discussion.
As I commented elsewhere, if -ffast-math dramatically speeds up some code, we can ask what algebraic transformation it did that helped. We can then decide whether to do that transformation manually (in all languages).
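For example, if the helpful transformation turns out to be reassociating a reduction, that can be opted into explicitly. A minimal Julia sketch (the pisum_strict/pisum_reassoc names are just for illustration), using a partial sum of 1/k² (≈ π²/6) as a stand-in:

```julia
# Strict left-to-right accumulation: the arithmetic happens exactly as written.
function pisum_strict(n)
    s = 0.0
    for k in 1:n
        s += 1.0 / (k * k)
    end
    return s
end

# Explicitly allowing the compiler to reassociate (and possibly vectorize)
# the reduction via @simd; this is the same kind of algebraic transformation
# that -ffast-math may apply automatically.
function pisum_reassoc(n)
    s = 0.0
    @simd for k in 1:n
        s += 1.0 / (k * k)
    end
    return s
end

pisum_strict(10_000), pisum_reassoc(10_000)   # may differ in the last few bits
```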
But even if an algebraic transformation speeds up a benchmark, it doesn’t automatically mean it is worthwhile — the point of the benchmark is to measure how the same computations compare in different languages, not to compute e.g. π²/6 as quickly as possible. Benchmarks, almost by definition, are doing useless calculations to which you already know the answer.
I think that a blog post (or a series) by someone qualified about these issues would really help. Despite reading Goldberg (1991) and a couple of similar papers and book chapters, my standard method for learning about these issues is getting burned by them.
It’s a complicated subject, unfortunately, that is difficult to convey accurately in a few words. I really enjoyed this presentation by Kahan, which is nominally about Java but in fact is mainly about common misconceptions about floating-point arithmetic, with lots of interesting examples.
I think the reason we can’t reach an agreement is that there are a lot of sub-problems on which we never agreed with a clear “yes” or “no”; the overall subject is very complicated, and it is impossible to make progress unless we can at least agree on these simpler sub-problems. So I have asked one such sub-problem here: “Can the following list of transformations turn a stable well conditioned algorithm into an unstable one?” Let’s discuss that one. After we reach a clear agreement on that, we can tackle the rest.
I honestly don’t see the purpose that would be served in spending my time on analyzing this list of transformations. -ffast-math is not guaranteed to be restricted to this list of transformations. Even if it were, it wouldn’t be useful to do this in a benchmark for reasons that have been explained repeatedly — we want all languages to be performing the same arithmetic steps, not just computing the ~same answer (e.g. π²/6 or the 20th Fibonacci number) as quickly as possible.
And you are still misunderstanding basic terminology. An “algorithm” cannot be well-conditioned, for example; this is a property of the problem. And both Kahan summation and naive summation (and pairwise summation) are backwards-stable, and summation is well-conditioned if all inputs are non-negative (regardless of their magnitudes), but Kahan error < pairwise error << naive sum error if the number of inputs is sufficiently large.
And, as has also been pointed out to you repeatedly, one of the transformations in your list is a change of associativity for +. So, it can change e.g. naive summation into pairwise summation, radically changing the error characteristics (but not changing stability or conditioning).
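To make the difference concrete, here is a small Julia sketch (the naive_sum/kahan_sum names are just for illustration) comparing strict left-to-right summation, the built-in sum (which uses pairwise summation for arrays), and compensated (Kahan) summation against a higher-precision reference:

```julia
# Strict left-to-right accumulation, as a naive loop would do it.
function naive_sum(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

# Kahan (compensated) summation: c tracks the low-order bits lost
# when adding each element into the running sum.
function kahan_sum(x)
    s = zero(eltype(x))
    c = zero(eltype(x))
    for v in x
        y = v - c
        t = s + y
        c = (t - s) - y
        s = t
    end
    return s
end

x = rand(Float32, 10^7)      # non-negative inputs: a well-conditioned sum
ref = sum(Float64.(x))       # reference computed in higher precision

abs(naive_sum(x) - ref), abs(sum(x) - ref), abs(kahan_sum(x) - ref)
# Typically: naive error >> pairwise (built-in sum) error >= Kahan error,
# even though all three algorithms are backwards-stable.
```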
Thanks @stevengj, I think we are getting somewhere:
I thought the error was computed using
error = alg(x) - f(x) ≈ e * f’(x)
where f’(x) is the conditioning of the problem (independent of the algorithm) and “e” is the backward error. I thought that backwards-stable means low “e”.
If Kahan, naive, and pairwise summations are all backward stable, that is, all have a small “e”, and the condition number f’(x) is the same for all of them, how can they have different overall errors if the only variable here is “e”?
I would have thought that naive summation of a large number of inputs has a large “e”, and so is no longer stable, while Kahan summation keeps “e” low, and so remains stable, as shown by the numerous examples discussed above.
P.S. Quoting from J. Demmel’s notes: “We let alg(x) be our algorithm, including all errors from roundoff, truncation error, etc.” My understanding is that e contains the accumulated roundoff error of the large naive summation, whereas Kahan summation has a smaller accumulated error, and thus a smaller e.
Backwards stable means (essentially) that “e” is O(ε), i.e. the backwards error decreases at least linearly with precision. It says nothing about whether the coefficient is large or small (compared to what?), or how the coefficient scales with the size of the problem. In Kahan summation, the backwards error is O(ε), in pairwise summation the error is O(ε log n), and in naive summation the error is O(ε n); all three are stable.
Also, note that the relevant error for floating-point computations is usually the relative error, for which you need the relative condition number, and the formula is somewhat different from what you wrote (which was for the absolute error).
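Concretely, for a smooth scalar function f, the usual first-order relation is:

```latex
% Backward error \Delta x is defined by alg(x) = f(x + \Delta x);
% backwards-stable means |\Delta x| / |x| = O(\varepsilon_{\mathrm{machine}}).
\frac{|\mathrm{alg}(x) - f(x)|}{|f(x)|}
  \;\approx\; \kappa_{\mathrm{rel}}(x) \, \frac{|\Delta x|}{|x|},
\qquad
\kappa_{\mathrm{rel}}(x) = \left| \frac{x \, f'(x)}{f(x)} \right|
```

where Δx is the backward error and κ_rel is the relative condition number of the problem; the forward relative error is the product of the two, which is why a well-conditioned problem can still be solved with very different accuracies by different stable algorithms.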
There are plenty of books that explain all of this; I like the book by Trefethen and Bau, which explains the precise definitions.
Stop a moment to consider that you are asking others to spend time tutoring you on the basics of numerical analysis, but at the same time you are arguing that everyone should accept your opinions on accuracy, even though they have been repeatedly rejected, unless we take the time to prove you wrong (which is especially hard given how imprecise many of your statements are).
@certik: I’m just going to step in here and suggest you let this go.
I greatly value the contributions to these forums by some of the world’s foremost experts on numerical analysis, compilers, and language design. But I feel like it’s counterproductive to try to engage these experts in prolonged debates after clear warnings that the debate is unwelcome and one-sided in terms of knowledge of the topic. I would rather see such people spending their time productively, improving the language. And I’d hate to see any of them turned off to the forums. So please just take the kindly offered advice to get up to speed on the basics.
In my defense, I will say that I contacted @mbauman before posting here, and I was hoping he could point out any basic problems in the notes I wrote up, precisely so that I wouldn’t waste your time here. I have discussed the notes with eight other people, all of whom said that the rules seem consistent, though perhaps not complete.
Thank you, all of you, for the discussion and the examples provided. Thanks @stevengj for the pointer; I will get the book, try to make headway into this myself, and try to use the precise definitions in the formulation of the rules.
I will not post about this here anymore, so that I don’t waste your time. It was suggested on the GitHub issue to move the discussion into this forum, so I assumed I could do that and that people who don’t want to discuss it (and thus waste time) could ignore it.
Again, I really apologize if I made anyone lose time on this. That was absolutely not my intention. Thanks to everybody who contributed.
Kahan’s presentation was very educational — thanks for linking it! It made me wonder about the following: he recommends 80-bit (C’s long double) floats for intermediate results. Are they supported natively on modern x86 CPUs in a practical sense? What are the trade-offs? And most importantly, is there a chance of using them in Julia?
That’s quite outdated advice. 80-bit floats for intermediate computations are actually a disaster for anything other than hand-coded assembly on a system without OS threading. That’s because the compiler or the OS may at any point save registers to the stack and thus drop the additional precision, since the float values are saved as 64-bit floats. That can lead to situations where you compare a value at one point and seemingly obvious implications of that comparison fail to hold just a few operations later. There are all sorts of examples where this creates head-exploding potential bugs in high-level code.
Stefan, I don’t think that applies to variables that you have explicitly declared to be extended precision (e.g. long double) in compilers that support this type. This is what Kahan was talking about, not the implicit use of extended-precision registers for variables declared as double.
That being said, use of extended-precision for accumulation etc. has always been problematic for various reasons, especially portability and performance. Performance-wise, they aren’t as “free” as Kahan implies because of occasions where they might have to be spilled to the stack, as well as the fact that they prevent SIMD vectorization. But his basic point was that this is/was a feature supported by common hardware (fast 80-bit extended precision … much faster than software extended precision, at least), and it is a bit frustrating when a language prevents you from utilizing it.
If the type is explicit and the value is guaranteed not to be truncated at “random” points in the computation, that’s a much better situation. My understanding is that 80-bit floats are a bit of a hardware anachronism these days anyway.
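Julia doesn’t expose the x87 80-bit type, but as a rough analogue of the “wide accumulator” idea, one can accumulate narrower data in a wider type. A minimal sketch (the wide_accumulate name is just for illustration):

```julia
# Rough analogue of Kahan's "wide accumulator" advice (hypothetical example):
# the data are Float32, but the running sum is kept in Float64 and only
# rounded back to working precision at the end. (Julia has no native 80-bit
# float type, so Float64 stands in for the wider intermediate precision.)
function wide_accumulate(x::AbstractVector{Float32})
    s = 0.0                  # Float64 accumulator
    for v in x
        s += v               # each Float32 element is promoted before adding
    end
    return Float32(s)        # round down once, at the very end
end

x = rand(Float32, 10^6)
wide_accumulate(x), sum(x)   # built-in sum accumulates in Float32 (pairwise)
```

The one-liner sum(Float64, x) accumulates in Float64 in the same spirit, just without the final rounding back to Float32.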
The x86 architecture has essentially two parallel sets of floating-point functionality:
- the x87 instructions (e.g. FMUL), which descend from the original 8087 coprocessor. These have 80-bit registers, and the original intention was that all computations would be done at higher precision, truncating only when spilling the registers. The problem is that the results were dependent on register allocation (unless you spill after every operation), so an option was later added to reduce the operational precision to double or single precision. This behaviour was later proscribed by the C99 specification.
- the SSE2 and later instruction sets, which define dedicated instructions for single- and double-precision operations (e.g. MULSS, MULSD), along with SIMD variants (e.g. MULPS, MULPD). These are typically much faster, and also support nice features like pipelining and, in later additions, FMAs.
SSE2+ instructions are basically used by all modern compilers unless you explicitly use long double types (though some recent-ish versions of gcc will still use x87 by default on 32-bit machines).
x87 also defined instructions for certain special functions (e.g. FSIN), though it turns out to be faster (and in some cases, more accurate) to compute these in software. The only exception is the remainder functions (e.g. fmod) which often still use x87 underneath.