Poll: speed vs accuracy for `Float64^-3`

The Float64 number 0.001 is exactly 0.001000000000000000020816681711721685132943093776702880859375, which is why, when you ask to print enough digits of it, you get non-zero digits after a certain point.

It may be interesting to note that since 2 divides 10, every binary floating-point value can be written in decimal with a finite number of non-zero digits. If your value has n bits after the “decimal” (binary) point, then it will take at most n decimal digits after the decimal point to print the number exactly.
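A quick way to see this in the REPL: converting to BigFloat preserves the underlying binary value, and printing it reveals the exact, finite decimal expansion quoted above.

```julia
julia> big(0.001)   # the exact decimal expansion of the Float64 0.001
0.001000000000000000020816681711721685132943093776702880859375
```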

7 Likes

For this it’s better to use floating-point literals: 1e-3, 1e-6, etc. In fact, directly writing 124.2e-3 is the most common usage I’ve seen.
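For instance, both literals parse directly, with a single rounding at parse time, to the nearest Float64:

```julia
julia> 1e-3 == 0.001    # the same Float64, written two ways
true

julia> 124.2e-3         # one literal, one rounding at parse time
0.1242
```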

7 Likes

Is this documented anywhere? Maybe it should be…

I’m not sure if we document that explicitly anywhere. We arguably should.

1 Like

If so, probably add a note on what other languages do; otherwise this can be an additional source of confusion.

2 Likes

Documenting it would be nice (see for example what GCC does), but, to be fair, I’m not sure any language provides correctly rounded math libraries by default.

3 Likes

I don’t think we really have any realistic choice but to ensure that literal_pow produces identical results to the general ^. If it’s the case that we can’t ensure this for any powers beyond [-2, -1, 0, 1, 2], then so be it.

Among the options presented for x^-3 is inv(x)*inv(x)*inv(x). But what about inv(float(x)*float(x)*float(x))? At the very least, it would behave well for some popular literal powers like 10^-3. Still, if it doesn’t match non-literal x^y, then we’re risking all sorts of chaos.
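To make the difference concrete, compare the two candidates at x = 10 (a sketch; neither line is Base’s actual literal_pow definition). In the first, every inv introduces rounding; in the second, the cube 1000.0 is exact, so only the final division rounds:

```julia
julia> inv(10.0) * inv(10.0) * inv(10.0)   # each factor already carries rounding error
0.0010000000000000002

julia> inv(10.0 * 10.0 * 10.0)             # 1000.0 is exact; a single rounding
0.001
```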

1 Like

The problem with inv(float(x)*float(x)*float(x)) is that it has much worse overflow behavior.

I don’t think the overflow behavior is “much worse”. However, it does produce zeros when the correct result is subnormal: 1e105^-3.0 == 1.0e-315, but inv(1e105^3.0) == 0.0, so it is worse and in any case doesn’t work.
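Spelling it out in the REPL: the intermediate cube overflows to Inf, and the reciprocal of Inf is zero.

```julia
julia> 1e105^3.0        # true value ≈ 1e315 exceeds floatmax(Float64)
Inf

julia> inv(1e105^3.0)   # inv(Inf) collapses to zero
0.0

julia> 1e105^-3.0       # the correct result is subnormal, not zero
1.0e-315
```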

Thanks all! We’re going with the accurate version.

22 Likes

What do these numbers mean in practice? Is this operation likely to be the last step in a calculation, or does it come early on, with inaccuracy accumulating on top of it?

If the IEEE standard disallows the less accurate version, then that ties our hands, but then Julia is already in a larger violation… Since you had a poll, it seemed like either option was arguable.

It seems to me the faster approach is tolerable, since it is more accurate than what Julia already does; could it be adopted as an improvement, deferring going all the way?

Then, if it’s later decided to make the more accurate version the default, people will have to accept the following Julia version being slower… That might be good: people will see actual real-world code getting slower and can make an informed decision about rewriting. Whatever you decide, the non-default alternative needs to be clear, and I would prefer it documented in:

help?> ^

The quote (from the IEEE standard) is actually longer:

However, if a language decides to implement these functions [e.g. pow], the standard requires the following: A conforming function shall return results correctly rounded for the applicable rounding direction for all operands in its domain. The preferred quantum is language-defined.

I’m not up to speed on the last sentence (i.e. “quantum”) and whether it changes things (the emphasis was actually in the original, so I’m curious, and that last sentence might be informative), but reading up to it, it seems our hands are tied.

See also (I didn’t read, might be relevant):
https://grouper.ieee.org/groups/1788//email/msg06576.html

The context for C++ seems intriguing (does it NOT require IEEE conformance, or is the writer simply misinformed?):

The standard that actually matters is the C++ standard. Regardless of which one you look into (be it the old one, C++11, or C++14), you will find that it actually never states anything about the required precision of operations such as exp(), log(), pow(), and many others. Nothing at all. (At least to the best of my knowledge.) So, technically, the compiler is still “correct”, we cannot claim that it violates the C++ standard here.

However, there is another standard that should apply here: the international standard ISO/IEC/IEEE 60559:2011, previously known as the standard IEEE 754-2008. This is the standard for the floating point arithmetic. What does it say about the situation at hand?

[Note, both IEEE 754-2008 and the 2011 update are outdated, and we of course want to conform to the latest IEEE 754-2019, which may be more stringent about correct rounding?]

Or rather, DID C++ not require IEEE conformance (or was it simply not commonly implemented by compilers/standard libraries)? The text above seems outdated:

I assume you mean C++23 (not yet a standard), and if/since previous C++ didn’t round correctly, do you know why they didn’t just fix that instead of adding new functions? I guess because of this speed dilemma: only having those new functions fulfills IEEE, even though most will not use or know of them… I suppose we could and should have those functions, or alternatively make the correctly rounded ones the default and then add exp_fast and sqrt_fast in addition.

It seems we can do better for a^b, not (just) with literal_pow, but with a variant where both a AND b are constants. Would that satisfy most people (assuming we go with the fast version otherwise, when only b is a constant, or neither is)?
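A hedged sketch of what such a variant could look like; const_pow is a hypothetical helper (not part of Base) that carries both constants in Val, so the extended-precision computation could in principle be folded at compile time:

```julia
julia> const_pow(::Val{a}, ::Val{b}) where {a,b} = Float64(big(a)^b);  # hypothetical helper

julia> const_pow(Val(10.0), Val(-3))   # one correctly rounded result via BigFloat
0.001
```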

C++ seems to not even require IEEE for its single and double types; they are just most commonly implemented that way (and I guess then only for basic arithmetic): std::numeric_limits<T>::is_iec559 - cppreference.com

Well, BigFloat wouldn’t cut it; correctly rounded to any precision, it would still not be exact, unless… you use decimal types, which are not included in Julia:

The 2008 revision extended the previous standard where it was necessary, added decimal arithmetic and formats, tightened up certain areas of the original standard which were left undefined, and merged in IEEE 854 (the radix-independent floating-point standard).

Yes, the decimal format would work (already implemented in packages, both the finite IEEE form and a “bigfloat” variant). While this is now part of IEEE 754, it’s not really under discussion here, only the binary format.

One thing that is very non-intuitive about error propagation is that it doesn’t obey any nice rules about widening. As a simple example, consider cos(x+y) with a correctly rounded implementation of cos. Here we have 2 operations, each with 0.5 ulp of error. However,

```julia
julia> Float64(cos(big(2^53)+1))
0.4287904318447045

julia> cos(2^53+1)
-0.5285117844130887
```

To make things even weirder, having large error in the inputs of a function doesn’t guarantee large errors in the output.

```julia
julia> using Statistics  # for mean

julia> x = rand(1000) * 2^52;

julia> y2 = [cos(i+j) for i in x for j in x];

julia> y1 = [cos(big(i)+j) for i in x for j in x];

julia> mean(abs.(y1 .- y2))
0.0790677001864059

julia> mean(y1 .+ 1)
0.9996932796343756

julia> mean(y2 .+ 1)
0.9991374006407004
```

In this case, the average error of y2 is about 0.079, but the mean of the elements plus 1 has an error of only about 0.0006.

To compute correctly rounded functions, pretty much the only option is to compute the result to roughly twice the needed precision and then throw away the extra bits when you return the result. This makes your computations roughly 2x slower, without an obvious gain in accuracy, since errors don’t follow nice rules when propagating.
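A minimal sketch of the “compute wide, round once” idea (illustrative only; BigFloat stands in for the extended precision, whereas real correctly-rounded libraries use cheaper techniques such as double-double arithmetic):

```julia
julia> correctly_rounded_cos(x::Float64) = Float64(cos(big(x)));  # round once at the end

julia> correctly_rounded_cos(2.0^53)   # agrees with the extended-precision value above
-0.5285117844130887
```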

In short, for almost all use cases, the accuracy of your program will not get noticeably better by using correctly rounded functions: either you have a stable problem, where your solution will be accurate either way, or an unstable problem, where the solution will be inaccurate either way.

10 Likes

This post should maybe be added to some numerical-precision FAQ. Great explanation, and the observation at the end about stable versus unstable problems (algorithms, really) is very on point and something that is really hard for people to internalize.

4 Likes

My understanding is that when the (absolute) value of the argument is this large, cos (sin, etc.) is basically a random-number generator (John Gustafson’s words), and thus can’t be “correctly rounded”(?). Unlike cospi (and sinpi), which is the reason those variants exist:

```julia
julia> Float64(cospi(big(2^53)+1))
-1.0

julia> cospi(2^53+1)
-1.0
```

Note, here the results are the same, though that doesn’t make the resulting value “correct”:

```julia
julia> Float64(cos(big(2^53)))
-0.5285117844130887

julia> cos(2^53)
-0.5285117844130887
```

You are partially correct. cos for large inputs is a correctly rounded random-number generator. “Correctly rounded” has a very specific definition, which is that f(x) == T(f(big(x))) for an infinite-precision BigFloat. As such, it is totally possible for cos to be correctly rounded; that just isn’t necessarily a useful property for your function to have (which is why Julia doesn’t guarantee it).
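That definition can be checked directly at a single point; using the 2^53 example from above, where the two sides happen to agree:

```julia
julia> x = 2.0^53;

julia> cos(x) == Float64(cos(big(x)))   # correctly rounded at this input
true
```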

2 Likes