Handling large values correctly

I repeatedly run into issues handling large values (beyond 1e+15) and couldn't find a clear answer on how to address them.

Consider the following comparison:

(1.0e+16 + 1 + 1 + 1) > 1.0e+16

will wrongly return false. However, I can mitigate this issue by using either

(1.0e+16 + (1+1+1) )  > 1.0e+16 

or

(BigInt(1.0e+16) + 1 + 1 + 1) > 1.0e+16

both returning true, the correct answer. This hints that types as well as brackets influence the rounding, even though the expressions should be mathematically equivalent.

The same logic applies to the following modulo calculations:

(BigInt(1e+16) + 1) % 2 # is 1 - correct !
BigInt(1e+16 + 1) % 2 # is not 1 - false !

In these simple examples I can address the issue. However, when I apply the same conversions (BigInt or BigFloat) in slightly more complex operations (for example Tupper’s self-referential formula here), I fail to obtain the correct result:

tupper(x,y) = 0.5 < floor(floor( BigFloat(y) / 17) * 2.0^( -17 * floor(BigFloat(x))-(BigFloat(y) % 17)) % 2) ? 1 : 0 

k = BigInt(960939379918958884971672962127852754715004339660129306651505519271702802395266424689642842174350718121267153782770623355993237280874144307891325963941337723487857735749823926629715517173716995165232890538221612403238855866184013235585136048828693337902491454229288667081096184496091705183454067827731551705405381627380967602565625016981482083418783163849115590225610003652351370343874461848378737238198224849863465033159410054974700593138339226497249461751545728366702369745461014655997933798537483143786841806593422227898388722980000748404719)
tupper(0,k) # returns 0 but should be 1

I assume I call BigFloat in the wrong place, leading to a wrong modulo result. What is the correct version of the tupper function in Julia, and where can I learn how to use arbitrary precision arithmetic correctly in functions?

thank you!

What you’re seeing is the often-raised issue of limited precision in floating point. To oversimplify a little, it’s like having a limited number of digits after the most significant digit: if you only have 3 digits to work with, you can add 1 to 10.0 just fine, but when you get to 1000, that 1 is just going to be treated as 0 because the resolution is now 10. The same goes for floating point, except you have binary bits instead of decimal digits. Float64 in particular has 53 precision bits, which is floor(1+log10(2^53)) == 16.0 decimal digits (though not for all values). 10^16 uses up 17 decimal places, so the corresponding Float64 literal 1e16 cannot have a resolution of 1. You can use prevfloat, nextfloat, and eps on a particular value to see the resolution, e.g. eps(1e16) == 2.0.
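For instance, a quick REPL check around 1e16 shows the resolution and the lost +1:

julia> eps(1e16)   # spacing between 1e16 and the next Float64
2.0

julia> nextfloat(1e16)
1.0000000000000002e16

julia> 1e16 + 1 == 1e16   # the +1 falls below the resolution and is lost
true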

Floating point has many values where the resolution is >1; the limited precision is just accepted as a tradeoff for performance, and people operate on values that are close enough to each other and to the output. A counterpart to your example of an operation on a large value and a small value is catastrophic cancellation, where an operation on 2 large inputs results in a very inaccurate small output. Algorithms that work on large collections of values will often choose an order of operations to reduce such errors where possible.
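For example (a small sketch of my own, not from the question), reusing the 1e16 value from above, and then a classic trig case where 1 and cos(x) are nearly equal:

julia> (1e16 + 1) - 1e16   # true answer is 1, but the +1 was already absorbed
0.0

julia> x = 1e-8; (1 - cos(x)) / x^2   # true value is ~0.5; cos(x) rounds to 1.0 and everything cancels
0.0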

Those aren’t type hints; you are converting values to a different type, and operations mixing them with a less precise type are implemented to promote everything to the higher precision type before the operation. So BigInt(1e16 + 1) converts the output of 1e16 + 1, which we know to be 1e16 due to insufficient precision, while BigInt(1e16) + 1 converts the floating point value to an arbitrary precision integer, then the addition operator promotes the 1 to an arbitrary precision integer before adding.
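Concretely, the two orderings produce different integers:

julia> BigInt(1e16 + 1)   # 1e16 + 1 already rounded back to 1e16 as a Float64
10000000000000000

julia> BigInt(1e16) + 1   # converted first, so the + 1 survives
10000000000000001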

Let me try to break down the tupper(0, k) call, because k is too large to compute mentally. k % 17 == 0, so dividing k by 17 gives a big integer. Ignoring the BigFloat conversions for now, that makes y % 17 == 0 and x == 0, so the expression simplifies step by step:

0.5 < floor(floor(big_integer) * 2.0^( -17 * floor(0)-(0)) % 2) ? 1 : 0
0.5 < floor(big_integer * 2.0^( -17 * 0-0) % 2) ? 1 : 0
0.5 < floor(big_integer * 2.0^0 % 2) ? 1 : 0
0.5 < floor(big_integer * 1 % 2) ? 1 : 0
0.5 < floor(big_integer % 2) ? 1 : 0

Checking div(k, 17), it is odd (ends in 7), so

0.5 < floor(1) ? 1 : 0
0.5 < 1 ? 1 : 0
true ? 1 : 0
1
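(Both facts are easy to check directly, assuming k is bound to the big integer from the question:)

julia> k % 17
0

julia> isodd(div(k, 17))
true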

So the underlying math checks out, and if you remove the BigFloat conversions and the now-unnecessary floors, you get the right answer:

julia> tupper2(x::Int,y::BigInt#=divisible by 17=#) = 0.5 < (div(y, 17) * 2^( -17 * x - (y % 17)) % 2) ? 1 : 0
tupper2 (generic function with 1 method)

julia> tupper2(0, k)
1

If we check floor(1+log2(k)), we see it has 1804 precision bits. The thing is, unlike BigInt, which adapts its size per instance, BigFloat has a global setting for its precision. If we check mine:

julia> precision(BigFloat)
256

Oh, that wasn’t nearly enough. So let's give it a bit more:

julia> setprecision(BigFloat, 1804) do
         tupper(0, k)
       end
1

For whatever reason, you get the correct answer with >=1799 bits as well, but I’m not capable of interrogating the precision issues with the actual values.

If you need to work with integers, work with BigInt directly instead of repeatedly flooring and adjusting the precision for BigFloats; just check that those integers are divisible by whatever you need (tupper2 could use a y % 17 check).
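If you want to avoid the floating-point part entirely, here is a minimal integer-only sketch (tupper_int is my own name, not from the posts above): multiplying by 2.0^(-s) and then taking floor/mod 2 just reads bit s of the quotient, so a right shift does the same job:

# Integer-only variant (a sketch): Tupper's formula extracts bit (17x + y % 17) of div(y, 17)
tupper_int(x::Int, y::BigInt) = isodd(div(y, 17) >> (17x + Int(y % 17))) ? 1 : 0

tupper_int(0, k)  # 1, same as tupper2 above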


Typically, you should let the inputs be big, not convert inside the formula unless necessary. Also, BigInt will always use the necessary precision, whereas BigFloat has a fixed precision which you should make big enough.

using Plots

function tupper_(x,y)
    1/2 < floor(mod(floor(y/17) * 2^(-17*floor(x) - mod(floor(y), 17)), 2))
end

tupper(x,y) = setprecision(BigFloat, 4096) do 
    tupper_(x, BigFloat(y))
end

k = big"960939379918958884971672962127852754715004339660129306651505519271702802395266424689642842174350718121267153782770623355993237280874144307891325963941337723487857735749823926629715517173716995165232890538221612403238855866184013235585136048828693337902491454229288667081096184496091705183454067827731551705405381627380967602565625016981482083418783163849115590225610003652351370343874461848378737238198224849863465033159410054974700593138339226497249461751545728366702369745461014655997933798537483143786841806593422227898388722980000748404719"

A = [tupper(x,y) for x in 105:-1:0, y in k:k+16]';

spy(A, xshowaxis=false, yshowaxis=false)

While converting the Float64 literal with BigInt(1e16) works in this particular case, beware that it may not produce the integer you want, e.g.:

julia> BigInt(1e30)
1000000000000000019884624838656

This is because a floating-point literal like 1e30 is first rounded to the nearest Float64 value, and because this is binary floating point it generally only represents 53-bit integers (the precision) times powers of two. Since

10^30 = 5^30 * 2^30 = 931322574615478515625 * 2^30

and 931322574615478515625 > 2^69 is bigger than the largest 53-bit integer, it is first rounded to the closest representable value 931322574615478534144 = 3552713678800501 * 2^18, which when multiplied by 2^30 gives the value of BigInt(1e30) above.
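You can re-derive that decomposition in the REPL:

julia> BigInt(1e30) == 931322574615478534144 * big(2)^30
true

julia> BigInt(1e30) == 3552713678800501 * big(2)^48
true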

Moral: if you want to use BigInt or BigFloat, generally don’t start with an ordinary Float64 literal (which may already be rounded); instead, create the bignum value directly, e.g.

julia> BigInt(10)^30
1000000000000000000000000000000

julia> big"1e30" # BigFloat value, exact in the default ≈ 77 digit precision
1.0e+30

julia> precision(BigFloat, base=10) # the current BigFloat precision in decimal (approximately)
77

(Except for the syntax, none of this is specific to Julia; it is just how floating-point arithmetic works. If you want to represent decimal values exactly up to the available precision, you can use decimal floating-point types à la DecFP.jl.)


It's probably worth pointing out about literals, too, that the BigInt call in k = BigInt(9609...) is redundant for that value. Depending on the magnitude, integer literals are parsed as Int (Int32 on 32-bit systems, Int64 on 64-bit systems) or as higher precision types like Int64, Int128, or BigInt. Floating point literals, on the other hand, always have a specific type (exponent syntax uses e for Float64, f for Float32) and throw a ParseError if the magnitude is too large.
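For illustration, on a 64-bit system:

julia> typeof(123)   # fits in a machine Int
Int64

julia> typeof(10000000000000000000)   # too big for Int64, parsed as Int128
Int128

julia> typeof(1e16), typeof(1f0)   # float literals always pick a fixed type
(Float64, Float32)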

This parsing discrepancy is why BigInt can't take a string as input the way BigFloat can and instead needs the big"" string literal or parse on strings. If you can't write a literal value into the code, e.g. when reading text from a file, then you'll need to process strings for very high precision values, and as always be mindful of the types across the operations and whether they can represent the necessary range of values.
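For example, if the digits arrive as a string (the value below is just a made-up example):

julia> s = "10000000000000000000000000000001";

julia> parse(BigInt, s)
10000000000000000000000000000001

julia> big"10000000000000000000000000000001"   # string macro, same value
10000000000000000000000000000001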