Dear community,
Recently, I’ve been trying to debug some arbitrary-precision algorithms/formulae. I stumbled upon the fact that, by default, users are allowed to mix precision in inputs without any warnings, e.g.
big(1.3)
1.3000000000000000444089209850062616169452667236328125
gives you a BigFloat instance of the inexact constant 1.3, and the explicit constructor behaves the same:
julia> big(1.3)
1.3000000000000000444089209850062616169452667236328125
julia> BigFloat(1.3)
1.3000000000000000444089209850062616169452667236328125
I also tried to overload widen, still with no luck.
What is the minimal set of definitions needed to guarantee that every promotion from Float64 to BigFloat is caught?
I’m fairly certain the original poster is aware of why 1.3 has limited precision and wants to find a way to turn all promotions and constructions from double precision to BigFloat into errors, in order to detect where their code has made incorrect assumptions.
I think big and BigFloat are not promotions but use the constructor directly. I’m not sure if this is possible, but could you try to overload the constructor or the struct directly? (This should be possible in the Julia 1.12 beta.) I would not suggest doing this, since it could create unpredictable problems, but I’m not sure there is another way. Do you actually need this? Most code uses AbstractFloat to keep things generic. Just to be sure we are not going through an XY problem, could you share what you are actually doing?
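Concretely, something like this sketch; note that overwriting a Base method is type piracy, which can break any code that relies on the conversion (as discussed further down), so treat it as a temporary debugging hack:
julia> Base.BigFloat(x::Float64) = error("implicit Float64 → BigFloat conversion")
BigFloat
julia> big(1.3)
ERROR: implicit Float64 → BigFloat conversion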
Simplest case: I’m converting the expression for the LDA DFT energy functional, which is
E_x^{\mathrm{LDA}} = -\frac{3}{4}\left(\frac{3}{\pi}\right)^{1/3} \int \rho^{4/3} \, d\mathbf{r}
My density is given on a grid of BigFloat values, and the integral is converted into a sum via Gaussian quadrature. The question is how to ensure that the numerical constant in front of the expression matches the precision of the integration.
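For reference, here is roughly what that looks like in code: a minimal sketch, with hypothetical names ρ and w for the BigFloat vectors of density values and quadrature weights, and with ρ^(4/3) written as ρ·cbrt(ρ) so that no Float64 exponent sneaks in. The constant in front is exactly where precision can silently drop, as shown just below:
Cx = -3//4 * cbrt(big(3) / pi)             # LDA exchange constant, evaluated in BigFloat
E_x(ρ, w) = Cx * sum(w .* ρ .* cbrt.(ρ))   # ∫ ρ^(4/3) dr ≈ Σᵢ wᵢ ρᵢ^(4/3)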
Among the various possible expressions, this one gives you enhanced accuracy:
-3//4cbrt(big(3)/pi)…
while, for example
-3//4cbrt(3/pi)…
does not. Now imagine you’d like to convert many such expressions for different functionals. I don’t want even a single chance of losing accuracy along the way.
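To make the difference concrete, here is a quick REPL check (not the original code) comparing the constant evaluated both ways:
julia> c_hi = -3//4 * cbrt(big(3) / pi);
julia> c_lo = -3//4 * cbrt(3 / pi);
julia> c_hi == c_lo    # the Float64 version differs beyond ~16 digits
false
julia> Float64(c_hi) ≈ c_lo    # but rounding the BigFloat back recovers it
true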
Just to be a pedantic grumpy old man: the conversion from Float64 to BigFloat is perfectly exact.
1.3 is exactly equal to 1.3000000000000000444089209850062616169452667236328125. It’s a bit of a mouthful, but it’s precisely 5854679515581645//4503599627370496. The fact that it can’t hit 13/10ths is unfortunate, but the inexact part happened before you got to a BigFloat promotion. It happened before you got to Float64, in fact!
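You can verify both claims in the REPL:
julia> Rational(1.3)
5854679515581645//4503599627370496
julia> big(1.3) == 5854679515581645//4503599627370496
true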
So then the question is: how do you prevent Julia from giving you the default Float64? I think the most common place for Float64s to result from integer inputs would be in /. And so I’d just shadow that to increase the precision as you’d prefer:
julia> x / y = Base.:/(big(x), big(y))
/ (generic function with 1 method)
julia> 2/3
0.6666666666666666666666666666666666666666666666666666666666666666666666666666695
Or you could make it spit out rationals for integer inputs:
julia> x::Integer / y::Integer = big(x) // big(y)
/ (generic function with 2 methods)
julia> 2/3
2//3
Note that this is just shadowing the built-in definitions for / and it’ll only apply in your current module. This is far better than pirating promotion (which would affect everything).
There are a few other places where you’ll find integers yielding Float64s — like in negative powers and roots. But those can similarly be shadowed.
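For instance, a sketch in the same spirit; the first, untyped method is a fallback so that every other call still reaches Base’s ^:
julia> x ^ y = Base.:^(x, y)   # fallback: non-integer bases and exponents use Base's ^
^ (generic function with 1 method)
julia> x::Integer ^ y::Integer = y >= 0 ? Base.:^(x, y) : Base.:^(BigFloat(x), y)
^ (generic function with 2 methods)
julia> 2^-3
0.125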
I appreciate your determination, but I insist that the displayed value of 1.3 after the 16th decimal is garbage taken from the stack/heap/whatever by the MPFR library; these digits vary depending on your hardware.
If so, that’s a bug. Can you give an example of different platforms producing different results here? Might be due to some of the global state configuration that MPFR sadly relies on, or due to the width of the floating-point exponent? Or perhaps it’s due to Intel’s 80-bit FP?
Apart from that, it seems like you misunderstood @mbauman. Note that he put 1.3 into a code block when writing that, drawing a distinction between the actual rational number 1.3 = \frac{13}{10}, representable as, e.g., 13 // 10 in Julia, and 1.3::Float64.
Compare:
julia> setprecision(500) do
           BigFloat(13 // 10)
       end
1.2999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
julia> setprecision(500) do
           BigFloat(13 / 10)
       end
1.3000000000000000444089209850062616169452667236328125
It does not; it’s just the exact value of the closest Float64 representation of 1.3. If we add up the powers of 2 for each bit, we get the same digits minus the implicit leading 1:
julia> big(1.3)
1.3000000000000000444089209850062616169452667236328125
julia> sum(parse(Int, b)*(big(2.0)^e) for (e, b) in zip(-1:-1:-52, bitstring(1.3)[13:64]))
0.3000000000000000444089209850062616169452667236328125
If arbitrary precision floats were really just taking garbage bits (which would be crazy for such a widespread dependency), it’d be a miracle for a sum of 52 of them to give the same digits as a 53rd, or for the digits to be so consistent across multiple instances on possibly different processes on different machines. Float64 and any binary numeric type can only be exact for sums of powers of 2, a strict subset of rational numbers. 1.3 = 1 + 3/10 is not in that subset, so it must be approximated. If we check the smaller adjacent Float64 value, it is even farther from the unrepresentable 1.3, and the adjacent values differ by the least significant bit.
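For instance, both neighbors can be inspected exactly with nothing beyond Base:
julia> big(prevfloat(1.3))    # the adjacent smaller Float64, even farther from 13/10
1.29999999999999982236431605997495353221893310546875
julia> big(1.3) - big(prevfloat(1.3)) == big(2.0)^-52    # exactly one ulp apart
true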
Float64 only prints 1.3 because it truncates more (note that BigFloat also truncates; a big tell is when the digits don’t end with a 5, as all negative powers of 2 do); it still has the same value as the BigFloat conversion:
julia> 1.3 == big(1.3)
true
In fact, if we lower the global instantiation precision of BigFloat to that of Float64, we get the same level of truncation:
julia> setprecision(53) do
           parse(BigFloat, "1.3")
       end
1.3
If rounding in printing isn’t your style, then the only real way out of the inexactness of binary representations (sums of powers of 2) of decimal numbers is to parse text into decimal representations (sums of powers of 2 and 5). Support isn’t too great: the decimal floating-point DecFP.jl isn’t active, the arbitrary-precision Decimals.jl admits it’s not as efficient as the more typical libmpdec wrappers, and then there’s the aptly named FixedPointDecimals.jl. No idea how good these are in practice; I just got used to rounding in printing a long time ago and only tried Decimals a couple of times before giving up because of the incomplete support.
It is indeed an exact representation of the Float64 value of the constant 1.3. big copies its mantissa bit-by-bit (53 bits) into the 256-bit (default) representation and sets the rest to zero. This is the place where I got confused: I thought the initial mantissa of a BigFloat could be arbitrary. No, that is wrong; it is always zeroed initially. No hardware dependencies.
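One quick way to convince yourself of the zero-fill: if the extra bits were garbage, instantiating at two different precisions would give two different values. It does not:
julia> setprecision(() -> big(1.3), 64) == setprecision(() -> big(1.3), 1024)
true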
Unfortunately, this is not what I wanted: a proper floating-point representation of the rational 13//10, which is 1.2(9).
To track them down, I’d like to turn all such implicit conversions into errors.
And did redefining Base.BigFloat(::AbstractFloat) or creating a local BigFloat constructor fully solve the issue? You did click ‘Solution’ on that post, but the later discussion and summary made this a bit unclear again.
In my opinion, you’re disallowing Float64 → BigFloat in the wrong place and with a misleading message. As you summarized, the conversion from one binary numeral to a higher-precision binary numeral is actually exact, so error("Imprecise.") or the more typical InexactError is incorrect. What you don’t want is people using the first binary numeral’s lower precision in the first place (converted to 200+ zeroed bits) in your algorithm; you want them to use an exact representation (string, Rational) or parse/convert that to a higher-precision numeral for your algorithm. That suggests the check belongs in a preceding or preliminary step of your algorithm, not globally.
If you disallow the exact conversion globally, then you also stop your users from using their own code or any other package that needs it. You might even have broken something that is in, or will end up in, Base. This conversion exists for good reasons:
There are many decimal values that are exactly representable in binary, and you just forced users to convert them away from Float64 merely to reach BigFloat for more precise operations. big.(zeros(3)) no longer works; we need parse.(BigFloat, string.(zeros(3))) instead.
Many algorithms start with the lower precision, do an operation in a higher precision, then return to the lower precision. The larger initial approximation error is just part of the intent. Again, you just forced a lot of extra work on that.
Assuming you do this only in your own algorithm, another issue is that you’re only addressing Float64. There’s the even less precise Float16 and Float32 in Base, and there can be an arbitrary number of lower-precision AbstractFloat subtypes you don’t want, as well as an arbitrary number of higher-precision AbstractFloat subtypes you do want (including BigFloat → BigFloat). You can’t disallow an infinite subset of input types explicitly, so you should try specifying which input types you do allow. At this point, the inputs could be Union{AbstractString, Rational, BigFloat}, though you will probably just use BigFloat in the core algorithm, because strings don’t support arithmetic and Rational will automatically convert to Float64 in many operations that usually don’t have rational outputs, e.g. asin(1//2).
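A sketch of such a preliminary normalization step (to_big is a hypothetical helper name, not an existing function):
julia> to_big(x::BigFloat) = x;
julia> to_big(x::Rational) = BigFloat(x);
julia> to_big(x::AbstractString) = parse(BigFloat, x);
julia> to_big(x::AbstractFloat) = throw(ArgumentError("got $(typeof(x)); pass a String, Rational, or BigFloat instead"));
julia> to_big(1.3)
ERROR: ArgumentError: got Float64; pass a String, Rational, or BigFloat instead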