Get length of a fully represented float

HenriDeh · February 21, 2024, 3:05pm

Hello,

I must work with a dataset of floating-point entries. A few of them (about 0.1%) are really small and are therefore written in e-notation (e.g. 1.87589e-17). I’m passing this dataset to a C program via Julia, and apparently it does not recognize 1.87589e-17 as a numerical value. Therefore I’d like to convert these numbers to their fully expanded format, e.g. 0.000000000000000187589 (I didn’t count the actual zeros).

To do this I can use @printf "%0.*f" 22 1.87589e-17. The problem is that 22 is not always the right length for the expansion. So I would like to know whether there is a formula that returns the number of digits after the decimal point (22 here) for a given number.

I know I could pick a large length and apply to all numbers, but this would greatly increase the size of the data files.

Sukera · February 21, 2024, 3:12pm

Unfortunately not in general - numbers with recurring digits like 1/9 won’t stop.

mbauman · February 21, 2024, 3:16pm

Yeah, but this is in the context of a parseable floating point number. Here’s a simple thing that should be sufficient:

ndigits(v) = max(0, ceil(Int, -log10(eps(v))))
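
A minimal sketch of how this bound might be plugged into the %f formatting from the question. The names `decimals_needed` and `expand` are hypothetical (the first is renamed only to avoid clashing with `Base.ndigits`), and the precision is interpolated into a `Printf.Format` string instead of being passed through `*`:

```julia
using Printf

# Same formula as above, under a hypothetical name that does not clash with Base.ndigits.
decimals_needed(v) = max(0, ceil(Int, -log10(eps(v))))

# Format v in fixed notation (no exponent) with that many decimals.
function expand(v::Float64)
    p = decimals_needed(v)
    return Printf.format(Printf.Format("%.$(p)f"), v)
end

s = expand(1.87589e-17)
println(s)                                  # fixed notation, no exponent
println(parse(Float64, s) == 1.87589e-17)   # check that the text parses back to the same float
```

The bound is tied to the spacing between neighbouring floats, so the output usually carries more digits than the shortest string that would parse back to the same value.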

HenriDeh · February 21, 2024, 3:25pm

It looks like this gives an upper bound, no?
ndigits(1.) gives 16 and ndigits(1.8757e-17) gives 33.

mikmoore · February 21, 2024, 3:31pm

There is no floating point number equal to 1//9. Every IEEE754 floating point number has a finite decimal representation because every finite power of 2 has a finite decimal representation.

julia> big(1/9)
0.111111111111111104943205418749130330979824066162109375

julia> big(1/9) |> nextfloat # notice all these "spare" digits
0.1111111111111111049432054187491303309798240661621093750000000000000000000000011

Far fewer digits than this are needed to identify the float uniquely, though. That number is roughly given by the ndigits function suggested above (maybe add 1 to be safe? I haven’t thought hard about it).
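
As a sketch of the "finite decimal representation" point: a finite Float64 is exactly p/2^k with p odd (or an integer), and p/2^k = p·5^k/10^k, so its exact expansion has k digits after the decimal point. The helper name `exact_decimals` is hypothetical and not from the thread:

```julia
# Number of digits after the decimal point in the exact decimal expansion
# of a finite Float64 (hypothetical helper, for illustration only).
function exact_decimals(x::Float64)
    isfinite(x) || throw(ArgumentError("x must be finite"))
    r = Rational{BigInt}(x)                  # exact value as p // 2^k in lowest terms
    return trailing_zeros(denominator(r))    # k, since the denominator is a power of 2
end

println(exact_decimals(1/9))           # length of the big(1/9) expansion shown above
println(exact_decimals(1.87589e-17))   # exact length for the value from the question
```

This is the length of the exact expansion, which is typically much longer than what a parser needs to recover the same float.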

mbauman · February 21, 2024, 3:33pm

Yes, being smarter is harder

In general, the process of efficiently determining the minimal number of decimal digits required to exactly represent a particular binary floating point number is a hard problem on which many academic papers have been published (see grisu, ryū; Julia itself uses the latter).
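
A brute-force alternative, as a rough sketch only: increase the %f precision until the printed text parses back to the same float. This is not what grisu or ryū do internally, it is slow, and `shortest_fixed` and its `maxdigits` keyword are hypothetical names. It finds the shortest correctly rounded %f output that round-trips:

```julia
using Printf

# Try precisions 0, 1, 2, ... and return the first fixed-notation string
# that parses back to exactly the same Float64.
function shortest_fixed(x::Float64; maxdigits::Int = 1100)
    isfinite(x) || throw(ArgumentError("x must be finite"))
    for p in 0:maxdigits
        s = Printf.format(Printf.Format("%.$(p)f"), x)
        parse(Float64, s) == x && return s
    end
    error("no round-tripping representation within $maxdigits decimals")
end

println(shortest_fixed(1.87589e-17))
println(shortest_fixed(1/9))
```

Because every candidate is checked by parsing it back, the result does not depend on a log10 estimate being exactly right.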

HenriDeh · February 21, 2024, 3:35pm

Oh okay, I thought this would be a trivial problem. I’ll settle for your formula then. Thank you.

mbauman · February 21, 2024, 4:14pm

I’ve edited my answer to do slightly better for large numbers and more specifically target the behavior of the %0.*f format.

If you at all have control over that C program, it’d be so much simpler if you could change that parsing behavior to sscanf a %g format instead.

HenriDeh · February 21, 2024, 4:31pm

I could clone the repo, edit, then make a custom jll, but it’s not worth the trouble. What you proposed worked just fine.

Elrod · February 21, 2024, 5:59pm

You could reinterpret(UInt64, x), then cast back on the C side.
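
A small sketch of that idea, with both directions shown in Julia for illustration. It assumes the receiving program can be changed to read an unsigned 64-bit integer and reinterpret its bits as a double, which is not confirmed anywhere in this thread:

```julia
x = 1.87589e-17

bits = reinterpret(UInt64, x)      # the 64 raw IEEE 754 bits as an integer
text = string(bits)                # plain decimal integer text, no exponent

# What the receiving side would have to do (assumed; shown here in Julia):
y = reinterpret(Float64, parse(UInt64, text))
println(y === x)                   # bit-for-bit comparison
```

The integer text is at most 20 digits long and the conversion is exact, but the file is no longer human-readable as numbers.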

Dan · February 21, 2024, 6:54pm

You can also take the long and dirty route of manipulating strings.
Trigger warning - following isn’t pretty:

function deexponent(fp)
    s = string(fp)
    m = match(r"([^e]*)e([\-0-9]*)",s)
    isnothing(m) && return s
    if s[1] == '-'
        neg = true
        s = s[2:end]
    else
        neg = false
    end
    e = parse(Int, m.captures[2])
    m2 = match(r"([0-9]*)\.([0-9]*)", s)   # escape the dot so it matches a literal '.'
    if isnothing(m2)
        bd, ad = m.captures[1], ""         # no fractional part: keep ad defined
    else
        bd, ad = m2.captures[1], m2.captures[2]
    end
    blen = length(bd)
    alen = length(ad)
    c = blen + e
    if c < 0
        res = "0."*"0"^(-c) * bd * ad
    elseif c > alen+blen
        res = bd * ad * "0"^(c-alen-blen) * ".0"
    else
        res = (bd*ad)[1:c] * "." * (bd*ad)[c+1:end]
    end
    neg ? "-"*res : res
end

With this patchy function:

julia> deexponent(1.87589e-17)
"0.0000000000000000187589"

julia> 0.0000000000000000187589
1.87589e-17

And other examples work as well.
This method might have string bugs, but doesn’t have log10 bugs.

mbauman · February 21, 2024, 7:13pm

I would suspect that may have rounding issues — changing powers of 10 can change where the values round since 10 isn’t a power of 2. So a decimal that is the shortest representation at one particular power of 10 isn’t necessarily going to round the same way at another.

rafael.guerra · February 21, 2024, 7:42pm

Also try the code below, which uses significant digits and is based on this other post.

using Printf

function full_float_signif(x::Float64, sigdig::Int)
    (x == 0) && (return (1, "0"))
    x = round(x, sigdigits=sigdig)
    n = length(@sprintf("%d", abs(x)))              # length of the integer part
    if (x ≤ -1 || x ≥ 1)
        decimals = max(sigdig - n, 0)               # 'sig - n' decimals needed 
    else
        Nzeros = ceil(Int, -log10(abs(x))) - 1      # No. zeros after decimal point before first number
        decimals = sigdig + Nzeros
    end
    s = @sprintf("%.*f", decimals, x)
    return length(s), s
end

# Example:
full_float_signif(1.87589e-17, 6)    # (24, "0.0000000000000000187589")
