Storing high precision constants to use with fixed precision libraries (DoubleFloats.jl, Quadmath.jl, DecFP.jl, etc.)

I am hoping to design algorithms that can be used efficiently with other fixed higher precision libraries such as DoubleFloats.jl, Quadmath.jl, DecFP.jl, MultiFloats.jl, etc… This usually involves polynomial approximations to a designated precision (128, 256). Is there a flexible way to store high precision coefficients to be used with all of these libraries?

For example, ideally something like this would work…

function test(x::T) where T
    # strings usually contain 30-50 digits
    P = (parse(T, "1.2"), parse(T, "1.3"), parse(T, "1.4"), parse(T, "1.5"))
    return evalpoly(x, P)

with benchmarks…

julia> @btime test(x) setup=(x=Float128(rand()))
  330.556 ns (0 allocations: 0 bytes)

julia> @btime test(x) setup=(x=Dec128(rand()))
  781.630 ns (0 allocations: 0 bytes)

# allocates for some reason
julia> @btime test(x) setup=(x=Double64(rand()))
  1.258 μs (20 allocations: 848 bytes)

But we are paying the cost to parse the strings. A solution to avoid this is to just hard code the constants in a higher precision…

const P_f128 = (parse(Float128, "1.2"), parse(Float128, "1.3"), parse(Float128, "1.4"), parse(Float128, "1.5"))

function test2(x)
    return evalpoly(x, P_f128)

julia> @btime test2(x) setup=(x=Float128(rand()))
  71.869 ns (0 allocations: 0 bytes)

This approach works well and fast but a little less flexible than I was hoping for as it requires hard coding the constants for a specific package (and take it on as a dependency.) You could also cast any AbstractFloat to whatever internal precision you use but slightly less flexible.

Is there perhaps a good approach to be flexible enough to handle all the different higher precision packages in the current and future ecosystem but also avoid allocations and parsing of string constants during runtime?

If the numerator and denominator are small enough you could use rationals:

function test(x::T) where T
    P = T.((12//10, 13//10, 14//10, 15//10))
    return evalpoly(x, P)

The nice thing is that in many cases the compiler can do the conversion at compile time (as of JuliaLang/julia#41258).

1 Like

Thank you! There are times where we know the exact coefficients so this works well (as long as the Integer fits nicely with Int128). I like the idea using rationals, but sometimes terms look like this…

function test(x)
    return evalpoly(x, (1990919576929585163761247434580873723897677550573731281207275390625, -1094071724893410272201314846735800345180930752601602224440504638671875, 103359845691236916270005420886293061281368263471845448279855946394531250, -3993558675585831919973605001813276532326292885934464373884268722794718750, 83832389176247515253189196480455786065824463879054932596114704927571390625, -1101977288759423115916484463959767795703317649005495798822550381533967842875, 9865413224996821913093061266347613112205666995211786544733850490740903131000, -63509799534443193918054738326913581450607131662586232327271654588312820660616, 305080179205137176095342856930260393560550505786421683990077008259370104309410, -1122099959965657526344757894103766710826629020929459670948834078441128579791750, 3217185258543282976637373169047731813093578126603884759856059695249999306047500, -7276835259402589085458442196686990645669654847254510877738804991391058411412500, 13076651508858985557227319801010895818335803028865299751194038965212274849406250, -18719023753854332443081892087572754119280558462994056787647908433841920235468750, 21307327225910629020600045595173860611314592374884901403059587980149276271875000, -19156728115567236234371593788102013666866274144599851347429312225076676478125000, 13430107219321888006291381503763318985372222835071804983658005207838642173828125, -7186150593017474773099947110903785965879266917528290917207712073663895771484375, 2834031566467842832851805860262261718160136985649155528263587796394390332031250, -776308737033643856044315139790308583914612827783322139063211433790569824218750, 131897976398031751060811874321878385924211075364673046295410087596954345703125, -10468093364923154846096180501736379835254847251164527483762705364837646484375)) / 5456052820246562178600318839637653652351064473600000000

so we get the promotion to BigInt first. Of course, this is when our coefficients are known exactly but higher order terms usually require at least Int256 to store…

So I guess there are really two instances I’m looking at. 1. Where we know the higher order coefficients exactly as integers (usually fit within Int256) and 2. Where we just know the minimax floating point number in high precision (usually a Float128 number).

In this case you could write it as T(significand) * exp2(T(exponent)), using an Int128 significand, at the cost of an exp2 call.