A faster pow(x,1/12) function available in Fast12thRoot.jl

New to Julia, so hopefully I’ve chosen the correct category for posting this.
Actuaries use x^(1/12) very often in order to convert annualized numbers to their monthly compounded equivalent. I had an itch that had to be scratched and that was to see if there was a faster calculation than that underlying the power function. Using Remez.jl I was able to find a rational polynomial to x^(1/12)-1 on the interval [0.5,1.0) resulting in a x^(1/12) approximation that appears to be both fast and accurate, even compared to cbrt(sqrt(sqrt(x))).
I’m an actuary rather than a numerics expert so not 100% sure that I’ve missed something obvious.
Therefore looking for someone to review/comment if possible.




This is another rational polynomial approximation to (x^(1//12) - 1) for x in 0.5:1.0.

ratpoly_numerator_coeffs =
	[ 0.1407373980457154e-3,
          0.24598051459861557e-2 ]

ratpoly_denominator_coeffs =
	[ 0.22355385157150939e-3,
          0.16406842123662818e-2 ]

That’s awesome!

You mentioned that you amended Remez.jl: care to share what you did? I haven’t looked at it for a while, but pull requests are welcome.

Hi Jeffrey,
Thanks for taking a look.
I believe that my initial coefficients already resulted in an absolute error < 0.5eps in [0.0, 1.0].
There is an accuracy.jl script included in the package which buckets errors vs big(x)^(one(BigFloat)/big(12)) for a specified interval and number of random numbers.
I’m also using a lower degree poly in the denominator so will be more performant.

I’m obviously missing something but I don’t get the stated accuracy using the code snippet you included.
For example for x=0.9090909090909091, your code gives an abs err = 2.2640350239970713e-16, and fast12throot gives 4.35889747467582e-18.
From what I can tell your hornerpoly function is for x^12 in [0.5,1] rather than x^12-1. Is that correct?
I had found that the Float64 coeffs for x(1/12)-1 were more accurate than those for x^(1/12).

horner_numerator(x) = (0.0001407373980457154 + (0.010909742736020892 + (0.14823153004176548 + (0.6117674724014694 + (0.9022845211608465 + (0.4782192944649025   + (0.07872193791007243 + 0.0024598051459861557 * x) * x) * x) * x) * x) * x) * x)
horner_denominator(x) = (0.00022355385157150939 + (0.014313304914450771 +  (0.17395352028835623 + (0.656615590039731 + (0.8906672552104548 +  (0.43174893535587344 + (0.06357219738630518 + 0.0016406842123662818 * x) * x) * x) * x) * x) * x) * x)
hornerpoly(x) = horner_numerator(x) / horner_denominator(x)

for x in linspace(0.5, 1.0, 100)
    v0 = big(x)^(one(BigFloat)/big(12))
    v1 = fast12throot(x)
    println(x, ": ", Float64(v0-hornerpoly(x)), ", ", Float64(v0-fast12throot(x)))


Hi Simon,

I think the changes are too minor for a pull request - I was just hacking around in my search for an accurate approximation. But I felt it was the right thing to do a hat-tip to your packages.

I included code to halt after a certain number of iterations since the max error would bounce between BigFloats and never terminate.
I also included the calculation of maximum error for the equivalent rational polynomial but with the BigFloat coefficients converted to Float64.
In addition I also validated results obtained against another Remez program in C.

High-level methodology:
It doesn’t seem possible to have an accurate, lowish-degree rational polynomial approx over [1/4096,1].
The interval [0.5,1.0] is convenient for fp calculations, and the function x^(1/12) is well-behaved on that interval.
Can’t use Newton iteration with a low-degree poly because will lose accuracy when calculating x^11 or x^12 unless you use double-doubles. But that will have too high a performance impact.
So require an accurate poly over the interval when using Float64 arithmetic only - hence the inclusion in Remez.jl.
Targeting x^(1/12) directly results in loss of accuracy after converting coeffs to Float64 (and rescaling), so look for a simple transformation that does give required accuracy but will remain performant.
x^(1/12)-1 seems to fit the bill with the benefit of being exact when reversing the transformation if we re-order the calculation a little and use the double-double components of 2^(ii-1)/12 obtained using your DoubleDouble.jl package.


( egad – almost never copy and paste incorrectly : ) I will take down the specifics.

Your work is done well. With the understanding that you may know all this already, let me share three things I noticed while playing around with minimaxs.

It is usually helpful to run Remez routines using bounds that are fractionally wider than the target domain (widening into negative values may be unhelpful, depending on the function). Given two approximations with the same number of coeffs, if one has large and small coeffs and the other has more consisitent values, usually the second will behave no worse – and may be smoother in low order bits for some subsections of the domain.

Some of the approximations that had one more coeff in the numer than the denom or vice-versa, while still accurate, tended to either slightly overestimate or slightly underestimate many values. I cannot recall if that occured with the x^(1//12) or the x^(1//12)-1 approx.

Thank you for the elegant and careful work.

A floating point division introduces numerical errors, so while a rational polynomial is very good for managing approximation error, its not necessarily optimal for total error.

So (other things being equal) f(x) = 1+\frac{p_0(x)}{p_1(x)} is preferable to f(x) = \frac{p(x)}{q(x)} because the rounding errors in the division are generally below the precision of the final result; so @MrBelette’s choice of approximating x^{1/12} - 1 with a rational polynomial is unsurprisingly empirically good.

Of course, other things are rarely equal, and @JeffreySarnoff’s comments are generally pertinent.

Anyway, my suggestion is

# rational polynomial approximation of $x^{1/12}$ on interval $[0.5,1]$.
# numerator coeff
cn0 = 0x1.428f959fef75ep+3
cn1 = 0x1.e5d031fe80f2ap+4
cn2 = 0x1.13c2898e2b3d2p+5
cn3 = 0x1.22165286503bep+4
cn4 = 0x1.145459c617496p+2
cn5 = 0x1.776903a6bfeddp-2
cn6 = 0x1.9be55757541bcp-9
# denominator coeff
cd0 = 0x1.e3d7606fe730ep+6
cd1 = 0x1.a3ccd3365dddep+8
cd2 = 0x1.1d50673006c9cp+9
cd3 = 0x1.7e00bc991e4ebp+8
cd4 = 0x1.02699220bc0b4p+7
cd5 = 0x1.3f6801fd01ef6p+4
cd6 = 0x1.0p+0

# so that intermediate rational division result is 0 when t=0
t = x - 1
# calculate approximation
nn = (((((cn6 * t + cn5) * t + cn4) * t + cn3) * t + cn2) * t + cn1) * t + cn0;
# dd = (((((cd6 * t + cd5) * t + cd4) * t + cd3) * t + cd2) * t + cd1) * t + cd0;
dd = (((((t + cd5) * t + cd4) * t + cd3) * t + cd2) * t + cd1) * t + cd0;
ratpoly = nn / dd;
return ratpoly * t + 1;


  1. It uses a transform to help manage the zero in the approximation.
  2. It uses 1 as multiplicative identity to save a multiplication.
  3. It balances the order of numerator and denominator to maximise execution overlap in pipeline.

This is not proven to be faithfully rounded, but it seems reasonably good.

My earlier coefficients were simply the high precision coefficients obtained via Remez, however these are suboptimal.

Using the nifty Sollya tool, its possible to obtain better coefficients:

  • the high precision coefficients have an error of about 3.2e-19
  • the round to nearest coefficients have an error of about 8.4e-17
  • the coefficients selected by the LLL algorithm have an error of about 4.1e-19

(Note these calculations are still performed in extended/high precision and I derived them in Sollya separately for numerator and denominator, so they are still not guaranteed to be optimal.)

Here are the coefficients from Sollya:

# numerator coefficients
cn0 = 0x1.428f959fef7d6p3
cn1 = 0x1.e5d031fe80fc9p4
cn2 = 0x1.13c2898e2b419p5
cn3 = 0x1.22165286503b4p4
cn4 = 0x1.145459c617129p2
cn5 = 0x1.776903a6ba6ebp-2
cn6 = 0x1.9be55755b0744p-9
# denominator coefficients
cd0 = 0x1.e3d7606fe73c1p6
cd1 = 0x1.a3ccd3365de6ap8
cd2 = 0x1.1d50673006ceep9
cd3 = 0x1.7e00bc991e544p8
cd4 = 0x1.02699220bc0e1p7
cd5 = 0x1.3f6801fd01f31p4
cd6 = 0x1.0p0

Enjoy !

PS I have no association with Sollya (www.sollya.org), apart from being a novice user.

1 Like

Hi Jon,
You don’t know how tickled I am that you have taken time to look into this - almost a year after I posted!

If you have the inclination, would you be able to investigate the accuracy of your proposal?
If x = 0.505104511607126 then I think your method has an absolute error of 2.95e-16, so > 0.5ulp, whereas my method has an error of 3.8e-17 (< 0.5ulp).

Indeed, if I scan random numbers in [0.5,1.0] then your method doesn’t appear to be as accurate (97-98% < 0.5ulp vs 100% for mine) - and I was looking for a method that had comparable accuracy to the standard library so that it could be considered as a replacement.

I’m sure I’ve missed something, since this sort of thing isn’t my day job, and I haven’t look at this for a long time.
How should your method be incorporated into my code to avoid errors?

Thanks again,

1 Like

@MrBelette ,

Further analysis suggests my earlier proposal is seriously flawed.

In part due to issues that Jeffrey raised about instability with certain sets of coefficients, and also because I didn’t pay attention to being clear about what was required.

  • Is the desired interval [\frac{1}{2},1] or [\frac{1}{2},1) ?
    The semi-open interval does not need the special case of t=0, so its perhaps not sensible to use transform t=x-1 for the best approximation.
  • The interval [\frac{1}{2},1) is best for simple range/strength reduction, but is it best for simple polynomial approximation ? (I conjecture [\frac{3}{4},\frac{3}{2}) might be numerically more stable, but will be more expensive for range reduction.)
  • What is the actual required precision ?
    I was targeting faithful rounding (which is basically \pm1 ulp) but you seem to be looking for correct rounding (100% \pm\frac{1}{2} ulp). However, without resolving the Table Makers Dilemma for this function (which I am not volunteering to do) I don’t know how much intermediate precision is required to guarantee correctness.

But If I go back and look at the text of your question, your use case suggests you code will end up a line that looks like fast12throot(x)-1, (see for example converting annual interest rate to monthly when compounding frequency known) so I wonder whether what you really want is the function fast12throotm1(x)\equivfast12throot(x)-1. It is often undesirable to calculate (x+1)-1 for small x because it loses precision on the intermediate step, which is exactly why standard maths libraries include functions for log1p(x)\equiv\log(x)+1 and expm1(x)\equiv\exp(x)-1 specifically to avoid problems with precision near special cases (since the implementations include approximations then add 1 or subtract 1 respectively).

So I suppose there are two questions I should have asked earlier:

  1. Do you really want both functions ?
  2. What is actually the acceptable error for the desired function(s) ?



Judicious use of @fastwrongmath macro.

The @fastwrongmath macro sometimes speeds up Floating Point calculations by using tricks that are technically not correct. It is of course trivial to get a wrong answer quickly — for example just return 41, when we all know the correct answer is 42 but getting the right question/calculation is much harder.

In particular it would break your code in _fast12throot, but it should be fine applied specifically to r12_kernel_num and to r12_kernel_den. (The errors introduced are not a problem for polynomials derived directly from the Remez algorithm using arbitrary precision, but this is obviously a very limited subset of all calculations.)

But it may not make any difference without using -O3 in Julia command line.


PS For some reason the Julia team left out the word wrong when they defined the macro, which misleads people into thinking it just makes things faster, so you will have to leave it out when using it, but don’t forget it makes things faster by making them wrong.

1 Like

Haha! Well I agree it’s an unfortunate name but it’s also standard. And I also wonder whether this might be a bit of a mischaracterization; it’s true that it encourages the compiler to emit code which is wrong-er. But even without it, getting what you want from standard floating point arithmetic can be subtle and difficult at times (ie, wrong).