A faster pow(x,1/12) function available in Fast12thRoot.jl


New to Julia, so hopefully I’ve chosen the correct category for posting this.
Actuaries use x^(1/12) very often in order to convert annualized numbers to their monthly compounded equivalent. I had an itch that had to be scratched and that was to see if there was a faster calculation than that underlying the power function. Using Remez.jl I was able to find a rational polynomial to x^(1/12)-1 on the interval [0.5,1.0) resulting in a x^(1/12) approximation that appears to be both fast and accurate, even compared to cbrt(sqrt(sqrt(x))).
I’m an actuary rather than a numerics expert so not 100% sure that I’ve missed something obvious.
Therefore looking for someone to review/comment if possible.




This is another rational polynomial approximation to (x^(1//12) - 1) for x in 0.5:1.0.

ratpoly_numerator_coeffs =
	[ 0.1407373980457154e-3,
          0.24598051459861557e-2 ]

ratpoly_denominator_coeffs =
	[ 0.22355385157150939e-3,
          0.16406842123662818e-2 ]


That’s awesome!

You mentioned that you amended Remez.jl: care to share what you did? I haven’t looked at it for a while, but pull requests are welcome.



Hi Jeffrey,
Thanks for taking a look.
I believe that my initial coefficients already resulted in an absolute error < 0.5eps in [0.0, 1.0].
There is an accuracy.jl script included in the package which buckets errors vs big(x)^(one(BigFloat)/big(12)) for a specified interval and number of random numbers.
I’m also using a lower degree poly in the denominator so will be more performant.

I’m obviously missing something but I don’t get the stated accuracy using the code snippet you included.
For example for x=0.9090909090909091, your code gives an abs err = 2.2640350239970713e-16, and fast12throot gives 4.35889747467582e-18.
From what I can tell your hornerpoly function is for x^12 in [0.5,1] rather than x^12-1. Is that correct?
I had found that the Float64 coeffs for x(1/12)-1 were more accurate than those for x^(1/12).

horner_numerator(x) = (0.0001407373980457154 + (0.010909742736020892 + (0.14823153004176548 + (0.6117674724014694 + (0.9022845211608465 + (0.4782192944649025   + (0.07872193791007243 + 0.0024598051459861557 * x) * x) * x) * x) * x) * x) * x)
horner_denominator(x) = (0.00022355385157150939 + (0.014313304914450771 +  (0.17395352028835623 + (0.656615590039731 + (0.8906672552104548 +  (0.43174893535587344 + (0.06357219738630518 + 0.0016406842123662818 * x) * x) * x) * x) * x) * x) * x)
hornerpoly(x) = horner_numerator(x) / horner_denominator(x)

for x in linspace(0.5, 1.0, 100)
    v0 = big(x)^(one(BigFloat)/big(12))
    v1 = fast12throot(x)
    println(x, ": ", Float64(v0-hornerpoly(x)), ", ", Float64(v0-fast12throot(x)))




Hi Simon,

I think the changes are too minor for a pull request - I was just hacking around in my search for an accurate approximation. But I felt it was the right thing to do a hat-tip to your packages.

I included code to halt after a certain number of iterations since the max error would bounce between BigFloats and never terminate.
I also included the calculation of maximum error for the equivalent rational polynomial but with the BigFloat coefficients converted to Float64.
In addition I also validated results obtained against another Remez program in C.

High-level methodology:
It doesn’t seem possible to have an accurate, lowish-degree rational polynomial approx over [1/4096,1].
The interval [0.5,1.0] is convenient for fp calculations, and the function x^(1/12) is well-behaved on that interval.
Can’t use Newton iteration with a low-degree poly because will lose accuracy when calculating x^11 or x^12 unless you use double-doubles. But that will have too high a performance impact.
So require an accurate poly over the interval when using Float64 arithmetic only - hence the inclusion in Remez.jl.
Targeting x^(1/12) directly results in loss of accuracy after converting coeffs to Float64 (and rescaling), so look for a simple transformation that does give required accuracy but will remain performant.
x^(1/12)-1 seems to fit the bill with the benefit of being exact when reversing the transformation if we re-order the calculation a little and use the double-double components of 2^(ii-1)/12 obtained using your DoubleDouble.jl package.




( egad – almost never copy and paste incorrectly : ) I will take down the specifics.

Your work is done well. With the understanding that you may know all this already, let me share three things I noticed while playing around with minimaxs.

It is usually helpful to run Remez routines using bounds that are fractionally wider than the target domain (widening into negative values may be unhelpful, depending on the function). Given two approximations with the same number of coeffs, if one has large and small coeffs and the other has more consisitent values, usually the second will behave no worse – and may be smoother in low order bits for some subsections of the domain.

Some of the approximations that had one more coeff in the numer than the denom or vice-versa, while still accurate, tended to either slightly overestimate or slightly underestimate many values. I cannot recall if that occured with the x^(1//12) or the x^(1//12)-1 approx.

Thank you for the elegant and careful work.



A floating point division introduces numerical errors, so while a rational polynomial is very good for managing approximation error, its not necessarily optimal for total error.

So (other things being equal) f(x) = 1+\frac{p_0(x)}{p_1(x)} is preferable to f(x) = \frac{p(x)}{q(x)} because the rounding errors in the division are generally below the precision of the final result; so @MrBelette’s choice of approximating x^{1/12} - 1 with a rational polynomial is unsurprisingly empirically good.

Of course, other things are rarely equal, and @JeffreySarnoff’s comments are generally pertinent.

Anyway, my suggestion is

# rational polynomial approximation of $x^{1/12}$ on interval $[0.5,1]$.
# numerator coeff
cn0 = 0x1.428f959fef75ep+3
cn1 = 0x1.e5d031fe80f2ap+4
cn2 = 0x1.13c2898e2b3d2p+5
cn3 = 0x1.22165286503bep+4
cn4 = 0x1.145459c617496p+2
cn5 = 0x1.776903a6bfeddp-2
cn6 = 0x1.9be55757541bcp-9
# denominator coeff
cd0 = 0x1.e3d7606fe730ep+6
cd1 = 0x1.a3ccd3365dddep+8
cd2 = 0x1.1d50673006c9cp+9
cd3 = 0x1.7e00bc991e4ebp+8
cd4 = 0x1.02699220bc0b4p+7
cd5 = 0x1.3f6801fd01ef6p+4
cd6 = 0x1.0p+0

# so that intermediate rational division result is 0 when t=0
t = x - 1
# calculate approximation
nn = (((((cn6 * t + cn5) * t + cn4) * t + cn3) * t + cn2) * t + cn1) * t + cn0;
# dd = (((((cd6 * t + cd5) * t + cd4) * t + cd3) * t + cd2) * t + cd1) * t + cd0;
dd = (((((t + cd5) * t + cd4) * t + cd3) * t + cd2) * t + cd1) * t + cd0;
ratpoly = nn / dd;
return ratpoly * t + 1;


  1. It uses a transform to help manage the zero in the approximation.
  2. It uses 1 as multiplicative identity to save a multiplication.
  3. It balances the order of numerator and denominator to maximise execution overlap in pipeline.

This is not proven to be faithfully rounded, but it seems reasonably good.