Is there a smarter way to evaluate a float power?

  1. Currently set as a function input, so no global variable trickery! :blush:

  2. That is a good point, will look into that. When I plot rho0 * rh^(1/7), it seems to be symmetric about the origin, which enables a really good speed-up, since rh = 1 ± a small number. Then of course subtracting rho0 again afterwards is just a constant subtraction. Seems to me it should work :slight_smile:

Kind regards

Yes, x^{1/7} = \sqrt[7]{x}, and any function f(x) = \sqrt[n]{x} is odd for odd n, i.e. f(-x) = -f(x), so its graph is point-symmetric about the origin. So the optimization is mathematically correct.
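As a side note for implementing this in Julia: a float power with a negative base throws a `DomainError`, so exploiting the odd symmetry in practice means evaluating the root on the absolute value and restoring the sign:

```julia
# Real odd root using the symmetry above: x^(1/7) == sign(x) * |x|^(1/7).
# A direct (-1.5)^(1/7) would throw a DomainError.
seventh_root(x) = copysign(abs(x)^(1/7), x)

seventh_root(-1.5) ≈ -(1.5^(1/7))  # true
```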


Would you mind sharing your discoveries with us? I would be interested in helping with such a task. A way to automatically generate an optimized approximation of a univariate function on a given range, with a given approximation level / overall error, would be neat.

If type-agnostic, it could allow developing higher-precision versions of some functions that we still lack.

I assume you’re referring to “developing some packages for automatically implementing the polynomial-based kernel functions”?

There are currently two packages on GitLab.

  • This package is finished functionality-wise, but not registered yet because I want to do some reorganization of the code in my packages
  • There’s no README yet, but this is basically just some linear programming formulations for use with MathOptInterface-compatible linear optimizers (Tulip is the only one that’s tested). It’s sort of a replacement for a Remez algorithm (as implemented by Sollya and Remez.jl), but with LP instead of Remez. In some ways LP is more flexible, but the downside (theoretical, at least) is that it doesn’t necessarily return the exact minimax polynomial. In practice, I think this theoretical advantage of a Remez algorithm doesn’t matter at all: if the polynomial found by the linear optimization isn’t good enough, it’s always possible to re-run the optimization with some random perturbation, and with a higher grid resolution if necessary. It’s possible Remez would be faster, though, I don’t know. (The basic shape of such an LP is sketched in code at the end of this post.)
  • This package is far from finished
  • The README is quite out of date; it’s probably best not to read it, as it would just be confusing
  • There’s quite a lot of code that I haven’t pushed to the repo yet; basically I’m rewriting significant parts of the package
  • What this currently does:
    1. Uses Tulip and PolynomialPassingThroughIntervals to find a close-to-minimax polynomial that approximates some function (its coefficients in the standard basis, to be exact)
    2. Constructs a “preparation” of the polynomial in a couple of different flavors. A “preparation” is my ad-hoc term for a representation of the polynomial that is more amenable to software implementation of the polynomial’s evaluation. For example, there are these (a few are sketched in code at the end of this post):
      1. “Horner” preparation that obviously uses Horner’s scheme to evaluate the polynomial
      2. “compressed” preparation that does a couple of well-known tricks, like transforming an odd polynomial into a denser form using a change of variables like y := x^2, evaluating a polynomial in y, and factoring out the stray x
      3. “factored” preparation that factors the polynomial, for evaluating the factored form, which can be faster because of fewer data dependencies, enabling more out-of-order execution and other machine-level effects like that, I think. The idea is to evaluate the factors so that the multiplication expression has minimal nesting (in the shape of a min-height binary tree).
      4. “translated” preparation: it vertically translates the polynomial. This is useful because it makes the “factored” preparation more accurate. A problem is that it’s necessary to find a good translation constant, and I haven’t yet been able to think of a smart way to look for one.
    3. Tries to find good in-the-neighborhood coefficients around the given rounded coefficients of the polynomial. Sollya uses a smart lattice-reduction technique for this (interpreting the problem as a closest-vector problem); I just try random displacements (a toy version is sketched at the end of this post). The goal here is to improve the evaluation accuracy compared to just using the naively-rounded coefficients.
    4. Generates polynomial evaluation code in different programming languages. I think the currently working languages are Julia, C++, Rust, Java and Go. This is actually very easy, because before this step I generate “programs” composed of assignments, similar to static single assignment (SSA) form, so I never need to generate complicated expressions for any language (a toy emitter is sketched at the end of this post).
  • What are some of the things I want to do before registering the package:
    1. Implement automatic generation of double-word floating-point code (“double-double”) for evaluating a polynomial (in one of the above preparation flavors), more accurately than with the usual single-word arithmetic
    2. Optimizations (especially for the double-word floating-point code), including what I’ve termed “domain-aware optimization”: if a coefficient or an arithmetic operation in the generated code doesn’t contribute to the final result, eliminate it. The specific algorithm is supposed to work something like this (a concrete miniature is sketched at the end of this post):
      1. in an expression like a + b or fma(a, b, c), try replacing one of the arguments with an identity element (e.g. zero or one).
      2. if the change doesn’t affect the result for any of the chosen points of the domain (ten thousand randomly-chosen points, for example), it’s OK to make the replacement permanent
      3. now repeat, and do some dead code elimination, too
    3. I’d like to also do some optimizations on a bit higher level, for example for a +, * or fma operation where some arguments are double-word numbers.
    4. Put Julia’s LLVM JIT to good use by moving some relatively-rare computations to the type-domain. This could (I think) significantly speed up such tasks as:
      1. Looking for a good translation constant for a translated preparation.
      2. Finding good in-the-neighborhood coefficients for implementing the polynomials
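
To make some of the above concrete, here are a few toy sketches. These are simplified illustrations, not the packages’ actual code or APIs. First, the basic shape of a discretized minimax LP (here via JuMP, with Tulip as the solver):

```julia
# A minimal discretized minimax LP (an illustration, not the package's actual
# formulation): find degree-d coefficients that minimize the worst absolute
# error of the polynomial against f on a grid of sample points.
using JuMP, Tulip

function minimax_poly(f, a, b, degree; npoints = 1_000)
    xs = range(a, b; length = npoints)
    model = Model(Tulip.Optimizer)
    set_silent(model)
    @variable(model, c[0:degree])  # coefficients in the standard basis
    @variable(model, err >= 0)     # worst-case absolute error on the grid
    for x in xs
        p = sum(c[j] * x^j for j in 0:degree)
        @constraint(model, p - f(x) <= err)
        @constraint(model, f(x) - p <= err)
    end
    @objective(model, Min, err)
    optimize!(model)
    return [value(c[j]) for j in 0:degree], value(err)
end

# e.g. a degree-5 approximation of x^(1/7) on 0.9 .. 1.1
coeffs, maxerr = minimax_poly(x -> x^(1/7), 0.9, 1.1, 5)
```

Re-running with a perturbed or denser sample grid is then just a matter of calling it again with different `npoints`.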
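Next, hand-written examples of what three of the “preparation” flavors correspond to for small polynomials (the package generates this kind of code automatically):

```julia
# "Horner" preparation of the odd polynomial p(x) = c1*x + c3*x^3 + c5*x^5:
# Horner's scheme over all six standard-basis coefficients, zeros included.
horner_prep(x, c1, c3, c5) = evalpoly(x, (0.0, c1, 0.0, c3, 0.0, c5))

# "compressed" preparation of the same p: the change of variables y := x^2
# turns the odd polynomial into x * q(y), with q of half the degree.
function compressed_prep(x, c1, c3, c5)
    y = x * x
    return x * muladd(y, muladd(y, c5, c3), c1)  # x * (c1 + c3*y + c5*y^2)
end

# "factored" preparation of a quintic a*x*(x-r1)*(x-r2)*(x-r3)*(x-r4) with
# known real roots: the factors are combined as a min-height binary tree,
# so the two inner products can execute out of order.
factored_prep(x, a, r1, r2, r3, r4) =
    (a * x) * (((x - r1) * (x - r2)) * ((x - r3) * (x - r4)))
```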
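The random-displacement search for in-the-neighborhood coefficients could look roughly like this toy version:

```julia
# Nudge one rounded coefficient by a few ulps at a time; keep the change
# whenever the worst observed error over the sample points improves.
function refine_coeffs(f, coeffs::Vector{Float64}, xs; iters = 10_000)
    worst(cs) = maximum(abs(evalpoly(x, cs) - f(x)) for x in xs)
    best, best_err = copy(coeffs), worst(coeffs)
    for _ in 1:iters
        trial = copy(best)
        i = rand(eachindex(trial))
        trial[i] = nextfloat(trial[i], rand(-3:3))  # displace by up to 3 ulps
        e = worst(trial)
        if e < best_err
            best, best_err = trial, e
        end
    end
    return best, best_err
end
```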
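A toy emitter showing why the SSA-like “programs” make multi-language codegen easy: every line is a single assignment, so targets differ only in surface syntax:

```julia
# Emit Horner's scheme as a flat list of single assignments; coeffs are
# ordered from the highest-degree coefficient down.
function emit_horner(coeffs)
    lines = ["t0 = $(first(coeffs))"]
    for (i, c) in enumerate(coeffs[2:end])
        push!(lines, "t$i = t$(i - 1) * x + $c")
    end
    return join(lines, "\n")
end

print(emit_horner((0.25, 0.5, 1.0)))
# t0 = 0.25
# t1 = t0 * x + 0.5
# t2 = t1 * x + 1.0
```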
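Finally, a concrete miniature of the “domain-aware” test: check on sampled domain points whether replacing an argument with an identity element ever changes the result:

```julia
# Does the low coefficient c0 in fma(c1, x, c0) ever affect the Float64
# result on this domain? If not, it may be replaced with the identity
# element 0.0 and later removed by dead code elimination. (Toy values.)
c1, c0 = 0.14285714285714285, 1.0e-40       # c0 is far below one ulp of c1*x
xs = 1 .+ 0.01 .* (2 .* rand(10_000) .- 1)  # ten thousand points in 1 ± 0.01
removable = all(fma(c1, x, c0) === fma(c1, x, 0.0) for x in xs)
```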

This is particularly interesting to me. As you clearly know better than I do what is happening in your work, would you say that the “generic” approximation you are constructing would be possible in a type-agnostic way?

I mean, there are special functions right now that are implemented by such approximations, and these are usually fine-tuned for a specific number type (e.g. Float64) and unusable for higher-precision stuff (e.g. with my favorite high-precision package MultiFloats.jl). Would your work allow producing such approximations for any given type “automatically”?

More precisely, would your work be able to deal with producing special-function implementations for high-precision arithmetics?


In principle, the minimax polynomial can be found up to an arbitrary accuracy, at least with Remez. I believe my LP formulations are also good for this purpose. Once one has the minimax polynomial, I suppose it’s possible to implement the evaluation in, e.g., Horner’s scheme for any type, as long as the type supports addition and multiplication. The code I generate currently also uses FMA operations, though. There’s already an issue about providing the option of codegen that wouldn’t require FMA; this would be easy to implement for the usual single-word arithmetic, but somewhat more laborious for multi-word arithmetics (which aren’t implemented yet anyway).
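For illustration, the type-generic fallback is essentially the following, assuming only `*` and `+` on the element type (`Base.evalpoly` already does this for many types):

```julia
# Generic Horner evaluation using only * and + (no FMA), so it works for any
# number type: BigFloat, MultiFloat, etc. Coefficients in ascending order.
function horner(x, coeffs)
    acc = zero(x)
    for c in Iterators.reverse(coeffs)
        acc = acc * x + c
    end
    return acc
end

horner(big"1.01", (1.0, 0.5, 0.25))  # returns a BigFloat
```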

A MultiFloat{T, n} is a multi-word (n-word) arithmetic, so when I implement the codegen with double-word arithmetic, this would definitely be useful for MultiFloat{T, 2}. In principle I guess I could also have codegen for MultiFloat{T, n} with arbitrary n, but that seems more involved to implement. Both of these options would presumably be an improvement in both performance and accuracy over the naive approach of the previous paragraph applied to MultiFloat{T, n}.
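To give an idea of what the double-word codegen amounts to (again, not implemented yet), the basic building block is an error-free transformation such as Knuth’s two-sum, which recovers the rounding error of an addition exactly:

```julia
# Knuth's two-sum: returns (s, e) with s == fl(a + b) and s + e == a + b
# exactly. Double-word ("double-double") arithmetic chains such operations
# to carry the error term along.
function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e
end

two_sum(1.0, 1.0e-17)  # (1.0, 1.0e-17): the error term keeps the lost bits
```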

Given the above, I’d say yes. But note that, although I think what I’m doing is awesome, I’m not an expert, and this is just a toy project. And, in case it’s not clear, all of this is just for producing the kernel functions; applying math knowledge manually will still be necessary for implementing the range reduction.

Note that you can do a lot better than just using the MultiFloat * and + for polynomial evaluation, because often you can compute the higher-order terms in lower precision.
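For instance, something along these lines, sketched with MultiFloats.jl; the choice of which coefficients need the extra precision is just an illustrative assumption:

```julia
# Mixed-precision Horner sketch: the high-order terms contribute little to
# the final result, so evaluate them in plain Float64 and switch to Float64x2
# only for the last, most significant steps.
using MultiFloats

function mixed_horner(x::Float64, c0::Float64x2, c1::Float64x2, c2, c3, c4)
    tail = muladd(muladd(c4, x, c3), x, c2)      # high-order terms in Float64
    X = Float64x2(x)
    return (Float64x2(tail) * X + c1) * X + c0   # final steps in double-word
end
```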

Yeah, I hope that the “domain-aware” optimization, mentioned above, will be able to turn the naive full-precision evaluation into such a mixed-precision one.