I assume you’re referring to “developing some packages for automatically implementing the polynomial-based kernel functions”?
There are currently two packages on GitLab.
- This package is finished functionality-wise, but not registered yet because I want to reorganize the code in my packages first
- There’s no README yet, but this is basically just some linear programming formulations for use with MathOptInterface-compatible linear optimizers (Tulip is the only one that’s tested). It’s sort of a replacement for a Remez algorithm (as implemented by Sollya and Remez.jl), but with LP instead of the Remez exchange. In some ways LP is more flexible; the downside (theoretical, at least) is that it doesn’t necessarily return the exact minimax polynomial. In practice, I think that theoretical advantage of a Remez algorithm doesn’t matter at all: if the polynomial found by the linear optimization isn’t good enough, it’s always possible to re-run the optimization with some random perturbation, and with higher resolution if necessary. Remez might be faster, though; I don’t know. (A minimal sketch of the LP idea is just below.)
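To make the idea concrete, here’s a minimal sketch of such an LP (hypothetical code, going through JuMP for brevity rather than raw MathOptInterface; not the package’s actual API): minimize a bound t on |p(xᵢ) − f(xᵢ)| over sampled points, which is linear in the coefficients and in t.

```julia
# Hypothetical sketch of the LP formulation, not this package's actual API:
# minimize the uniform error bound t subject to |p(x) - f(x)| <= t at each
# sampled point x; both constraints are linear in the coefficients c and in t.
using JuMP, Tulip

function lp_minimax(f, degree, xs)
    model = Model(Tulip.Optimizer)
    @variable(model, c[0:degree])    # coefficients in the standard basis
    @variable(model, t >= 0)         # uniform error bound to be minimized
    for x in xs
        p = sum(c[k] * x^k for k in 0:degree)
        @constraint(model, p - f(x) <= t)
        @constraint(model, f(x) - p <= t)
    end
    @objective(model, Min, t)
    optimize!(model)
    return [value(c[k]) for k in 0:degree], value(t)
end

coeffs, err = lp_minimax(sin, 7, range(-1, 1; length = 200))
```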
- This package is far from finished
- The README is quite out of date; it’s probably best not to read it, as it would just be confusing
- There’s quite a bit of code that I haven’t pushed to the repo yet; basically, I’m rewriting significant parts of the package
- What this currently does:
- Uses Tulip and PolynomialPassingThroughIntervals to find a close-to-minimax polynomial that approximates some function (its coefficients in the standard basis, to be exact)
- Constructs a “preparation” of the polynomial in a couple of different flavors. A “preparation” is my ad-hoc term for a representation of the polynomial that is more amenable to software implementation of the polynomial’s evaluation. There are these, for example:
- “Horner” preparation that obviously uses Horner’s scheme to evaluate the polynomial
- “compressed” preparation that does a couple of well-known tricks, like transforming an odd polynomial into a denser form using a change of variables like y := x^2, then evaluating a polynomial in y and factoring out the stray x
- “factored” preparation that factors the polynomial, for evaluating the factored form, which can be faster because fewer data dependencies enable more out-of-order execution and other machine-level effects like that, I think. The idea is to evaluate the factors so that the multiplication expression has minimal nesting (in the shape of a min-height binary tree).
- “translated” preparation: it vertically translates the polynomial. This is useful because it makes the “factored” preparation more accurate. A problem is that it’s necessary to find a good translation constant, and I haven’t yet thought of a smart way to search for one. (A toy comparison of the first three flavors is just below.)
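To make the flavors concrete, here’s a toy comparison of the first three (my own made-up example, not the package’s output), on the odd polynomial p(x) = x^5 − 5x^3 + 4x = x(x − 1)(x + 1)(x − 2)(x + 2):

```julia
# Toy illustration of three preparation flavors (not the package's output),
# for p(x) = x^5 - 5x^3 + 4x = x(x - 1)(x + 1)(x - 2)(x + 2).

# "Horner": plain Horner's scheme in x.
p_horner(x) = evalpoly(x, (0.0, 4.0, 0.0, -5.0, 0.0, 1.0))

# "compressed": with y := x^2 the odd polynomial becomes a dense quadratic
# in y, with the stray x factored out.
p_compressed(x) = (y = x * x; x * evalpoly(y, (4.0, -5.0, 1.0)))

# "factored": multiply the factors as a min-height binary tree; the two
# inner products carry no data dependency on each other, so they can
# execute out of order.
p_factored(x) = (((x - 1) * (x + 1)) * ((x - 2) * (x + 2))) * x
```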
- Tries to find good in-the-neighborhood coefficients around the given rounded coefficients of the polynomial. Sollya uses a smart lattice-reduction technique for this (interpreting the problem as a closest-vector problem); I just try random displacements. The goal here is to improve the evaluation accuracy compared to just using the naively-rounded coefficients. (A simplified sketch follows this list.)
- Generates polynomial evaluation code in different programming languages. I think the currently working languages are Julia, C++, Rust, Java and Go. This is actually very easy, because before this step I generate “programs” composed of assignments, similar to static single assignment (SSA) form, so I never need to generate complicated expressions for any language. (See the second sketch after this list.)
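Here’s a much-simplified sketch of the random-displacement search (hypothetical code; search_neighborhood and its signature are made up for illustration): perturb each rounded coefficient by a few ULPs and keep the candidate whose worst-case error against a high-precision reference is smallest.

```julia
# Much-simplified sketch of the random-displacement coefficient search
# (hypothetical code). `reference` evaluates the target function in BigFloat.
function search_neighborhood(coeffs::Vector{Float64}, reference, xs; tries = 1000)
    err(cs) = maximum(abs(Float64(evalpoly(x, cs) - reference(big(x)))) for x in xs)
    best, best_err = coeffs, err(coeffs)
    for _ in 1:tries
        candidate = [nextfloat(c, rand(-2:2)) for c in best]  # displace a few ULPs
        e = err(candidate)
        if e < best_err
            best, best_err = candidate, e
        end
    end
    return best
end
```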
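And a sketch of the assignments-only “programs” idea (hypothetical code; the real package does more bookkeeping than this): with one operation per assignment, printing the program in any C-like language is trivial.

```julia
# Hypothetical sketch of assignments-only ("SSA-like") code generation for
# Horner's scheme: one operation per line, so no nested expressions are needed.
function horner_ssa(coeffs)
    cs = reverse(collect(coeffs))
    lines = ["t0 = $(cs[1]);"]
    for i in 2:length(cs)
        push!(lines, "t$(i - 1) = fma(t$(i - 2), x, $(cs[i]));")
    end
    return lines
end

foreach(println, horner_ssa((4.0, 0.0, -5.0, 1.0)))
# t0 = 1.0;
# t1 = fma(t0, x, -5.0);
# t2 = fma(t1, x, 0.0);
# t3 = fma(t2, x, 4.0);
```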
- What are some of the things I want to do before registering the package:
- Implement automatic generation of double-word floating-point code (“double-double”) for evaluating a polynomial (in one of the above preparation flavors), more accurately than with the usual single-word arithmetic (the basic building blocks are sketched at the end of this list)
- Optimizations (especially for the double-word floating-point code), including what I termed “domain-aware optimization”: if a coefficient or an arithmetic operation in the generated code doesn’t contribute to the final result, eliminate it. The specific algorithm is supposed to work something like this:
- in an expression like a + b or fma(a, b, c), try replacing one of the arguments with an identity element (e.g. zero or one).
- if the change doesn’t affect the result at any of the chosen points of the domain (ten thousand randomly-chosen points, for example), it’s OK to make the replacement permanent
- now repeat, and do some dead code elimination, too
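A sketch of the acceptance check at the core of this (hypothetical code): a replacement is kept only if the whole evaluation stays bit-identical on every sampled point.

```julia
# Hypothetical sketch of the acceptance check: keep a candidate replacement
# only if the evaluation is bit-identical (===) on every sampled domain point.
accept(original, candidate, xs) = all(original(x) === candidate(x) for x in xs)

xs = 2.0 .* rand(10_000) .- 1.0  # e.g. ten thousand random points in [-1, 1]
f(x) = fma(x, 1.0, 0.0)          # operation with candidate identity arguments
g(x) = x                         # both identity arguments eliminated
accept(f, g, xs)                 # true here, so the replacement may become permanent
```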
- I’d also like to do some optimizations at a bit higher level, for example for a +, * or fma operation where some arguments are double-word numbers.
- Put Julia’s LLVM JIT to good use by moving some relatively-rare computations to the type domain (a small illustration is at the end of this list). This could (I think) significantly speed up tasks such as:
- Looking for a good translation constant for a translated preparation.
- Finding good in-the-neighborhood coefficients for implementing the polynomials
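Regarding the double-word arithmetic item above: for reference, these are the standard error-free transformations such code builds on (textbook algorithms, not code from the package):

```julia
# Standard error-free transformations that double-word ("double-double")
# arithmetic builds on (textbook algorithms, not this package's code).

# two_sum: returns (s, e) with s + e == a + b exactly, where s = fl(a + b).
function two_sum(a::Float64, b::Float64)
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e
end

# two_prod: returns (p, e) with p + e == a * b exactly, using one fma.
function two_prod(a::Float64, b::Float64)
    p = a * b
    e = fma(a, b, -p)
    return p, e
end
```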
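And a tiny illustration of the type-domain idea (hypothetical; it only shows the mechanism, not how the package will use it): baking the coefficients into a type parameter makes the JIT compile a specialized method per polynomial.

```julia
# Hypothetical illustration of moving data into the type domain: with the
# coefficient tuple as a type parameter, Julia's JIT compiles a specialized
# method per polynomial and can constant-fold/unroll the evaluation.
evalpoly_static(x, ::Val{coeffs}) where {coeffs} = evalpoly(x, coeffs)

const P = Val((4.0, 0.0, -5.0, 0.0, 1.0))  # one method instance per coefficient tuple
evalpoly_static(1.5, P)                     # coefficients are baked into the code
```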