BetterExp 0.1.0

Oscar_Smith · August 11, 2020, 3:49pm

Do you use exp? Do you want it to be faster and (slightly) more accurate? If so, try BetterExp.jl (GitHub: oscardssmith/BetterExp.jl), which provides faster and more accurate exp, exp2, and exp10 functions for Julia. For Float64 they are about 2x faster than Base, and for Float32 about 3x faster.

Edit: Hopefully this will make its way into Base soon enough, but in the meantime, this package should be useful.

tbeason · August 11, 2020, 3:53pm

Do you have a reference for which algo this implements versus what Base uses? Or is it the same thing just tweaking the Julia implementation to be faster?

Oscar_Smith · August 11, 2020, 3:59pm

Most of the speed difference comes from these algorithms having only one branch in the normal case. (The branch handles inputs whose outputs are subnormal, 0, Inf, or NaN.) This speeds things up and also allows automatic vectorization.
For Float32, the algorithm is pretty similar to Base, except that it uses a 7-term polynomial instead of a much shorter rational function.
The Float64 version is fairly different. It uses a table with 256 elements to reduce the range the polynomial kernel needs to approximate. This allows a much smaller polynomial to maintain full accuracy.
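A minimal sketch of table-driven range reduction of this kind, assuming a Cody-Waite split of log(2)/256 and plain Taylor coefficients for the kernel. It is not the BetterExp.jl code, and `table_exp`, `EXP2_TABLE`, and the other constants are hypothetical names. It writes x = (256k + j)·log(2)/256 + r with |r| at most about log(2)/512, so that exp(x) = 2^k · 2^(j/256) · exp(r), and keeps a single branch for inputs whose results overflow, underflow, or are NaN.

```julia
# 2^(j/256) for j = 0:255, computed in BigFloat and rounded once to Float64
const EXP2_TABLE = [Float64(big(2.0)^(big(j) / 256)) for j in 0:255]

# log(2)/256 split into a Float64 head and a small tail (Cody-Waite style)
const LN2_256_HI = Float64(log(big(2.0)) / 256)
const LN2_256_LO = Float64(log(big(2.0)) / 256 - big(LN2_256_HI))
const INV_LN2_256 = 256 / log(2.0)

# Stand-in Taylor coefficients for expm1(r)/r (hypothetical, not minimax)
const EXPM1_COEFFS = (1.0, 1 / 2, 1 / 6, 1 / 24)

function table_exp(x::Float64)
    # The single branch: overflow, underflow/subnormal results, and NaN
    if !(-708.0 < x < 709.0)
        return exp(x)                              # defer to Base for special cases
    end
    N = round(x * INV_LN2_256)                     # nearest multiple of log(2)/256
    r = fma(-N, LN2_256_HI, x) - N * LN2_256_LO    # reduced argument, |r| ~<= log(2)/512
    k, j = fldmod(Int(N), 256)                     # N = 256k + j with 0 <= j < 256
    t = EXP2_TABLE[j + 1]                          # 2^(j/256)
    p = r * evalpoly(r, EXPM1_COEFFS)              # approximates expm1(r)
    return ldexp(fma(t, p, t), k)                  # 2^k * t * (1 + p)
end
```

The `fma` in the reduction keeps x - N·hi from losing precision for large x; computing `x - N * (log(2) / 256)` directly would leave r with an absolute error on the order of eps(x).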

Oscar_Smith · August 11, 2020, 4:03pm

The Float64 version also has a bunch of minor tricks to get full precision with minimal use of double-double arithmetic. The two main tricks are storing extra bits in the table, and having the kernel approximate expm1 instead of exp.
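To see why approximating expm1 rather than exp helps, here is a small experiment (a sketch, not BetterExp's code) comparing two ways of forming t·exp(r) for a table value t: rounding 1 + p first, or computing t + t·p with a single `fma` so that only the final result is rounded. The kernel uses Taylor coefficients as a stand-in for minimax ones, and `naive`, `via_expm1`, and `ulps` are hypothetical names.

```julia
# expm1(r) via a short Taylor polynomial (stand-in for a minimax kernel)
kernel(r) = r * evalpoly(r, (1.0, 1 / 2, 1 / 6, 1 / 24))

naive(t, r) = t * (1.0 + kernel(r))       # rounds 1 + p, then rounds the product
via_expm1(t, r) = fma(t, kernel(r), t)    # t + t*p with one final rounding

# Error in ulps against a BigFloat reference for t * exp(r)
function ulps(y, t, r)
    ref = big(t) * exp(big(r))
    return Float64(abs(big(y) - ref) / eps(Float64(ref)))
end

ts = [2.0^(j / 256) for j in 0:4:255]                  # sample table values
rs = range(-log(2) / 512, log(2) / 512; length = 401)  # reduced arguments
err_naive = maximum(ulps(naive(t, r), t, r) for t in ts, r in rs)
err_expm1 = maximum(ulps(via_expm1(t, r), t, r) for t in ts, r in rs)
```

On this grid the fma form should stay below 1 ulp, while the naive form pays for two roundings.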

jmert · August 11, 2020, 4:13pm

Can you share how the coefficients for the expm1b_kernel have been chosen? For the ::Val{ℯ} case, I can recognize off-hand that those aren’t quite the Taylor series coefficients, so I’m guessing they’ve been numerically optimized in some way.

(I’ve been silently following the Julia PR, and I got sucked into trying to understand where all the constants were coming from. So far I have not been able to get a minimax optimization to reproduce those specific coefficients, so I’m curious whether there are some extra constraints I haven’t been able to reverse engineer yet.)

Oscar_Smith · August 11, 2020, 4:18pm

The coefficients come from minimax polynomials obtained with Remez.jl. The slightly tricky part is that, since I want the constant coefficient of the expm1 kernel to be 0, I actually approximate x -> x == 0 ? one(x) : expm1(x)/x. Then, to make the error minimization use absolute error with respect to expm1, you need to weight the error function by a factor of x. Thus the full call to generate the coefficients (for ::Val{ℯ}) ends up being

using Remez
ratfn_minimax(x -> x == 0 ? one(x) : expm1(x)/x, (-log(2)/512, log(2)/512), 3, 0, (x, y) -> x)
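The effect of the weight can be checked numerically without Remez.jl. In the sketch below, `g` is the function passed to `ratfn_minimax`, and the Taylor coefficients of g stand in for the actual minimax output (an assumption made only so the snippet is self-contained). Multiplying the error of P against g by x gives the absolute error of the kernel x·P(x) against expm1.

```julia
# g(x) = expm1(x)/x with the removable singularity at 0 filled in, as in the call above
g(x) = iszero(x) ? one(x) : expm1(x) / x

# Stand-in coefficients: the Taylor series of g, NOT the minimax result from Remez.jl
coeffs = big.((1.0, 1 / 2, 1 / 6, 1 / 24))

# Sample points on the reduced interval, in 256-bit BigFloat
xs = [big(i) / 500 * log(big(2)) / 512 for i in -500:500]

# Error of P against g, weighted by x ...
weighted_err = maximum(x -> abs(x * (evalpoly(x, coeffs) - g(x))), xs)
# ... equals the absolute error of the kernel x*P(x) against expm1(x)
expm1_err = maximum(x -> abs(x * evalpoly(x, coeffs) - expm1(x)), xs)
```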

Oscar_Smith · August 11, 2020, 4:22pm

One thing worth looking into (especially for Float32) is whether rounding these coefficients significantly increases the error they generate. If so, it might be worth doing an exhaustive search of nearby coefficients to see whether any combination gives better results.
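One way to sketch that search, under stated assumptions: a Float32 kernel 1 + x·P(x) with three hypothetical Taylor coefficients on |x| ≤ log(2)/512, where each rounded coefficient is nudged by -1, 0, or +1 ulp with `nextfloat(c, k)` and the maximum error is measured on a sample grid against BigFloat. The names are illustrative, and the grid is only a proxy for a truly exhaustive check.

```julia
# Stand-in "exact" coefficients for the Float32 kernel 1 + x*P(x) (hypothetical)
const EXACT_COEFFS = (big(1), big(1) / 2, big(1) / 6)

# Sample points on an assumed reduced interval |x| <= log(2)/512
const XS32 = Float32.(range(-log(2) / 512, log(2) / 512; length = 2001))

# Maximum error, in Float32 ulps, of 1 + x*P(x) evaluated in Float32
function max_ulp_error(c::NTuple{3,Float32})
    maximum(XS32) do x
        y = 1f0 + x * evalpoly(x, c)
        ref = exp(big(x))
        Float64(abs(big(y) - ref) / eps(Float32(ref)))
    end
end

# Try every combination of -1, 0, +1 ulp on each coefficient; keep the best
function search_neighbors(rounded::NTuple{3,Float32})
    best, best_err = rounded, max_ulp_error(rounded)
    for k1 in -1:1, k2 in -1:1, k3 in -1:1
        # nextfloat(c, k) with negative k steps down
        cand = (nextfloat(rounded[1], k1), nextfloat(rounded[2], k2), nextfloat(rounded[3], k3))
        err = max_ulp_error(cand)
        if err < best_err
            best, best_err = cand, err
        end
    end
    return best, best_err
end

rounded = Float32.(EXACT_COEFFS)   # naive round-to-nearest
best, best_err = search_neighbors(rounded)
```

With three coefficients and ±1 ulp this is only 27 candidates, but the count grows as 3^n in the number of coefficients, so wider searches quickly become expensive.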

Datseris · August 11, 2020, 4:25pm

Couldn’t find the PR that adds this to Base, can you please paste it here for reference?

jmert · August 11, 2020, 4:26pm

https://github.com/JuliaLang/julia/pull/36761

jmert · August 11, 2020, 4:35pm

OK, thank you for the clarification. I see now that, given your weight function, that’s a minimax of the absolute error. I’d been playing around with minimaxing the relative error (so my weight function was x -> iszero(x) ? one(x) : x / expm1(x)). I also coded up my own minimax algorithm (as a way to teach myself the principles), so I’m guessing part of the remaining difference comes down to implementation details on my part.
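For reference, the two weights correspond to the two error measures as follows, writing g(x) = expm1(x)/x (with g(0) = 1) for the function being fitted, P for the fitted polynomial, and K(x) = x·P(x) for the kernel (notation introduced here), and using x·g(x) = expm1(x):

```latex
\begin{aligned}
K(x) - \operatorname{expm1}(x) &= x\,\bigl(P(x) - g(x)\bigr)
  && \text{absolute error: weight } w(x) = x \\
\frac{K(x) - \operatorname{expm1}(x)}{\operatorname{expm1}(x)} &= \frac{x}{\operatorname{expm1}(x)}\,\bigl(P(x) - g(x)\bigr)
  && \text{relative error: weight } w(x) = \frac{x}{\operatorname{expm1}(x)}
\end{aligned}
```

So the x weight minimizes the absolute error of the kernel, which is what matters once 1 is added, while x/expm1(x) targets its relative error.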

dpsanders · August 11, 2020, 5:52pm

Do you have a guarantee on the accuracy of the new versions? Do they give exactly the same results as the current implementation?

Oscar_Smith · August 11, 2020, 5:57pm

They don’t give exactly the same results, but the Float64 versions have a maximum error of 0.76 ULP, compared to 0.88 ULP for Base. For Float32, my versions have a maximum error of 1 ULP, compared to 0.9 ULP for Base.
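Figures like these are measured against a high-precision reference. A minimal sketch of such a measurement, using BigFloat as the reference and sampling a grid rather than searching for the true worst case (`ulp_error` is a hypothetical helper, and the grid is an arbitrary choice):

```julia
# Error of one Float64 evaluation, in ulps of the correctly rounded result
function ulp_error(f, x::Float64)
    y = f(x)               # Float64 implementation under test
    ref = f(big(x))        # 256-bit BigFloat reference
    return Float64(abs(big(y) - ref) / eps(Float64(ref)))
end

xs = range(-700.0, 700.0; length = 20_001)
errs = [ulp_error(exp, x) for x in xs]
worst_err = maximum(errs)
worst_x = xs[argmax(errs)]   # worst input found on this grid
```

For Float32, all 2^32 inputs can be checked exhaustively; for Float64, a sampled grid like this only gives a lower bound on the maximum error.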

dpsanders · August 11, 2020, 6:14pm

Thanks. Could you please give a couple of examples of the worst cases?
How did you find them and what did you compare with?

dpsanders · August 11, 2020, 6:15pm

Is it exactly 1 ulp for Float32? This is important to know for directed rounding purposes.
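The exact bound matters because an error known to be below 1 ulp lets an interval library build a rigorous enclosure of the true value by stepping outward a fixed number of floats. A sketch, assuming an error strictly below 1 ulp and a finite, normal result (`exp_enclosure` is a hypothetical name):

```julia
# Hypothetical helper: an interval [lo, hi] containing the exact exp(x),
# assuming the implementation's error is below 1 ulp and the result is
# finite and normal.
function exp_enclosure(x::Float64)
    y = exp(x)
    # One float each way suffices inside a binade; the second step covers
    # results next to a power of two, where the float spacing changes.
    return prevfloat(y, 2), nextfloat(y, 2)
end
```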

Oscar_Smith · August 11, 2020, 6:28pm

It was 1.008 ULP for exp2, but I’m pushing a new commit that brings the maximum down to 0.87 ULP. I had somehow used a polynomial one degree too low for exp2.

Oscar_Smith · August 11, 2020, 10:12pm

Does interval arithmetic provide an easy way to get the maximum error of a polynomial (in ULPs)? If so, how would I do it? I’m thinking of trying an exhaustive search for better polynomials than the rounded minimax polynomials, but exhaustive checking is too slow.

dpsanders · August 11, 2020, 10:15pm

Good question. I don’t think it’s too easy. People use tools like Sollya and Gappa for this I believe.

Juan · August 12, 2020, 12:41am

Can we directly get expm1?
I mean exp(x)-1

JeffreySarnoff · August 12, 2020, 12:48am

sollya is the tool that some people use to get there.
The “onramp” is steep. Here are some resources:

Sollya Users Manual
Sollya Commands

look for examples of sollya code

Rigorous Polynomial Approximations and Applications
Rigorous Polynomial Approximations and Applications (slides)

sollya has much of what follows built in;
see the sollya commands fpminimax and implementpoly

Optimizing Polynomials for floating point implementation
Exchange algorithm for … error-optimized polynomials

also of interest

Accurate Horner Form Approximations to Special Functions

Oscar_Smith · August 12, 2020, 2:21am

Do you mean a faster method for expm1(x)? I don’t have one done yet, but plan to. The expm1b kernel isn’t good enough because it’s designed to be accurate only once you add 1, which wipes out a lot of the least significant bits (at least 10). Since expm1 has outputs close to 0, it is much harder to approximate within 1 ULP.
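A small numerical illustration of the lost bits, assuming a hypothetical kernel value that is off by 1e-17: for x = 1e-3, the ulp of 1 + expm1(x) is 2^10 times the ulp of expm1(x), so the same absolute error is a fraction of an ulp after adding 1 but dozens of ulps for expm1 itself.

```julia
x = 1.0e-3
p = expm1(x)              # Base.expm1, standing in for an accurate kernel value
p_bad = p + 1.0e-17       # hypothetical kernel with an absolute error of ~1e-17

# After adding 1, the error is a fraction of an ulp of exp(x) ...
err_exp = Float64(abs(big(1 + p_bad) - exp(big(x))) / eps(1 + p))
# ... but measured against expm1(x) itself it is dozens of ulps
err_expm1 = Float64(abs(big(p_bad) - expm1(big(x))) / eps(p))
```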
