BetterExp 0.1.0

Do you use exp? Do you want it to be faster and (slightly) more accurate? If so, use https://github.com/oscardssmith/BetterExp.jl. It adds versions of exp, exp2 and exp10 that are 2x faster than Base for Float64, and 3x faster for Float32.

Edit: Hopefully this will make its way into Base soon enough, but in the meantime this should be useful.

26 Likes

Do you have a reference for which algo this implements versus what Base uses? Or is it the same thing just tweaking the Julia implementation to be faster?

Most of the speed difference comes from these algorithms having only 1 branch in the normal case. (The branch handles answers with subnormal, 0, Inf or NaN outputs.) This both speeds things up and allows for automatic vectorization.
For Float32, the algorithm is pretty similar to base, except it uses a 7 term polynomial instead of a much shorter rational function.
The Float64 version is fairly different. It uses a 256-element table to reduce the range the polynomial kernel needs to approximate. This allows a much smaller polynomial to maintain full accuracy.
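A minimal sketch of that reduction (assumed scheme for illustration, not the package's actual code): write x = k*log(2)/256 + r with |r| ≤ log(2)/512, so exp(x) = 2^(k/256) * exp(r), where 2^(k/256) splits into a power of two and a table lookup. A plain Taylor polynomial stands in for the tuned kernel here:

```julia
# Table of 2^(j/256) for j = 0..255; the real implementation stores
# extra bits per entry, omitted here for simplicity.
const TBL = [2.0^(j / 256) for j in 0:255]

function exp_sketch(x::Float64)
    k = round(Int, x * (256 / log(2)))     # x ≈ k*log(2)/256 + r
    r = x - k * (log(2) / 256)             # |r| ≤ log(2)/512
    # On this tiny interval a short kernel suffices (Taylor for clarity):
    p = 1 + r + r^2/2 + r^3/6 + r^4/24
    # 2^(k/256) = 2^(k>>8) * TBL[(k mod 256) + 1]
    return TBL[(k & 255) + 1] * p * 2.0^(k >> 8)
end
```

The branch-free structure (table lookup plus a short polynomial, no piecewise cases) is what makes the real version amenable to vectorization.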

11 Likes

The Float64 version also has a bunch of minor tricks to get full precision with minimal use of double-double arithmetic. The two main tricks are storing extra bits in the table, and having the kernel approximate expm1 instead of exp.
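One way to see why an expm1 kernel helps (my illustration of the idea, not necessarily the package's exact combination step): with table entry t ≈ 2^(j/256) and kernel output p ≈ expm1(r), the product t*exp(r) = t*(1 + p) can be formed as fma(t, p, t), so the leading 1 is never rounded into the kernel's low-order bits.

```julia
t = 2.0^(113 / 256)        # example table entry
r = log(2) / 1024          # example reduced argument
p = expm1(r)               # stand-in for the polynomial kernel
naive = t * (1 + p)        # rounds 1 + p first, discarding low bits of p
fused = fma(t, p, t)       # t*p + t with a single rounding
```

The same reasoning is why the kernel only needs to be accurate "after adding 1": any error in the low bits of p is shifted far below the ULP of the final result.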

1 Like

Can you share how the coefficients for the expm1b_kernel have been chosen? For the ::Val{ℯ} case, I can recognize off-hand that those aren’t quite the Taylor series coefficients, so I’m guessing they’ve been numerically optimized in some way.

(I’ve been silently following the Julia PR, and I got sucked into trying to understand where all the constants were coming from. So far I have not been able to get a minimax optimization to reproduce those specific coefficients, so I’m curious if there’s some extra constraints I haven’t been able to reverse engineer yet.)

2 Likes

The coefficients are minimax polynomials obtained with Remez.jl. The tricky part is that since I want the first coefficient to be 0, I actually approximate x -> x == 0 ? one(x) : expm1(x)/x. Then, to make the minimization target the absolute error with respect to expm1, you need to weight the error function by a factor of x. Thus the full call to generate the coefficients (for ::Val{ℯ}) ends up being

ratfn_minimax(x -> x == 0 ? one(x) : expm1(x)/x, (-log(2)/512, log(2)/512), 3, 0, (x, y) -> x)
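The effect of that weight can be checked directly without Remez.jl: for any candidate polynomial p approximating expm1(x)/x, the x-weighted error x*(p(x) - expm1(x)/x) equals the absolute error of x*p(x) against expm1(x), so minimaxing one minimaxes the other. The degree-2 polynomial below is an arbitrary candidate, not the actual kernel:

```julia
# Weighting identity: x * (p(x) - expm1(x)/x) == x*p(x) - expm1(x)
g(x) = x == 0 ? one(x) : expm1(x) / x
p(x) = 1 + x / 2 + x^2 / 6          # arbitrary candidate approximation to g
x = log(2) / 1024                   # a point inside the fit interval
lhs = x * (p(x) - g(x))             # weighted error in the g-approximation
rhs = x * p(x) - expm1(x)           # absolute error in the expm1-approximation
```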
4 Likes

One thing worth looking into (especially for Float32) is if rounding these coefficients means that the error they generate increases significantly. If so, it might be worth doing an exhaustive search of nearby coefficients to see if any combination gives better results.
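A sketch of what such a search might look like (my assumed approach, and a single greedy pass over one coefficient at a time rather than a true exhaustive search over all combinations): nudge each rounded coefficient by a few ULPs and keep any change that lowers the sampled maximum error.

```julia
# Sampled max absolute error of the polynomial `c` against expm1.
maxerr(c, xs) = maximum(abs(evalpoly(x, c) - expm1(x)) for x in xs)

function search_nearby(c::Vector{Float64}, xs; ulps = 2)
    best, besterr = copy(c), maxerr(c, xs)
    for i in eachindex(c), d in -ulps:ulps
        d == 0 && continue
        trial = copy(best)
        trial[i] = nextfloat(trial[i], d)   # move coefficient i by d ULPs
        e = maxerr(trial, xs)
        if e < besterr
            best, besterr = trial, e
        end
    end
    return best, besterr
end
```

A real search would use a denser (or exhaustive) sample and try coefficient combinations jointly, which is where the cost blows up.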

I couldn’t find the PR that adds this to Base; can you please paste it here for reference?

1 Like
3 Likes

OK, thank you for the clarification. I see now that given your weight function, that’s a minimax of the absolute error — I’d been playing around with minimaxing the relative error (so my weight function was x -> iszero(x) ? one(x) : x / expm1(x)). I also coded up my own minimax algorithm (as a way to teach myself the principles), so I’m guessing part of the remaining residual difference is just some implementation details on my part.

Do you have a guarantee on the accuracy of the new versions? Do they give exactly the same results as the current implementation?

2 Likes

They don’t give exactly the same results, but the Float64 versions have a maximum error of 0.76 ULP compared to 0.88 ULP for Base. For Float32, my versions have a max error of 1 ULP compared to 0.9 ULP for Base.
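One common way to measure error in ULPs (an assumption about the methodology here, shown for illustration): compare the Float64 result against a high-precision BigFloat reference and scale the difference by the ULP size at the computed result.

```julia
# Error of f at x, in ULPs of the computed result, against a BigFloat
# reference evaluation of the same function.
ulp_error(f, x::Float64) = abs(Float64(BigFloat(f(x)) - f(big(x)))) / eps(f(x))

# A worst-case search would then sample x densely, e.g.
# maximum(ulp_error(exp, x) for x in range(-700.0, 700.0, length = 10^6));
# for Float32 an exhaustive scan of all inputs is feasible.
```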

6 Likes

Thanks. Could you please give a couple of examples of the worst cases?
How did you find them and what did you compare with?

Is it exactly 1 ulp for Float32? This is important to know for directed rounding purposes.

2 Likes

It was 1.008 ULP for exp2, but I’m pushing a new commit that brings it down to a max of 0.87 ULP. I was somehow missing a degree in the polynomial for exp2.

4 Likes

Does interval arithmetic provide an easy way to get the maximum error of a polynomial (in ULPs)? If so, how would I do it? I’m thinking of trying an exhaustive search for better polynomials than the rounded minimax polynomials, but exhaustive checking is too slow.

Good question. I don’t think it’s too easy. People use tools like Sollya and Gappa for this, I believe.

Can we directly get expm1?
I mean exp(x)-1

Sollya is the tool that some people use to get there.
The “onramp” is steep. Here are some resources:

Sollya Users Manual
Sollya Commands

  • look for examples of Sollya code

Rigorous Polynomial Approximations and Applications
Rigorous Polynomial Approximations and Applications (slides)

  • Sollya has much of what follows built-in;
    see the Sollya commands fpminimax and implementpoly

Optimizing Polynomials for floating point implementation
Exchange algorithm for … error-optimized polynomials

  • also of interest

Accurate Horner Form Approximations to Special Functions

4 Likes

Do you mean a faster method for expm1(x)? I don’t have one done yet, but I plan to. The reason the expm1b kernel isn’t good enough is that it’s designed to be accurate only after you add 1, which wipes out a lot of the least significant bits (at least 10). Since expm1 has outputs close to 0, it is much harder to approximate to within 1 ULP.
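A quick illustration of that cancellation problem (a generic textbook example, not code from the package): computing expm1 as exp(x) - 1 for small x cancels the leading bits, so low-bit error that was harmless in exp becomes large relative error near 0.

```julia
x = 1.0e-10
naive = exp(x) - 1       # catastrophic cancellation: exp(x) ≈ 1 near 0
good  = expm1(x)         # computed directly, nearly correctly rounded
rel = abs(naive - good) / good   # relative error of the naive version
```

This is exactly why a dedicated expm1 kernel needs more accuracy in its low bits than the expm1b kernel used inside exp.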

1 Like