I have a family of functions y(x) defined implicitly by
x = \log\left(\sum_{i=1}^N \exp(a_i + y b_i) \right)
where 0 < b_1 < \dots < b_N can be assumed.
It is easy to find the solution numerically (e.g. with Newton's method), but I need to evaluate it a gazillion times for a given set of a_i, b_i, so I want to speed up the calculation. Currently it is taking up 70% of my runtime.
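For reference, the evaluation is currently a scalar Newton iteration along these lines (a minimal sketch with illustrative names and tolerances; it skips the usual logsumexp overflow guard):

```julia
# Solve x = log(sum(exp.(a .+ y .* b))) for y with Newton's method.
# The residual is strictly increasing in y because all b_i > 0.
function solve_y(x, a, b; y0 = 0.0, tol = 1e-12, maxiter = 50)
    y = y0
    for _ in 1:maxiter
        w = exp.(a .+ y .* b)       # no overflow guard, illustrative only
        s = sum(w)
        r = log(s) - x              # residual
        abs(r) < tol && break
        y -= r / (sum(b .* w) / s)  # Newton step; dr/dy = Σ bᵢwᵢ / Σ wᵢ > 0
    end
    return y
end
```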
It can be shown (I will spare you the algebra) that y(x) is increasing and strictly concave. It also converges to asymptotes on both ends, which can be characterized using a_1, b_1, a_N, b_N in a simple way. Here is a mockup illustration:
The idea is that for a particular set of a's and b's, I would solve for y(x) on some grid, approximate that, and from then on look up values from a quick approximation. The approximation should be AD friendly but that should not be difficult.
If I had \lim_{x \to \pm\infty} y(x) = 0 or some constant I would know how to approximate it with Chebyshev polynomials (appropriately transformed to \mathbb{R}). But the asymptotes are giving me a hard time.
It would be great to have a nice (= cheap, smooth, …) function g(x) that fits the asymptotes, then I would approximate y(x) - g(x).
I am leaning toward fitting the asymptotes with a modified squareplus. Specifically, consider
f(x; A, b, C) = \frac{C}{2}\left(x + A/C + \sqrt{(x + A/C)^2+b^2}\right)
where C \ne 0. It is easy to show that \lim_{x\to-\infty} f(x) = 0 (just like squareplus) and \lim_{x\to\infty} f(x) - Cx - A = 0 (generalizing the slope and intercept).
Then, hoping that I did the algebra right,
-f(-x; A_1, b, C_1) + f(x; A_2, b, C_2)
will have slope C_1 and intercept -A_1 at -\infty, and slope C_2 and intercept A_2 at +\infty. I just need to think my way through the numerics to see if there are any problems, but I don't think I can find anything cheaper than this.
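In code this would be something along these lines (a sketch with my own names; I have not checked the numerics):

```julia
# Modified squareplus: → 0 as x → -∞, → C*x + A as x → +∞.
squareplus_mod(x, A, b, C) = (C / 2) * (x + A / C + sqrt((x + A / C)^2 + b^2))

# Two-sided combination: slope C1 (intercept -A1) at -∞, slope C2 and intercept A2 at +∞.
asymptote_fit(x, A1, C1, A2, C2, b) =
    -squareplus_mod(-x, A1, b, C1) + squareplus_mod(x, A2, b, C2)
```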
(I also considered rotating a hyperbola to have two oblique asymptotes, but I kind of gave up on the algebra of doing that.)
For a smooth function, you can get exponential convergence on a finite interval by evaluating on a Chebyshev grid and then interpolating with a polynomial, à la ApproxFun.jl or FastChebInterp.jl (as in the above post).
But I guess you have already thought about Chebyshev polynomials?
Just subtract off the asymptotes and approximate what's left? Oh, I see that you already thought of this:
FastChebInterp.jl should be AD-friendly (it works with ForwardDiff, and it also hooks into ChainRules.jl to expose its own optimized derivatives to AD).
My problem is that I do not know the region I will be evaluating at in advance so I need the whole real line. Does ApproxFun.jl / FastChebInterp.jl allow infinite domains? I am only familiar with the latter and AFAIK it works on hypercubes.
(If I had a finite interval the problem would be trivial.)
If your function goes exponentially quickly to zero (once you have subtracted off the asymptotics), as I think is the case here, you can perform a change of variables to map it to a finite interval and it will still converge quickly.
For example, if you let x = t / (1-t^2), mapping t \in (-1,1) to x \in (-\infty, +\infty), then for something that decays exponentially fast you are mapping it to a kind of bump function with essential singularities at the endpoints, and Chebyshev approximation (similar to Fourier series/transforms) should still converge faster than polynomially.
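Both the map and its inverse are cheap (a small sketch; the inverse solves x t^2 + t - x = 0 for the root in (-1,1), written to avoid cancellation):

```julia
x_of_t(t) = t / (1 - t^2)                # (-1, 1) → (-∞, ∞)
t_of_x(x) = 2x / (1 + sqrt(1 + 4x^2))    # root of x*t^2 + t - x = 0 in (-1, 1)
```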
Note that your suggested asymptotic approximation is not good enough here, because it only approaches the asymptotic lines as O(b^2/x), not exponentially.
A simple exponential approximation to the asymptotes is:
```julia
function y_approx(x, a, b)
    f = (tanh(x) + 1) / 2           # smooth switch: → 0 as x → -∞, → 1 as x → +∞
    y1 = (x - a[begin]) / b[begin]  # asymptote at -∞
    y2 = (x - a[end]) / b[end]      # asymptote at +∞
    return y1 * (1 - f) + y2 * f
end
```
which converges exponentially fast to the exact y(x) for large |x|. However, you should be able to do even better (get a steeper exponential rate) by examining your x(y) function more closely and pulling out the next-order term.
With this approximation, I checked (on your function with random a and b) that x = t/(1-t^2) does indeed give a Chebyshev polynomial approximation on t = (-1,1) that converges faster than polynomially (the -\log |\text{error}| is proportional to \sqrt{\text{degree}}), and gives you 3 digits of accuracy with about degree 25. If you improve y_approx you should be able to do even better.
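Roughly, the construction looks like this (a sketch, assuming FastChebInterp's `chebpoints`/`chebinterp`; `y_exact` stands for any accurate solver for y(x), e.g. a Newton iteration):

```julia
using FastChebInterp

# Fit a Chebyshev polynomial to the residual y(x) - y_approx(x, a, b) in the
# variable t, where x = t/(1 - t^2) maps t ∈ (-1, 1) onto the real line.
function build_approximation(y_exact, a, b; order = 50)
    function resid(t)
        abs(t) >= 1 && return 0.0        # residual decays to zero at the endpoints
        x = t / (1 - t^2)
        return y_exact(x) - y_approx(x, a, b)
    end
    ts = chebpoints(order, -1.0, 1.0)
    c = chebinterp(resid.(ts), -1.0, 1.0)
    return function (x)
        t = 2x / (1 + sqrt(1 + 4x^2))    # inverse of x = t/(1 - t^2)
        return y_approx(x, a, b) + c(t)
    end
end
```

Increasing `order` trades a slightly more expensive setup for more digits of accuracy.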
Thank you very much for the concrete suggestion. I found that the suggested y_approx is not (generally) concave, so I looked for another form, and found that
g(x) = \frac{(p - n) \log\left( \exp(kx) + 1\right) + knx}{k}
works to approximate asymptotes
\min(px, nx)
with p < n, crossing at 0. This has
g'(x) > 0,
g''(x) < 0,
g'''(0) = 0,
\lim_{x \to \infty} g(x) - px = 0,
\lim_{x \to -\infty} g(x) - nx = 0,
g'(0) can be parametrized to match the derivative of the original function at some point.
This I can shift along the x and y axes to match the original function, and pick k so that I match the derivatives at the crossing point. This provides a good approximation for my purposes with 10-12 Chebyshev polynomials.
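For reference, a sketch of that construction (names are mine; p and n are the asymptote slopes, and k is left as a free parameter, chosen as described above):

```julia
# Numerically stable softplus: log(1 + exp(z)).
softplus(z) = z > 0 ? z + log1p(exp(-z)) : log1p(exp(z))

# g from above: asymptotes n*x as x → -∞ and p*x as x → +∞ (p < n), crossing at 0.
soft_asymptotes(x, p, n, k) = ((p - n) * softplus(k * x) + k * n * x) / k

# Shifted to match the asymptotes (x - a₁)/b₁ and (x - a_N)/b_N of the original y(x).
function g_shifted(x, a, b, k)
    p, n = 1 / b[end], 1 / b[begin]     # slopes at +∞ and -∞
    x0 = (a[begin] * b[end] - a[end] * b[begin]) / (b[end] - b[begin])  # crossing point
    y0 = (x0 - a[begin]) / b[begin]     # both asymptote lines pass through (x0, y0)
    return y0 + soft_asymptotes(x - x0, p, n, k)
end
```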
So the costly Newton solve, which was eating up 70% of my runtime, is eliminated; the replacement uses <1% of the reduced runtime. It took me about half a day to code, with a lot of unit tests. Julia is amazing for a smooth transition from prototype to heavily optimized code.