Smooth approximation to max(0, x)

Tamas_Papp · January 29, 2024, 10:21am

I need a “smooth” approximation f to x \mapsto \max(0, x), such that

\lim_{x \to -\infty} f(x) = 0,
\lim_{x \to \infty} f(x) = x,
f(x) > 0, f'(x) > 0, both continuous; ideally continuous higher derivatives
not too expensive to calculate.

I am aware of \log(1 + \exp(x)), which can be implemented with LogExpFunctions.log1pexp. Is there anything else?
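For reference, here is a minimal sketch of the softplus function mentioned above, written with Base Julia only. It relies on the identity log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)), which avoids overflow for large x. The names softplus and softplus_deriv are chosen here for illustration; this is not the LogExpFunctions implementation.

```julia
# Overflow-safe softplus, log(1 + exp(x)), using only Base.
# Identity used: log(1 + exp(x)) = max(x, 0) + log1p(exp(-abs(x))).
softplus(x::Real) = max(x, zero(x)) + log1p(exp(-abs(x)))

# Its derivative is the logistic function, which lies strictly between 0 and 1
# in exact arithmetic.
softplus_deriv(x::Real) = 1 / (1 + exp(-x))
```

In floating point the value underflows to exactly 0 for very negative x, so the strict inequality f(x) > 0 only holds in exact arithmetic.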

lrnv · January 29, 2024, 10:29am

Would you accept a function that approximates only around 0, and is exactly equal to 0 before a constant -a and to x after a constant b? If so, we could provide a very efficient solution (in terms of computational time) with, e.g., a Bézier curve.

Tamas_Papp · January 29, 2024, 10:35am

You mean a cubic polynomial matching the derivatives and the values at -a and b? I could try that, but I am looking for something smoother.
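One way to read the cubic described here is as a cubic Hermite interpolant on [-a, b] with value 0 and slope 0 at -a, and value b and slope 1 at b. The sketch below is only an illustration under that reading; the name cubic_relu is hypothetical.

```julia
# Cubic Hermite blend between 0 (for x <= -a) and x (for x >= b).
# Matches value and first derivative at both ends: p(-a) = 0, p'(-a) = 0,
# p(b) = b, p'(b) = 1.
function cubic_relu(x::Real, a::Real, b::Real)
    x <= -a && return zero(float(x))
    x >= b && return float(x)
    h = a + b            # width of the transition region
    s = (x + a) / h      # rescaled coordinate in (0, 1)
    # Hermite basis: b * h01(s) + h * 1 * h11(s)
    return b * (3s^2 - 2s^3) + h * (s^3 - s^2)
end
```

The second derivative generally jumps at the two joins, and the result is exactly 0 for x <= -a, so this is only C¹. In this sketch the cubic is monotone only when a <= 2b; for a = b it reduces to the quadratic (x + a)^2 / (4a).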

jar1 · January 29, 2024, 10:36am

Have you seen “mish” and “squareplus” from the Wikipedia article “Rectifier (neural networks)”, or the Smooth Maximum Unit?

The paper “When are smooth-ReLUs ReLU-like?” (OpenReview) considers a number of smoothed functions.

cdawg · January 29, 2024, 10:45am

By pattern matching, I’d guess maybe anything of the form invf(f(0) + f(x)), where f is monotonically increasing and has range (0, inf)? First guess is usually wrong, though.
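As a quick check of this pattern, taking f = exp and invf = log gives back log(1 + exp(x)). The helper name smooth_relu_from below is hypothetical, and the sketch says nothing about whether other choices of f meet the requirement that f(x) approaches x for large x.

```julia
# Build x -> invf(f(0) + f(x)) from an increasing f with range (0, Inf)
# and its inverse invf.
smooth_relu_from(f, invf) = x -> invf(f(zero(x)) + f(x))

# With exp/log this is log(exp(0) + exp(x)) = log(1 + exp(x)).
g = smooth_relu_from(exp, log)
```

This naive form overflows for large x (exp(x) becomes Inf), so it is meant for exploring candidates rather than for production use.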

SortofDamocles · January 29, 2024, 11:33am

That’s a good pattern for sure. Similar to the “smooth absolute value” sqrt(x^2 + c)
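The connection to max(0, x) can be made explicit: since max(0, x) = (x + |x|)/2, replacing |x| with the smooth absolute value gives (x + sqrt(x^2 + c))/2, which is the squareplus form that comes up later in the thread. A minimal sketch, with hypothetical function names:

```julia
# Smooth absolute value: approaches |x| as c -> 0.
smooth_abs(x, c) = sqrt(x^2 + c)

# max(0, x) = (x + |x|) / 2, with |x| replaced by its smooth version.
smooth_relu(x, c) = (x + smooth_abs(x, c)) / 2
```

For example, smooth_relu(0, c) equals sqrt(c)/2, so c controls how far the curve sits above the kink at the origin.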

Tamas_Papp · January 29, 2024, 11:57am

I may be misunderstanding something, but that’s precisely the kind of function I am looking for. Sure, if I have it, I can transform it in a lot of ways.

No, but thanks, this is very helpful.

barucden · January 29, 2024, 12:41pm

Does \log(1 + \exp(x)) have an unwanted property, or do you want to know other functions just in general?

Dan · January 29, 2024, 12:42pm

How about:

using UnicodePlots  # provides lineplot

t = (1/4)^(1/3)
t1, t2 = t^4 - t, t^4

julia> relu(x) = ifelse(x > 0, x, 0)
relu (generic function with 1 method)

julia> lineplot(-0.5,0.5,x-> t1 < x < t2 ? (x-t1)^4 : relu(x))
            ┌────────────────────────────────────────┐       
        0.5 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡞│ #57(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠎⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡰⠃⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡖⠁⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠎⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡜⠁⠀⠀⠀⠀⠀⠀⠀│       
   f(x)     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⢠⠏⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⡤⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⢀⡔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⢀⡔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⢀⡔⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⡗⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
          0 │⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⡤⠤⠔⠒⠋⠁⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│       
            └────────────────────────────────────────┘       
            ⠀-0.5⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀0.5⠀       
            ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀

It has a conditional, but it’s fairly simple to think about, and the 4th power is easy to compute by squaring twice. A conditional doesn’t need to mean branching code, i.e.

ifelse( t1 < x < t2,  (x-t1)^4, ifelse(x < 0, 0, x))
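To see why these constants were chosen, the sketch below checks that the quartic piece meets the line y = x with matching value and slope at t2 (and meets 0 with zero slope at t1). The constants are renamed to T, T1, T2 and the function names are hypothetical; the formulas are the ones from the post.

```julia
const T  = (1/4)^(1/3)   # chosen so that 4T^3 == 1
const T1 = T^4 - T       # left end of the quartic piece (negative)
const T2 = T^4           # right end of the quartic piece (positive)

# Branch-free piecewise definition from the post.
quartic_relu(x) = ifelse(T1 < x < T2, (x - T1)^4, ifelse(x < 0, zero(x), x))

# Piecewise derivative: 4(x - T1)^3 inside the transition, 0 or 1 outside.
quartic_relu_deriv(x) =
    ifelse(T1 < x < T2, 4 * (x - T1)^3, ifelse(x < 0, zero(x), one(x)))
```

At T2 the quartic has value T^4 = T2 and slope 4T^3 = 1, so value and first derivative are continuous there. The second derivative is nonzero on the quartic side at T2, so the function is only C¹ at that join, and it is exactly 0 for x <= T1.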

Tamas_Papp · January 29, 2024, 12:47pm

yes.

tim.holy · January 29, 2024, 1:00pm

I like squareplus, but the numerical evaluation at x < 0 involves a delicate cancelation. There are two options. One is to switch to

\frac{b}{2\left(\sqrt{x^2+b}-x\right)}

when x < 0 (you get that just by multiplying numerator and denominator by x - \sqrt{x^2 + b}). But generally I just use

x_+ = \begin{cases} \sqrt{x^2 + \sigma^2} - \sigma & x \ge 0; \\ 0 & \textrm{otherwise}. \end{cases}

There’s also a delicate cancelation here, but it’s not one that depends strongly on x, which to me is an advantage. Pick a \sigma for which \sqrt{\sigma^2} - \sigma = 0 in floating-point arithmetic, e.g., any power of 2. Also, use hypot(x, σ) if you care about precision.
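Both options can be sketched as follows. The names squareplus_stable and xplus are hypothetical, and squareplus is taken to be (x + sqrt(x^2 + b))/2 as elsewhere in the thread.

```julia
# Squareplus that switches to the rationalized form for x < 0, where the
# direct form x + sqrt(x^2 + b) would subtract two nearly equal numbers.
function squareplus_stable(x::Real, b::Real)
    r = sqrt(x^2 + b)
    return x < 0 ? b / (2 * (r - x)) : (x + r) / 2
end

# The one-sided variant: exactly 0 for x < 0, hypot(x, σ) - σ for x >= 0.
# hypot is used for precision, as suggested in the post.
xplus(x::Real, σ::Real) = x >= 0 ? hypot(x, σ) - σ : zero(float(x))
```

Note that xplus is exactly 0 on the negative axis and behaves like x - σ for large x, so it trades the strict positivity and the exact asymptote of the original requirements for a cheap evaluation.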

PeterSimon · January 30, 2024, 2:10am

This isn’t too practical because each function evaluation requires a numeric integration, but it was a fun way to revisit some math I hadn’t looked at in decades. I smoothed your original function by convolving it with a scaled bump function. This is motivated by 2 facts:

Convolving a distribution (your function) with a test (bump) function results in an infinitely differentiable function.
The scaled bump functions approach the delta function in the limit as \epsilon \rightarrow 0 so that you can get an arbitrarily good approximation to your original distribution.

The code and resulting plots for \epsilon = 0.01 are shown below.

using QuadGK: quadgk
using SpecialFunctions: besselk
using Plots

f(x) = max(zero(x), x) # function to be smoothed

half = 0.5 # Can use higher precision type here if desired
const I1 = exp(-half) * (besselk(1, half) - besselk(0, half)) # Integral of unnormalized bump function
bump_unnormalized(x) = abs(x) > 1 ? zero(x) : exp(-1 / (1 - x^2))
bump(x, ϵ=one(x)) = bump_unnormalized(x/ϵ) / (ϵ * I1) # normalized, scaled bump function

# convolve scaled bump function with f
f_smooth(x,ϵ) = quadgk(t -> f(x-t) * bump(t, ϵ), -ϵ, ϵ)[1]

ϵ = 0.01

p = plot(xlabel="x", ylabel="f_smooth(x)", title="ϵ = $ϵ", legend=false, ratio=1)
plot!(p, range(-10ϵ, 10ϵ, 500), x -> f_smooth(x, ϵ), legend=false, ratio=1)
display(p)

p = plot(xlabel="x", ylabel="f_smooth(x)", title="ϵ = $ϵ", legend=false, ratio=1)
plot!(p, range(-ϵ, ϵ, 500), x -> f_smooth(x, ϵ))
display(p)

p = plot(xlabel="x", ylabel="|f(x) - f_smooth(x)|", title="ϵ = $ϵ", legend=false)
plot!(p, range(-10ϵ, 10ϵ, 500), x -> abs(f(x) - f_smooth(x, ϵ)))

[Figure: absolute error |f(x) - f_smooth(x)| for ϵ = 0.01]

On my machine, the execution time of f_smooth(x, ϵ) ranges from about 200 ns for x < ϵ to about 3.5 μs for x > ϵ.

Edit:
Maybe this approach can be cheap enough (on average), since you only need to evaluate the smoothed function within the transition region:

f_smoothfaster(x, ϵ) = abs(x) < ϵ ? f_smooth(x, ϵ) : f(x)

Oscar_Smith · January 30, 2024, 5:13am

While cool, you can also get an arbitrarily accurate, analytic approximation without needing to integrate, using f(x,ϵ) = log1p(exp(x/ϵ))*ϵ
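One caveat with the expression as written is that exp(x/ϵ) overflows once x/ϵ exceeds roughly 709 in Float64, which happens quickly for small ϵ. A hedged sketch of an overflow-safe variant, using the same identity as log1pexp-style implementations (the names scaled_softplus and naive_scaled_softplus are hypothetical):

```julia
# Direct transcription of the formula; overflows to Inf for large x/ϵ.
naive_scaled_softplus(x, ϵ) = log1p(exp(x / ϵ)) * ϵ

# Overflow-safe version: ϵ * log(1 + exp(u)) with u = x/ϵ, written as
# ϵ * (max(u, 0) + log1p(exp(-abs(u)))).
function scaled_softplus(x::Real, ϵ::Real)
    u = x / ϵ
    return ϵ * (max(u, zero(u)) + log1p(exp(-abs(u))))
end
```

The gap between this function and max(0, x) is largest at x = 0, where it equals ϵ * log(2), so ϵ directly controls the approximation error.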

nsajko · March 31, 2024, 1:43am

Another alternative: we have \sqrt{x^2 + b} + x = \sqrt{b} \cdot e^{\operatorname{asinh}(x/\sqrt{b})}, so there are now three alternatives for squareplus:

squareplus_wikipedia(x, b) = (x + sqrt(x*x + b))/2

squareplus_timholy(x, b) = b/(2 * (sqrt(x*x + b) - x))

function squareplus_exp_asinh(x, b)
  sb = sqrt(b)
  exp(asinh(x/sb))*sb/2
end
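A rough way to compare the three forms is to measure their relative error against a BigFloat evaluation. The sketch below repeats the three definitions so that it runs on its own; reference and relerr are hypothetical helpers, and the choice of test points is arbitrary.

```julia
squareplus_wikipedia(x, b) = (x + sqrt(x*x + b)) / 2
squareplus_timholy(x, b) = b / (2 * (sqrt(x*x + b) - x))
squareplus_exp_asinh(x, b) = (sb = sqrt(b); exp(asinh(x / sb)) * sb / 2)

# Reference value computed in BigFloat, then rounded to Float64.
reference(x, b) = Float64(squareplus_wikipedia(big(x), big(b)))

# Relative error of a Float64 implementation f at (x, b).
relerr(f, x, b) = abs(f(x, b) - reference(x, b)) / reference(x, b)

for x in (-1e8, -1.0, 1.0, 1e8)
    errs = (relerr(squareplus_wikipedia, x, 1.0),
            relerr(squareplus_timholy, x, 1.0),
            relerr(squareplus_exp_asinh, x, 1.0))
    println("x = ", x, "  relative errors: ", errs)
end
```

At x = -1e8 with b = 1 the direct form loses all accuracy (it returns exactly 0), and the rationalized form has the mirror-image problem at x = 1e8, where its denominator cancels to 0. That is consistent with the suggestion above to switch to the rationalized form only when x < 0.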
