Smooth approximation to max(0, x)

I need a “smooth” approximation f to x \to \max(0, x), such that

  1. \lim_{x \to -\infty} f(x) = 0,
  2. \lim_{x \to \infty} (f(x) - x) = 0,
  3. f(x) > 0, f'(x) > 0, both continuous; ideally continuous higher derivatives
  4. not too expensive to calculate.

I am aware of \log(a + \exp(x)), which can be implemented with LogExpFunctions.log1pexp. Is there anything else?
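For reference, the usual overflow-safe evaluation of that softplus can be written directly (a Python sketch for illustration, since the thread's code is Julia; the branching is presumably similar to what `log1pexp` does internally):

```python
import math

def softplus(x: float) -> float:
    """Smooth approximation to max(0, x): log(1 + exp(x)).

    Branches so that exp never overflows for large x and full
    precision is kept for very negative x.
    """
    if x > 0:
        # log(1 + e^x) = x + log(1 + e^(-x)); exp(-x) cannot overflow here
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))
```

This satisfies the limits in the question: it tends to 0 as x → -∞ and to x as x → ∞, with all derivatives continuous.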

Would you accept a function that approximates only around 0 and is exactly equal to 0 before a constant -a and to x after a constant b? If so, we could provide a very efficient solution (in terms of computational time) with, e.g., a Bézier curve.

You mean a cubic polynomial matching the derivatives and the values at -a and b? I could try that, but I am looking for something smoother.
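That cubic is a standard Hermite blend; matching the value and slope of \max(0, x) at -a and b pins it down uniquely. A hedged Python sketch (function name and default endpoints are mine, not from the thread):

```python
def relu_cubic(x: float, a: float = 0.5, b: float = 0.5) -> float:
    """C^1 approximation to max(0, x): exactly 0 for x <= -a, exactly x
    for x >= b, and in between the cubic Hermite blend that matches
    value and first derivative at both endpoints."""
    if x <= -a:
        return 0.0
    if x >= b:
        return x
    h = a + b
    t = (x + a) / h  # t in (0, 1)
    # Hermite conditions: p(-a) = 0, p'(-a) = 0, p(b) = b, p'(b) = 1
    return (3 * t**2 - 2 * t**3) * b + (t**3 - t**2) * h
```

Only the first derivative matches at the knots, which is why this is less smooth than a softplus-style function.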


Have you seen “mish” and “squareplus” from
Rectifier (neural networks) - Wikipedia, or the Smooth Maximum Unit?

This paper considers a number of smoothed functions: When are smooth-ReLUs ReLU-like? | OpenReview


By pattern matching, I'd guess maybe anything that is invf(f(0) + f(x)), where f is monotonically increasing and has range (0, ∞)? First guess is usually wrong, though.

That’s a good pattern for sure. Similar to the “smooth absolute value” sqrt(x^2 + c)
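That smooth absolute value also yields a smooth max directly, via the identity \max(0, x) = (x + |x|)/2; with \sqrt{x^2 + c} in place of |x| this is exactly "squareplus". A Python sketch (names and the default c are mine):

```python
import math

def smooth_abs(x: float, c: float = 1e-2) -> float:
    """Smooth |x|: sqrt(x^2 + c), infinitely differentiable for c > 0."""
    return math.sqrt(x * x + c)

def smooth_relu(x: float, c: float = 1e-2) -> float:
    """max(0, x) = (x + |x|) / 2 with |x| smoothed -- i.e., squareplus."""
    return 0.5 * (x + smooth_abs(x, c))
```

Note the naive form suffers cancellation for very negative x, which a later reply addresses.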

I may be misunderstanding something, but that's precisely the kind of function I am looking for. Sure, if I have it, I can transform it in a lot of ways.

No, but thanks, this is very helpful.


Does \log(a + \exp(x)) have an unwanted property, or do you want to know other functions just in general?

How about:

t = (1/4)^(1/3)
t1, t2 = t^4 - t, t^4
julia> relu(x) = ifelse(x > 0, x, 0)
relu (generic function with 1 method)

julia> lineplot(-0.5,0.5,x-> t1 < x < t2 ? (x-t1)^4 : relu(x))
(truncated UnicodePlots output: the curve is 0 on the left, rises smoothly through the quartic transition, and joins the line y = x on the right)

It has a conditional, but it’s kind of simple to think about, and the 4th power is easy to compute by squaring twice. A conditional doesn’t need to mean branching code, i.e.

ifelse( t1 < x < t2,  (x-t1)^4, ifelse(x < 0, 0, x))
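The same construction outside Julia, for anyone following along in another language (a Python sketch; the knots come from requiring (x - t_1)^4 to meet 0 with slope 0 at t_1 and the line y = x with slope 1 at t_2):

```python
def relu_quartic(x: float) -> float:
    """C^1 smoothing of max(0, x) using a single quartic piece.

    With t = (1/4)^(1/3), t1 = t^4 - t, t2 = t^4, the piece (x - t1)^4
    has value 0 and slope 0 at x = t1, and value t2 and slope
    4*t^3 = 1 at x = t2, so both joins are C^1.
    """
    t = 0.25 ** (1.0 / 3.0)
    t1, t2 = t**4 - t, t**4
    if t1 < x < t2:
        return (x - t1) ** 4
    return max(0.0, x)
```

Since t1 ≈ -0.47 and t2 ≈ 0.16, the transition region is fixed; rescale x if you want a narrower one.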


I like squareplus, but the numerical evaluation at x < 0 involves a delicate cancellation. One option is to switch to

x_+ = \frac{b}{2\left(\sqrt{x^2 + b} - x\right)}

when x < 0 (you get that just by multiplying the numerator and denominator of \frac{x + \sqrt{x^2 + b}}{2} by the conjugate \sqrt{x^2 + b} - x). But generally I just use

x_+ = \begin{cases} \sqrt{x^2 + \sigma^2} - \sigma & x \ge 0; \\ 0 & \textrm{otherwise}. \end{cases}

There’s also a delicate cancellation here, but it’s not one that depends strongly on x, which to me is an advantage. Pick a \sigma for which \sqrt{\sigma^2} - \sigma = 0 in floating-point arithmetic, e.g., any power of 2. Also, use hypot(x, σ) if you care about precision.
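That choice in code (a Python sketch; `math.hypot` plays the role of Julia's `hypot`, and the default σ = 1/4 is my example of a power of 2):

```python
import math

def relu_shifted_sqrt(x: float, sigma: float = 0.25) -> float:
    """sqrt(x^2 + sigma^2) - sigma for x >= 0, else exactly 0.

    With sigma a power of 2, sqrt(sigma^2) - sigma == 0 in floating
    point, so the two pieces join exactly at x = 0; the right-hand
    derivative x / hypot(x, sigma) also vanishes there, giving C^1.
    """
    if x >= 0:
        return math.hypot(x, sigma) - sigma
    return 0.0
```

For large positive x this tends to x - σ rather than x, so σ also sets the asymptotic offset; shrink it if that matters.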


This isn’t too practical because each function evaluation requires a numerical integration, but it was a fun way to revisit some math I hadn’t looked at in decades. I smoothed your original function by convolving it with a scaled bump function. This is motivated by two facts:

  1. Convolving a distribution (your function) with a test (bump) function results in an infinitely differentiable function.
  2. The scaled bump functions approach the delta function in the limit as \epsilon \rightarrow 0 so that you can get an arbitrarily good approximation to your original distribution.

The code and resulting plots for \epsilon = 0.01 are shown below.

using QuadGK: quadgk
using SpecialFunctions: besselk
using Plots

f(x) = max(zero(x), x) # function to be smoothed

half = 0.5 # Can use higher precision type here if desired
const I1 = exp(-half) * (besselk(1, half) - besselk(0, half)) # Integral of unnormalized bump function
bump_unnormalized(x) = abs(x) > 1 ? zero(x) : exp(-1 / (1 - x^2))
bump(x, ϵ=one(x)) = bump_unnormalized(x/ϵ) / (ϵ * I1) # normalized, scaled bump function

# convolve scaled bump function with f
f_smooth(x,ϵ) = quadgk(t -> f(x-t) * bump(t, ϵ), -ϵ, ϵ)[1]

ϵ = 0.01

p = plot(xlabel="x", ylabel="f_smooth(x)", title="ϵ = $ϵ", legend=false, ratio=1)
plot!(p, range(-10ϵ, 10ϵ, 500), x -> f_smooth(x, ϵ), legend=false, ratio=1)

p = plot(xlabel="x", ylabel="f_smooth(x)", title="ϵ = $ϵ", legend=false, ratio=1)
plot!(p, range(-ϵ, ϵ, 500), x -> f_smooth(x, ϵ))

p = plot(xlabel="x", ylabel="|f(x) - f_smooth(x)|", title="ϵ = $ϵ", legend=false)
plot!(p, range(-10ϵ, 10ϵ, 500), x -> abs(f(x) - f_smooth(x, ϵ)))


On my machine, execution time for invoking f_smooth(x, ϵ) ranges from about 200 ns for x < ϵ to about 3.5 µs for x > ϵ.

Maybe this approach can be cheap enough (on average), since you only need to evaluate the smoothed function within the transition region:

f_smoothfaster(x, ϵ) = abs(x) < ϵ ? f_smooth(x, ϵ) : f(x)

While cool, you can also get an arbitrarily accurate, analytic approximation without needing to integrate, with f(x, ϵ) = log1p(exp(x/ϵ)) * ϵ.
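A sketch of that scaled softplus (Python for illustration; the overflow-safe branch for large x/ϵ is an addition of mine, not part of the one-liner above):

```python
import math

def softplus_scaled(x: float, eps: float = 0.1) -> float:
    """eps * log1p(exp(x / eps)): converges to max(0, x) as eps -> 0.

    For x >> eps the naive exp(x/eps) overflows, so use the identity
    eps * log1p(exp(z)) = x + eps * log1p(exp(-z)) for z = x/eps > 0.
    """
    z = x / eps
    if z > 0:
        return x + eps * math.log1p(math.exp(-z))
    return eps * math.log1p(math.exp(z))
```

The maximum deviation from \max(0, x) is ϵ \log 2, attained at x = 0, so ϵ directly controls the approximation error.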