How to smooth the if function

Is there currently a library that provides a smoothing function to ensure the differentiability of the judgment function?

For example, I have a simple formula here:
f(x) = x > 1 ? x : x^2

Is there a smooth function here so that the following formula can obtain approximate results?

f_smooth(x) = smooth_func(x-1)*x+smooth_func(1-x)*x^2

Currently I have built a smooth_func, see below:

smooth_func(x) = (tanh(5.0 * x) + 1.0) * 0.5

There are lots of ways to smooth the ifelse function, so the first question is why do you need it? And why is the one you found not good enough?


This may help:


Thank you for your response.

Firstly, this “smooth function” is something I initially encountered in a paper. It seems to be used to replace the “ifelse” function to ensure the differentiability of the expression. I attempted to calculate the derivatives using both the “smooth” and “ifelse” approaches separately with “ForwardDiff”:

using ForwardDiff

f1(x) = x > 1 ? x^3 : x^2
smooth_func(x) = (tanh(5.0 * x) + 1.0) * 0.5
f2(x) = smooth_func(x-1) * x + smooth_func(1-x) * x^2

ForwardDiff.derivative(f1, 0.1)
ForwardDiff.derivative(f2, 0.1)

I found that functions based on “ifelse” can also be differentiated using “ForwardDiff”. Being a beginner in this area, I started to question whether the “smooth function” is necessary when dealing with situations involving “ifelse”.

Moving on to the second question, the “smooth function” doesn’t produce the expected results when the input approaches zero. For example:

julia> smooth_func(1e-6)

julia> smooth_func(-1e-6)

julia> smooth_func(1e-2)

julia> smooth_func(1e-3)

julia> smooth_func(1e-1)

This greatly affects my subsequent calculations and the results of differentiation. After multiple attempts, I found that adding the constant 5.0 as the first term can effectively avoid this issue. It appears that there is a significant correlation between the input value and the “smooth function”. When abs(5.0*x) >> 1, the result matches expectations.

1 Like

Thank you, this seems to be what I’m looking for. I’ll go take a look.

That’s why I’m asking about your use case. In most cases, you don’t care about the behavior near the non-differentiable point (in this case a discontinuity). ForwardDiff makes an arbitrary choice, other autodiff backends may make other choices, but without knowing what you do it’s hard to decide if these choices matter at all.

What is your “expected result” near zero? It seems to approach 0.5, which is a reasonable behavior.

Can you provide a complete example?


I have written a demo (but it does not fully reproduce the issue I encountered before), see below:

using OrdinaryDiffEq, ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D

@parameters σ ρ β
@variables x(t)

step_func(v) = (tanh(β * v) + 1.0) * 0.5

eqs = [
    D(x) ~ step_func(t - ρ) * σ

@mtkbuild sys = ODESystem(eqs, t)

u0 = [x => 1.0]

p = [
    σ => 1.0,
    ρ => 5.0,
    β => 1.0]

tspan = (4.5, 5.5)
prob = ODEProblem(sys, u0, tspan, p, jac=true)

for i in [0.1, 1.0, 5.0, 10.0, 50.0]
    new_prob = remake(prob, p=[β => i])
    new_sol = solve(new_prob, Tsit5(), saveat=0.1)

I keep adjusting the parameters of the smooth function, and the calculation results also show deviations.
Using the smooth function in my project may lead to even worse situations, such as the issue of ordinary differential equations having no solutions, and the calculation results include unexpected negative numbers.

What is your “expected result” near zero? It seems to approach 0.5, which is a reasonable behavior.

I hope to nearly perfectly simulate the function v > 0 ? 1 : 0, but when it approaches 0, the result of the smooth function has a slight discrepancy, for example,

(tanh(5.0 * 0.1) + 1.0) * 0.5 = 0.731058
(tanh(5.0 * -0.1) + 1.0) * 0.5 = 0.268941

Just increase the “gain”

(tanh(50.0 * 0.1) + 1.0) * 0.5 = 0.99995460213

(tanh(50.0 * -0.1) + 1.0) * 0.5=0.00004539786

But It have side efrects since the derivative at 0 will increase and It could be a problem, it deppends on your exact application…

1 Like

Yes, improving gain can indeed prevent this issue.

The standard solution to this is to “compactify” the transition region of a sigmoid. The classic function you see in differential geometry is

\verb|smooth_func|(x) = \begin{cases} 0 & x \leq -1, \\ \frac{1+\tanh[x/(1-x^2)]}{2} & -1 < x < 1, \\ 1 & x \geq 1. \end{cases}

This turns out to be a smooth (infinitely differentiable, C^\infty) function. Here’s an implementation I’ve used that I think does a pretty good job of dealing with the various roundoff/overflow issues that can arise with this function. I’ve written it for vector input, but you could convert to accept a single point if you want:

function smooth_func(x::AbstractArray, x₀, x₁, y₀=0, y₁=1)
    @assert x₀ ≤ x₁
    y = similar(x)
    x₀, x₁, y₀, y₁, m² = convert.(eltype(y), (x₀, x₁, y₀, y₁, 3))
    m = √m²
    Δy = (y₁ - y₀) / 2
    Δx = (x₁ - x₀) / 2
    ȳ = (y₀ + y₁) / 2
    x̄ = (x₀ + x₁) / 2
    for i ∈ eachindex(x, y)
        x̂ = (x[i] - x̄) / Δx
        # Note: These || cases are nearly redundant,
        # but protect against very rare roundoff issues
        if x̂ ≤ -1 || x[i] ≤ x₀
            y[i] = y₀
        elseif x̂ ≥ 1 || x[i] ≥ x₁
            y[i] = y₁
            y[i] = ȳ + Δy * tanh(m * x̂ / (1 - x̂^2))
    return y

You have to tell it the x_0 value where you want it to start transitioning and the x_1 value where you want it to stop transitioning. Optionally, you can also adjust the y values before and after the transition. Then, you can use it in the standard way:

x0 = -1
x1 = 1
τ = smooth_func(x, x0, x1)
f_smooth = (1-τ) * x + τ * x^2

For example, it can cause optimization algorithms to converge very slowly (e.g. because the second derivative matrix may be badly conditioned in a multi-variate problem). Or an ODE solver may converge very slowly (require tiny time steps).

Really, people are just stabbing in the dark here without knowing the application. There is no generically “correct” or “good” way to make a non-differentiable function differentiable, because the question itself is underspecified.


In an ODE context, it’s much much better to just put in the correct discontinuity analytically, e.g. via callbacks. There is no conceptual problem with having discontinuities or (distributional) derivatives thereof on the rhs of an ODE, and implementing this isn’t too hard either.