This is, strictly speaking, a math question rather than a Julia one, but I am running into it with AD and optimization.

I am solving a parametric system

f(x, \theta) = g(x, \theta)

for x given \theta. I could implement this as

```julia
function residual(x, θ)
    F = f(x, θ)
    G = g(x, θ)
    F .- G
end
```

but near the optimum it may make more sense to use a relative criterion like @. (F - G)/F or @. (F - G)/G. However, this may fail if either F ≈ 0 or G ≈ 0.

Some textbooks recommend something like @. (F - G)/max(1, F, G), but then the derivatives are not continuous.
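For concreteness, the variants above can be sketched with placeholder model functions (the particular f and g here are hypothetical, just to make the residuals runnable):

```julia
# Hypothetical stand-ins for the real model functions.
f(x, θ) = θ * x^2
g(x, θ) = x + θ

# Plain residual, as in the function above.
residual(x, θ) = f(x, θ) - g(x, θ)

# Relative criterion: better scaled near the optimum, but fails
# when the denominator is near zero.
residual_rel(x, θ) = (f(x, θ) - g(x, θ)) / g(x, θ)

# Textbook-style safeguarded denominator: bounded away from zero,
# but max has kinks, so the derivative is not continuous everywhere.
residual_scaled(x, θ) = (f(x, θ) - g(x, θ)) / max(1, f(x, θ), g(x, θ))
```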

Is there a “standard” way of doing what I want in a continuously differentiable way? I think I could combine a softmax to get this, but thought I would ask first here.
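A softmax-style smooth maximum along those lines could be a log-sum-exp; this is only a sketch, and the sharpness parameter k is a free choice:

```julia
# Log-sum-exp smooth maximum: C^∞ everywhere, an upper bound on
# max(xs...), and converging to it as k → ∞.  Shifting by the true
# maximum keeps exp from overflowing.
function smoothmax(k, xs...)
    m = maximum(xs)
    m + log(sum(exp(k * (x - m)) for x in xs)) / k
end
```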

Using @. (F - G)/max(1, abs(F), abs(G)) will prevent derivative discontinuities when F - G oscillates around zero (the numerator vanishes exactly where the max switches branches), which is likely to happen in such a problem. You still get discontinuities if either F or G oscillates around zero, but this can be solved by creating a smoothed version of abs().

If I understand it correctly, the approximation for |x| would be

\sqrt{x^2 + d^2}

for d > 0. This is similar to something I saw on stackexchange, and could work very well. It is definitely simpler and easier to reason about (for me).