Derivatives at infinity issue

I am trying to implement L’Hôpital’s rule, and my first naive version looks like this:

using ForwardDiff: derivative
∂(f) = x -> derivative(f,x)
function Hôpital(f,g,a)
    fₐ, gₐ = f(a), g(a)
    any(isnan.((fₐ,gₐ))) && error("fₐ = $fₐ, gₐ = $gₐ")
    r = fₐ / gₐ
    @show fₐ, gₐ, r
    isnan(r) ? Hôpital(∂(f), ∂(g), a) : r
end


Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0) # 5.0
Hôpital(exp, x -> x,Inf) # Inf
Hôpital(sin, x -> x^2, 0.0) # Inf
Hôpital(log, x -> x, 0) # -Inf, perfect. 
Hôpital(exp, x -> x^2,Inf) # Weirdly, this doesn't work.

It looks like the second derivative of exp at infinity is NaN, while it should be Inf. Same for x -> x^2, while it should be 2.

Is there a way to get these derivatives at infinity correct?

Note: I am not interested in using a package like Scipy to compute limits; rather, I want to see how far I can go with standard Julia.

Doesn’t this require symbolic differentiation?

Well, as soon as ForwardDiff evaluates the second derivative of x -> exp(x) at x = Inf to be Inf and not NaN, this will work. But maybe it would be more efficient with symbolic diff? I do not know.

So indeed, this version works a bit better on the examples I could try:

using Symbolics
@variables t
∂(f) = Symbolics.derivative(f,t,simplify=true)
build(f) = build_function(simplify(f), t, expression=Val{false})
function Hôpital(f,g,a)
    fₜ, gₜ = f(t), g(t)
    L = build(fₜ/gₜ)(a)
    @show fₜ/gₜ
    while isnan(L)
        fₜ, gₜ = ∂(fₜ), ∂(gₜ)
        L = build(fₜ/gₜ)(a)
        @show fₜ/gₜ
    end
    return L
end

Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0) == 5.0
Hôpital(exp, x -> x^20,Inf) == Inf
Hôpital(sin, x -> x^2, 0.0) == Inf
Hôpital(log, x -> x, 0) == -Inf 
Hôpital(x -> cos(x)-1, x -> x^2, 0) == -1/2
Hôpital(x -> sin(5*x), x -> sin(2*x), 0) == 5/2

I wonder if there will be failing cases, or if it matches the theorem's assumptions correctly…

Also this is pretty slow :confused:

I think the correct version of that sentence is

The derivative at infinity of the code we use to approximate the derivative of exp is NaN.

Because indeed if we were trying to just differentiate exp we would get Inf again. So the issue is that the ForwardDiff.derivative of exp is not exactly the function exp, which means ForwardDiff dual numbers presumably don’t work as well with it (due to the lack of custom rule).

This is related to a blog post by @oxinabox I discovered yesterday: Automatic Differentiation Does Incur Truncation Errors (kinda)

I don’t have a ready-made solution, but perhaps it can help you think about the problem.

2 Likes

You are right about the formulation: the twice-autodiffed version of Julia's exp(::Float64) does not behave correctly at Inf. I also tried BigFloat, same issue.

I remember trying out ForwardDiff.jl but also TaylorSeries.jl and TaylorDiff.jl, all three have the same issue.

Reading your link, I think it might be fixable by telling the AD system that exp' is exp. I wonder why ForwardDiff is not already aware of that.

ForwardDiff has special rules for the exponential function applied to a dual number. But it doesn’t have rules for transforming functions into functions (that’s more the realm of symbolic computing).

julia> using ForwardDiff

julia> ∂(f) = x -> derivative(f, x)
∂ (generic function with 1 method)

julia> exp
exp (generic function with 14 methods)

julia> ∂(exp)
#7 (generic function with 1 method)

How is ForwardDiff supposed to know that #7 has specific derivative behavior?

1 Like

So we are doomed to have

julia> using ForwardDiff: derivative

julia> derivative(exp,Inf)
Inf

julia> derivative(t -> derivative(exp,t),Inf)
NaN


? If I get you correctly, this is a dispatch issue: the path for differentiating exp correctly exists, but cannot be reached because t -> derivative(exp,t) is not the same function.

1 Like

Which is why the symbolic approach makes sense.

  • a symbolic differentiation system knows that the derivative of exp (as a function) is still exp
  • an algorithmic differentiation system only knows how to compute this derivative at a given point
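A minimal illustration of the difference, using the same Symbolics.jl machinery as the version above: symbolic differentiation of exp(t) yields the expression exp(t) again, which can then be compiled back to a Julia function and evaluated anywhere, including at Inf.

```julia
using Symbolics

@variables t

# Symbolic differentiation returns the function itself, as an expression…
d = Symbolics.derivative(exp(t), t)   # exp(t)

# …which can be compiled back into an ordinary Julia function
# and evaluated at Inf without ever multiplying 0.0 by Inf.
f′ = build_function(d, t, expression = Val{false})
f′(Inf)   # Inf
```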
1 Like

But what I do not understand is that there is no chain rule for t -> derivative(exp,t), right? So ForwardDiff should just plug a Dual into it, and then stepping through the code will encounter the inner exp(::Dual) and should do the right thing… But clearly this is not the case.

Here’s an example that shows you how a NaN might arise from this type of code. Not sure this is exactly what goes on inside ForwardDiff but it’s probably close:

struct MyDual{T}
    x::T
    d::T
end

Base.show(io::IO, d::MyDual) = print(io, "D($(d.x), $(d.d))")

function Base.one(d::MyDual)
    res = MyDual(one(d.x), zero(d.d))
    @info "one($d) = $res"
    return res
end

function Base.:*(d1::MyDual, d2::MyDual)
    res = MyDual(d1.x * d2.x, d1.x * d2.d + d2.x * d1.d)
    @info "$d1 * $d2 = $res"
    return res
end

function Base.exp(d::MyDual)
    res = MyDual(exp(d.x), exp(d.x) * d.d)
    @info "e^$d = $res"
    return res
end

myderivative(f, x) = f(MyDual(x, one(x))).d

exp_der(x) = myderivative(exp, x)

julia> myderivative(exp, Inf)
[ Info: e^D(Inf, 1.0) = D(Inf, Inf)
Inf

julia> myderivative(exp_der, Inf)
[ Info: one(D(Inf, 1.0)) = D(1.0, 0.0)
[ Info: e^D(Inf, 1.0) = D(Inf, Inf)
[ Info: e^D(Inf, 1.0) = D(Inf, Inf)
[ Info: D(Inf, Inf) * D(1.0, 0.0) = D(Inf, NaN)
[ Info: e^D(D(Inf, 1.0), D(1.0, 0.0)) = D(D(Inf, Inf), D(Inf, NaN))
NaN
1 Like

It is encountering a rule for exp, but the problem is that a term multiplied by zero is being added to it. The actual thing being computed for exp''(t) is

exp_t = exp(t)
1.0 * exp_t + 0.0 * exp_t^2 

and when t = Inf, then 0.0 * exp_t^2 is NaN.
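The offending product can be reproduced in plain Julia, no packages needed: IEEE 754 defines 0.0 * Inf as NaN, and the NaN then propagates through the sum.

```julia
# IEEE 754 semantics: zero times infinity is undefined, hence NaN.
0.0 * Inf                  # NaN

# So the perturbation term 0.0 * exp_t^2 poisons the whole sum once exp_t == Inf:
1.0 * Inf + 0.0 * Inf^2    # NaN, even though the intended answer is Inf

# A "strong zero" would drop the zero term instead of multiplying it out,
# leaving only the finite part:
1.0 * Inf                  # Inf
```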

3 Likes

I think it’s the same multiplication Inf * 0.0 which you see in my code. I didn’t have the mathematical explanation though!

1 Like

It’s because the AD system is calculating the derivative of the constant value 1.0 to be 0.0. If you use Zygote.jl though, it has “strong” zeros for constants and so it gets Inf here:

julia> using Zygote: var"'"

julia> exp''(Inf)
Inf

and

julia> function Hôpital(f,g,a)
           fₐ, gₐ = f(a), g(a)
           any(isnan.((fₐ,gₐ))) && error("fₐ = $fₐ, gₐ = $gₐ")
           r = fₐ / gₐ
           @show fₐ, gₐ, r
           isnan(r) ? Hôpital(f', g', a) : r
       end;

julia> Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0)
(fₐ, gₐ, r) = (0.0, 0.0, NaN)
(fₐ, gₐ, r) = (10.0, 2.0, 5.0)
5.0

julia> Hôpital(exp, x -> x,Inf)
(fₐ, gₐ, r) = (Inf, Inf, NaN)
(fₐ, gₐ, r) = (Inf, 1.0, Inf)
Inf

julia> Hôpital(sin, x -> x^2, 0.0)
(fₐ, gₐ, r) = (0.0, 0.0, NaN)
(fₐ, gₐ, r) = (1.0, 0.0, Inf)
Inf

julia> Hôpital(log, x -> x, 0)
(fₐ, gₐ, r) = (-Inf, 0, -Inf)
-Inf

julia> Hôpital(exp, x -> x^2,Inf) 
(fₐ, gₐ, r) = (Inf, Inf, NaN)
(fₐ, gₐ, r) = (Inf, Inf, NaN)
(fₐ, gₐ, r) = (Inf, 2.0, Inf)
Inf

This is something Diffractor.jl should support but currently doesn’t, so I opened an issue here: Not using "strong zeros" for deriatives of constant values · Issue #278 · JuliaDiff/Diffractor.jl · GitHub

4 Likes

This gives a really short answer to my problem:

using Zygote: var"'"
function Hôpital(f,g,a)
    r = f(a) / g(a)
    isnan(r) ? Hôpital(f', g', a) : r
end

# All these are fine: 
Hôpital(x -> cos(x)-1 + sin(x^2), x -> x^3, 0) == Inf
Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0) == 5.0
Hôpital(exp, x -> x^2,Inf) == Inf
Hôpital(sin, x -> x^2, 0.0) == Inf
Hôpital(log, x -> x, 0) == -Inf 
Hôpital(sin, x-> x^2, 0) == Inf
Hôpital(x -> sin(5*x), x -> sin(2*x), 0) == 5/2

But, weirdly, it is REAAAAALY slow when the second or third derivative is hit. Might be related to what @oxinabox said on your Diffractor issue: Not using "strong zeros" for deriatives of constant values · Issue #278 · JuliaDiff/Diffractor.jl · GitHub

1 Like

Not weird at all: Zygote is very much a bad system for higher-order AD, so bad that Diffractor.jl was designed specifically so that it would compose better.

Basically, if you take nth-order derivatives, Zygote will produce an amount of code that grows like the exponential or factorial of n, most of which is trivial junk, making an insane amount of work for the compiler to sift through.

2 Likes