Derivatives at infinity issue

I am trying to implement L’Hôpital’s rule, and my first naive version looks like this:

using ForwardDiff: derivative
∂(f) = x -> derivative(f,x)
function Hôpital(f,g,a)
    fₐ, gₐ = f(a), g(a)
    any(isnan.((fₐ,gₐ))) && error("fₐ = $fₐ, gₐ = $gₐ")
    r = fₐ / gₐ
    @show fₐ, gₐ, r
    isnan(r) ? Hôpital(∂(f), ∂(g), a) : r
end


Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0) # 5.0
Hôpital(exp, x -> x,Inf) # Inf
Hôpital(sin, x -> x^2, 0.0) # Inf
Hôpital(log, x -> x, 0) # -Inf, perfect. 
Hôpital(exp, x -> x^2,Inf) # Weirdly, this doesn't work.

It looks like the second derivative of exp at infinity is NaN, while it should be Inf. Same for x -> x^2, while it should be 2.

Is there a way to get these derivatives at infinity correct?

Note: I am not interested in using a package like Scipy to compute limits; rather, I want to see how far I can go with standard Julia.

Doesn’t this require symbolic differentiation?

Well, as soon as ForwardDiff evaluates the second derivative of x -> exp(x) at x = Inf to be Inf and not NaN, this will work. But maybe it would be more efficient with symbolic diff? I do not know.

So indeed, this version works a bit better on the examples I could try:

using Symbolics
@variables t
∂(f) = Symbolics.derivative(f,t,simplify=true)
build(f) = build_function(simplify(f), t, expression=Val{false})
function Hôpital(f,g,a)
    fₜ, gₜ = f(t), g(t)
    L = build(fₜ/gₜ)(a)
    @show fₜ/gₜ
    while isnan(L)
        fₜ, gₜ = ∂(fₜ), ∂(gₜ)
        L = build(fₜ/gₜ)(a)
        @show fₜ/gₜ
    end
    return L
end

Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0) == 5.0
Hôpital(exp, x -> x^20,Inf) == Inf
Hôpital(sin, x -> x^2, 0.0) == Inf
Hôpital(log, x -> x, 0) == -Inf 
Hôpital(x -> cos(x)-1, x -> x^2, 0) == -1/2
Hôpital(x -> sin(5*x), x -> sin(2*x), 0) == 5/2

I wonder if there will be failing cases, or if it matches the theorem's assumptions correctly…

Also this is pretty slow :confused:

I think the correct version of that sentence is

The derivative at infinity of the code we use to approximate the derivative of exp is NaN.

Because indeed if we were trying to just differentiate exp we would get Inf again. So the issue is that the ForwardDiff.derivative of exp is not exactly the function exp, which means ForwardDiff dual numbers presumably don’t work as well with it (due to the lack of custom rule).

This is related to a blog post by @oxinabox I discovered yesterday: Automatic Differentiation Does Incur Truncation Errors (kinda)

I don’t have a ready-made solution, but perhaps it can help you think about the problem.

2 Likes

You are right about the formulation: the twice-autodiffed version of Julia's exp(::Float64) does not behave correctly at Inf. I also tried BigFloat, same issue.

I remember trying out ForwardDiff.jl but also TaylorSeries.jl and TaylorDiff.jl, all three have the same issue.

Reading your link, I think it might be fixable by telling the AD system that exp' is exp. I wonder why ForwardDiff is not already aware of that.

ForwardDiff has special rules for the exponential function applied to a dual number. But it doesn’t have rules for transforming functions into functions (that’s more the realm of symbolic computing).

julia> using ForwardDiff

julia> ∂(f) = x -> derivative(f, x)
∂ (generic function with 1 method)

julia> exp
exp (generic function with 14 methods)

julia> ∂(exp)
#7 (generic function with 1 method)

How is ForwardDiff supposed to know that #7 has specific derivative behavior?

1 Like

So we are doomed to have

julia> using ForwardDiff: derivative

julia> derivative(exp,Inf)
Inf

julia> derivative(t -> derivative(exp,t),Inf)
NaN


? If I get you correctly, this is a dispatch issue: the path for differentiating exp correctly exists, but cannot be reached because t -> derivative(exp,t) is not the same function.

1 Like

Which is why the symbolic approach makes sense.

  • a symbolic differentiation system knows that the derivative of exp (as a function) is still exp
  • an algorithmic differentiation system only knows how to compute this derivative at a given point
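A minimal illustration of the difference, using the same Symbolics.jl machinery as the version above: symbolic differentiation of exp(t) yields the expression exp(t) again, which can then be compiled back to a Julia function and evaluated anywhere, including at Inf.

```julia
using Symbolics

@variables t

# Symbolic differentiation returns the function itself, as an expression…
d = Symbolics.derivative(exp(t), t)   # exp(t)

# …which can be compiled back into an ordinary Julia function
# and evaluated at Inf without ever multiplying 0.0 by Inf.
f′ = build_function(d, t, expression = Val{false})
f′(Inf)   # Inf
```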
1 Like

But what I do not understand is that there is no chain rule for t -> derivative(exp,t), right? So ForwardDiff should just plug a Dual into it, and then stepping through the code will encounter the inner exp(::Dual) and should do the right thing… But clearly this is not the case.

Here’s an example that shows you how a NaN might arise from this type of code. Not sure this is exactly what goes on inside ForwardDiff but it’s probably close:

struct MyDual{T}
    x::T
    d::T
end

Base.show(io::IO, d::MyDual) = print(io, "D($(d.x), $(d.d))")

function Base.one(d::MyDual)
    res = MyDual(one(d.x), zero(d.d))
    @info "one($d) = $res"
    return res
end

function Base.:*(d1::MyDual, d2::MyDual)
    res = MyDual(d1.x * d2.x, d1.x * d2.d + d2.x * d1.d)
    @info "$d1 * $d2 = $res"
    return res
end

function Base.exp(d::MyDual)
    res = MyDual(exp(d.x), exp(d.x) * d.d)
    @info "e^$d = $res"
    return res
end

myderivative(f, x) = f(MyDual(x, one(x))).d

exp_der(x) = myderivative(exp, x)

julia> myderivative(exp, Inf)
[ Info: e^D(Inf, 1.0) = D(Inf, Inf)
Inf

julia> myderivative(exp_der, Inf)
[ Info: one(D(Inf, 1.0)) = D(1.0, 0.0)
[ Info: e^D(Inf, 1.0) = D(Inf, Inf)
[ Info: e^D(Inf, 1.0) = D(Inf, Inf)
[ Info: D(Inf, Inf) * D(1.0, 0.0) = D(Inf, NaN)
[ Info: e^D(D(Inf, 1.0), D(1.0, 0.0)) = D(D(Inf, Inf), D(Inf, NaN))
NaN
1 Like

It is encountering a rule for exp, but the problem is that a term multiplied by zero is being added to it. The actual thing being computed for exp''(t) is

exp_t = exp(t)
1.0 * exp_t + 0.0 * exp_t^2 

and when t = Inf, then 0.0 * exp_t^2 is NaN.
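The offending product can be reproduced in plain Julia, no packages needed: IEEE 754 defines 0.0 * Inf as NaN, and the NaN then propagates through the sum.

```julia
# IEEE 754 semantics: zero times infinity is undefined, hence NaN.
0.0 * Inf                  # NaN

# So the perturbation term 0.0 * exp_t^2 poisons the whole sum once exp_t == Inf:
1.0 * Inf + 0.0 * Inf^2    # NaN, even though the intended answer is Inf

# A "strong zero" would drop the zero term instead of multiplying it out,
# leaving only the finite part:
1.0 * Inf                  # Inf
```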

3 Likes

I think it’s the same multiplication Inf * 0.0 which you see in my code. I didn’t have the mathematical explanation though!

1 Like

It’s because the AD system is calculating the derivative of the constant value 1.0 to be 0.0. If you use Zygote.jl though, it has “strong” zeros for constants and so it gets Inf here:

julia> using Zygote: var"'"

julia> exp''(Inf)
Inf

and

julia> function Hôpital(f,g,a)
           fₐ, gₐ = f(a), g(a)
           any(isnan.((fₐ,gₐ))) && error("fₐ = $fₐ, gₐ = $gₐ")
           r = fₐ / gₐ
           @show fₐ, gₐ, r
           isnan(r) ? Hôpital(f', g', a) : r
       end;

julia> Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0)
(fₐ, gₐ, r) = (0.0, 0.0, NaN)
(fₐ, gₐ, r) = (10.0, 2.0, 5.0)
5.0

julia> Hôpital(exp, x -> x,Inf)
(fₐ, gₐ, r) = (Inf, Inf, NaN)
(fₐ, gₐ, r) = (Inf, 1.0, Inf)
Inf

julia> Hôpital(sin, x -> x^2, 0.0)
(fₐ, gₐ, r) = (0.0, 0.0, NaN)
(fₐ, gₐ, r) = (1.0, 0.0, Inf)
Inf

julia> Hôpital(log, x -> x, 0)
(fₐ, gₐ, r) = (-Inf, 0, -Inf)
-Inf

julia> Hôpital(exp, x -> x^2,Inf) 
(fₐ, gₐ, r) = (Inf, Inf, NaN)
(fₐ, gₐ, r) = (Inf, Inf, NaN)
(fₐ, gₐ, r) = (Inf, 2.0, Inf)
Inf

This is something Diffractor.jl should support but currently doesn’t, so I opened an issue here: Not using "strong zeros" for deriatives of constant values · Issue #278 · JuliaDiff/Diffractor.jl · GitHub

4 Likes

This gives a really short answer to my problem:

using Zygote: var"'"
function Hôpital(f,g,a)
    r = f(a) / g(a)
    isnan(r) ? Hôpital(f', g', a) : r
end

# All these are fine: 
Hôpital(x -> cos(x)-1 + sin(x^2), x -> x^3, 0) == Inf
Hôpital(x -> x^10 - 1, x -> x^2 - 1,1.0) == 5.0
Hôpital(exp, x -> x^2,Inf) == Inf
Hôpital(sin, x -> x^2, 0.0) == Inf
Hôpital(log, x -> x, 0) == -Inf 
Hôpital(sin, x-> x^2, 0) == Inf
Hôpital(x -> sin(5*x), x -> sin(2*x), 0) == 5/2

But, weirdly, it is REAAAAALY slow when the second or third derivative is hit. Might be related to what @oxinabox said on your Diffractor issue: Not using "strong zeros" for deriatives of constant values · Issue #278 · JuliaDiff/Diffractor.jl · GitHub

1 Like

Not weird at all: Zygote is very much a bad system for higher-order AD, so bad that Diffractor.jl was designed specifically so that it would compose better.

Basically, if you take nth-order derivatives, Zygote will produce an amount of code that grows like the exponential or factorial of n, most of which is trivial junk, making an insane amount of work for the compiler to sift through.

2 Likes