Ok, now a summary of what I have learned.

It seems that `ForwardDiff` is too smart. Every time the closure changes significantly it does a lot of work up front, and that work pays off if you then evaluate the same (or a similar) function many times. But if your goal is to compute Hessians of functions defined via closures that change significantly from call to call, the overhead is too much.
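
A toy illustration of the effect (with any two simple functions; note the first call to each new function):

```
using ForwardDiff

f1(x) = sum(abs2, x)
@time ForwardDiff.hessian(f1, rand(3))   # slow: first call specializes/compiles
@time ForwardDiff.hessian(f1, rand(3))   # fast: the work is reused

f2(x) = sum(abs2, x) + x[1]*x[2]
@time ForwardDiff.hessian(f2, rand(3))   # slow again: new function, new compilation
```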

In the end, I settled on a mixed solution:

1.- I use `ForwardDiff` to define the derivatives/second derivatives of the elementary functions. (Note that these are compiled once, and `ForwardDiff` is very efficient at these computations.) A sketch of what I mean is shown right after this list.

2.- Then I define my own poor man's automatic differentiation. The main differences with `ForwardDiff` are that it basically only works for `Float64` types, and that I define custom operations between dual numbers and `Float64` numbers **without** promoting the `Float64` numbers to dual numbers. For example:

```
# Sum of two hyperduals: every component adds.
Base.:+(h1::hyperd, h2::hyperd) = hyperd(h1.v+h2.v, h1.d1+h2.d1, h1.d2+h2.d2, h1.dd+h2.dd)
# Sum with a plain number: only the value changes, and the number is never promoted.
Base.:+(h1::hyperd, h2::Number) = hyperd(h1.v+h2, h1.d1, h1.d2, h1.dd)
Base.:+(h1::Number, h2::hyperd) = hyperd(h1+h2.v, h2.d1, h2.d2, h2.dd)
```
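
Addition does not exercise the second-derivative bookkeeping, so here is a sketch of what multiplication looks like under the same scheme (my reconstruction of the rule, not code copied from the module; `dd` carries the mixed second derivative):

```
# Product rule for hyperduals; the dd component implements
# (f*g)_xy = f_xy*g + f_x*g_y + f_y*g_x + f*g_xy.
Base.:*(h1::hyperd, h2::hyperd) = hyperd(h1.v*h2.v,
                                         h1.d1*h2.v + h1.v*h2.d1,
                                         h1.d2*h2.v + h1.v*h2.d2,
                                         h1.dd*h2.v + h1.d1*h2.d2 + h1.d2*h2.d1 + h1.v*h2.dd)
# Scaling by a plain number touches every component, again without promotion.
Base.:*(h1::hyperd, a::Number) = hyperd(a*h1.v, a*h1.d1, a*h1.d2, a*h1.dd)
Base.:*(a::Number, h2::hyperd) = h2*a
```

And this is the kind of thing I meant in point 1: the first and second derivatives of an elementary function come from `ForwardDiff` (compiled once), and my code only pushes them through the hyperdual chain rule. Again a sketch, not the actual module code:

```
import ForwardDiff

# sin on a hyperdual: ForwardDiff supplies f'(v) and f''(v), and the chain
# rule fills in the components, with g.dd = f'(v)*h.dd + f''(v)*h.d1*h.d2.
function Base.sin(h::hyperd)
    df  = ForwardDiff.derivative(sin, h.v)
    d2f = ForwardDiff.derivative(x -> ForwardDiff.derivative(sin, x), h.v)
    return hyperd(sin(h.v), df*h.d1, df*h.d2, df*h.dd + d2f*h.d1*h.d2)
end
```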

In my admittedly poor understanding of all this, promoting plain numbers to Dual numbers is at the core of the large allocations and long first-execution time.
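
For contrast, the promoting alternative to the methods above would look something like this. This is a sketch of the generic Julia promotion machinery applied to `hyperd`, not `ForwardDiff`'s actual implementation:

```
# Promoting approach: lift the plain number to a hyperd with zero
# derivative entries, then fall back to the hyperd-hyperd method.
Base.convert(::Type{hyperd}, x::Real) = hyperd(x, 0.0, 0.0, 0.0)
Base.promote_rule(::Type{hyperd}, ::Type{<:Real}) = hyperd
Base.:+(h::hyperd, x::Real) = +(promote(h, x)...)
```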

This is part of a module that I am writing for something completely unrelated to AD, but the (undocumented) relevant code is here:

My poor man's implementation of the Hessian:

My `hyperd` type is similar to `ForwardDiff`'s dual numbers, but with a fixed chunk size of 1. This avoids allocations at the cost of more evaluations of the function:

```
mutable struct hyperd
    v::Float64   # value
    d1::Float64  # first derivative along the first seeded direction
    d2::Float64  # first derivative along the second seeded direction
    dd::Float64  # mixed second derivative along both directions
end
```
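
I have not pasted `hyperd_hessian!` here, but with chunk size 1 the idea is simple: entry `H[i,j]` is obtained by seeding direction `i` in `d1` and direction `j` in `d2`, evaluating the function once, and reading off `dd`. A minimal sketch of that loop (my guess at the structure, not the actual `ADerrors` code):

```
# One function evaluation per (i, j) pair: the price of chunk size 1.
function hessian_sketch!(hess::Matrix{Float64}, f, x::Vector{Float64})
    n  = length(x)
    xd = [hyperd(x[k], 0.0, 0.0, 0.0) for k in 1:n]
    for i in 1:n, j in i:n
        xd[i].d1 = 1.0           # seed direction i
        xd[j].d2 = 1.0           # seed direction j
        hess[i, j] = f(xd).dd    # mixed second derivative d^2 f / dx_i dx_j
        hess[j, i] = hess[i, j]  # the Hessian is symmetric
        xd[i].d1 = 0.0           # clear the seeds for the next pair
        xd[j].d2 = 0.0
    end
    return hess
end
```

Exploiting symmetry, this needs n(n+1)/2 evaluations of `f` per Hessian, which is where the "more evaluations" above comes from.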

Now some results for the basic test code:

```
using ForwardDiff
import ADerrors

function do_hess(cs::Function, x::Vector{Float64}, y::Vector{Float64})
    n = length(x)
    m = length(y)
    # Concatenate parameters and data into a single vector.
    xav = zeros(Float64, n+m)
    for i in 1:n
        xav[i] = x[i]
    end
    for i in n+1:n+m
        xav[i] = y[i-n]
    end

    # Trying views; this helped a lot!
    function fx(x0)
        x1 = @view x0[1:n]
        x2 = @view x0[n+1:end]
        return cs(x1, x2)
    end

    hess = zeros(n+m, n+m)
    print("ForwardDiff timing: ")
    @time ForwardDiff.hessian!(hess, fx, xav)
    print("ForwardDiff timing: ")
    @time ForwardDiff.hessian!(hess, fx, xav)
    println("Custom hessian")
    print("hyperd_hessian timing: ")
    @time ADerrors.hyperd_hessian!(hess, fx, xav)
    print("hyperd_hessian timing: ")
    @time ADerrors.hyperd_hessian!(hess, fx, xav)
    println("Exit")

    return hess
end

# Build the chi^2 closure capturing the data x and weights w (n, m unused).
function defs(f, x, w, n, m)
    chisq(p, d) = sum((d .- f(x, p)).^2 .* w)
    return chisq
end

x = rand(10)
w = rand(10).^2
@. lmfcn(x, p) = p[1] + p[2]*x

function lm2(x, p)
    res = Vector{eltype(p)}(undef, length(x))
    for i in eachindex(res)
        res[i] = p[1] + p[2]*x[i] + 1.0/x[i]
    end
    return res
end

chisq = defs(lm2, x, w, 2, 10)
do_hess(chisq, [1.0, 1.0], [1.0*n for n in 1:10])

println("Change closure")
x = rand(10)
w = rand(10).^2
chisq = defs(lmfcn, x, w, 2, 10)
do_hess(chisq, [1.0, 1.0], [1.0*n for n in 1:10])

println("Change closure")
x = rand(10)
w = rand(10).^2
@. lmfcn(x, p) = p[1]/x + p[2]*x^2
chisq = defs(lmfcn, x, w, 2, 10)
do_hess(chisq, [1.0, 1.0], [1.0*n for n in 1:10])
```

**RESULTS:**

```
ForwardDiff timing: 5.681562 seconds (5.28 M allocations: 235.013 MiB, 2.66% gc time)
ForwardDiff timing: 0.000039 seconds (11 allocations: 61.453 KiB)
Custom hessian
hyperd_hessian timing: 0.000080 seconds (4.15 k allocations: 211.578 KiB)
hyperd_hessian timing: 0.000076 seconds (4.15 k allocations: 211.578 KiB)
Exit
Change closure
ForwardDiff timing: 4.629693 seconds (4.51 M allocations: 187.885 MiB, 0.94% gc time)
ForwardDiff timing: 0.000031 seconds (11 allocations: 61.453 KiB)
Custom hessian
hyperd_hessian timing: 0.162594 seconds (111.11 k allocations: 5.882 MiB, 13.89% gc time)
hyperd_hessian timing: 0.000361 seconds (7.27 k allocations: 306.641 KiB)
Exit
Change closure
ForwardDiff timing: 1.905636 seconds (1.76 M allocations: 72.483 MiB, 6.92% gc time)
ForwardDiff timing: 0.000031 seconds (11 allocations: 61.453 KiB)
Custom hessian
hyperd_hessian timing: 0.036707 seconds (57.31 k allocations: 2.639 MiB)
hyperd_hessian timing: 0.000437 seconds (9.37 k allocations: 366.359 KiB)
```

So yes, my implementation takes about 2-10 times longer than `ForwardDiff` once the basic functions have compiled, and this seems to hold whether or not you change the closure/function. The only thing my implementation needs in order to perform well is that the basic operations (sin, cos, products of `hyperd`, etc.) are already compiled.

On the other hand, once `ForwardDiff` is tuned to a particular closure, it is much faster than my implementation. But the overhead of this tuning is enormous: from the timings above (about 5 s of first-call overhead at tens of microseconds per warm evaluation), it is worth on the order of 10^5 Hessian evaluations, and it is paid again every time the closure changes!

**Final Question**

To the experts/developers of `ForwardDiff`: would it not be more efficient to hard-code the operations between dual numbers and normal numbers, without promotion? (Notice that maybe I am not understanding the `ForwardDiff` code very well…)

**One last thing:** Many thanks for the help!