# ForwardDiff question

I’m having trouble debugging why ForwardDiff is not working for my function. Below is a mwe. Its very close to working, but I can’t figure it out.

``````using ForwardDiff
using LinearAlgebra

struct Stack
M::Matrix{Int}
Us::Vector{Matrix{Complex{Float64}}}
end

stack = Stack(rand(0:1,4,10), [rand(4,2) for i in 1:10])

function update!(stack::Stack, x::Vector, p::Int)
# A = exp.(Diagonal(stack.M[:,p] )) # works
A = exp.(Diagonal(stack.M[:,p] * x[p])) # doesn't work
B = qr!(A[:, 1:2], Val(true)).Q |> Matrix
stack.Us[p] .= B
end

update!(stack, rand(10), 1)

function f(stack, x, p)
update!(stack, x, p)
return diag(sum(stack.Us)) |> sum |> real
end

g = x -> ForwardDiff.gradient(x -> f(stack, x, 1), x)

f(stack, rand(10),1)
g( rand(10))
``````
``````⇡5% [I] ➜ jul --project mwe_forwarddiff.jl
ERROR: LoadError: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10})
Closest candidates are:
Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
Float64(::T<:Number) where T<:Number at boot.jl:718
Float64(!Matched::Int8) at float.jl:60
...
Stacktrace:
[1] convert(::Type{Float64}, ::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10}) at ./number.jl:7
[2] Complex{Float64}(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10}, ::Int64) at ./complex.jl:12
[3] Complex{Float64}(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10}) at ./complex.jl:35
[4] convert(::Type{Complex{Float64}}, ::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10}) at ./number.jl:7
[5] setindex!(::Array{Complex{Float64},2}, ::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10}, ::Int64) at ./array.jl:780
[6] copyto! at ./abstractarray.jl:807 [inlined]
[7] copyto! at ./abstractarray.jl:799 [inlined]
[11] update!(::Stack, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10},1}, ::Int64) at /home/meach/.julia/dev/PDQMC/src/mwe_forwarddiff.jl:15
[12] f(::Stack, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10},1}, ::Int64) at /home/meach/.julia/dev/PDQMC/src/mwe_forwarddiff.jl:21
[13] (::getfield(Main, Symbol("##6#8")))(::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##6#8")),Float64},Float64,10},1}) at /home/meach/.julia/dev/PDQMC/src/mwe_forwarddiff.jl:28
[17] (::getfield(Main, Symbol("##5#7")))(::Array{Float64,1}) at /home/meach/.julia/dev/PDQMC/src/mwe_forwarddiff.jl:28
[18] top-level scope at /home/meach/.julia/dev/PDQMC/src/mwe_forwarddiff.jl:32
[19] include at ./boot.jl:328 [inlined]
[21] include(::Module, ::String) at ./Base.jl:31
[22] exec_options(::Base.JLOptions) at ./client.jl:295
[23] _start() at ./client.jl:468
in expression starting at /home/meach/.julia/dev/PDQMC/src/mwe_forwarddiff.jl:32

``````

Any ideas?

ForwardDiff uses an extended number type (`Dual`), which it cannot assign to your preallocated buffers in `stack`.

It is best to use a functional style when working with AD.

2 Likes

This works but will not be fast:

``````using ForwardDiff
using LinearAlgebra

struct Stack
M::Matrix{Int}
Us#::Vector{Matrix{Complex{Float64}}}
end

function update!(stack::Stack, x::Vector, p::Int)
# A = exp.(Diagonal(stack.M[:,p] )) # works
A = exp.(Diagonal(stack.M[:,p] * x[p])) # doesn't work
B = qr!(A[:, 1:2], Val(true)).Q |> Matrix
stack.Us[p] = B
return nothing
end

function f(stack, x, p)
update!(stack, x, p)
return diag(sum(stack.Us)) |> sum |> real
end

stack = Stack(rand(0:1,4,10), Any[rand(4,2) for i in 1:10])
update!(stack, rand(10), 1)

g = x -> ForwardDiff.gradient(x -> f(stack, x, 1), x)
f(stack, rand(10),1)
g( rand(10))
``````

Thus, as Tamas said, often a functional style is better. If you must use buffers, you will need to to more acrobatics to get it fast.

1 Like

I just refactored a 10kLOC codebase to work with AD. It had a lot of preallocation, which turned out to be a rather useless and premature optimization that gained me about 3% in v1.2 and even less in v1.3-pre.

I now think that for small arrays, StaticArrays is the answer, and for large ones, the allocation is not worth optimizing, because it is so tedious and the same effort spent on the algorithm usually has a much higher yield.

Also, I am wondering if seeing all that preallocated array code is helpful to beginners. It can squeeze out that extra 5–10%, which sometimes matters, but it is not in the top 5 things I would optimize about my Julia code these days, and in turn it causes a lot of issues like this one. My 2 cents.

4 Likes

What if I treat “stack” as a “vector”, that I want the gradient of, then just forget about it ?
i.e. make it a subtype of AbstractArray so that I can AD through it,
or more simply,

``````Y = [stack, rand(10)]
g = Y -> ForwardDiff.gradient(Y -> f(Y, 1), Y)
f(Y, 1)
g(Y)
``````

You can also do stuff like Nested ForwardDiff.jacobian calls with inplace function.

That’s not how it works (as am I sure you saw when you tried), since `stack` will not support the types and operations required by ForwardDiff.