Need help using ForwardDiff gradient with numerical loop

I’m testing ways to get the gradient of a function whose core is a numerical loop. I’m currently trying ForwardDiff to see if it’s any faster than Zygote’s reverse mode. I can’t get ForwardDiff to perform AD on this MWE; could someone help me figure out what’s wrong with my implementation?

function ForDiffSL(x)
        U = U1*x
        L = 1
        k = 0.25
        dr = r[2] - r[1]

        len = size(r)[1]-1
        ur = zeros(len)
        ui = zeros(len)
        a = r[end-2]
        ur[2] = 1e-2
        ui[1] = 1e-3
        ui[2] = 1e-2
        for i in 3:len
            uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
            ur[i] = uval/ur[i-1]
            ui[i] = uval/ui[i-1]
        end
        SL = ur[len] * ui[len]
        return SL
end

global r = Vector(LinRange(0, 20, 20000))

global U1 = rand(size(r,1), 2)

x = rand(2,2)

dSL_fd = ForwardDiff.gradient(ForDiffSL, x)

I get the following error message when running this code:

ERROR: LoadError: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4})

Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat
   @ Base rounding.jl:207
  (::Type{T})(::T) where T<:Number
   @ Core boot.jl:792
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number}
   @ Base char.jl:50
  ...

Stacktrace:
  [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4})
    @ Base ./number.jl:7
  [2] setindex!(A::Vector{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}, i1::Int64)
    @ Base ./array.jl:969
  [3] ForDiffSL(x::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}})
    @ Main ~/nuclear-diffprog/MWEs/mwe_coreloop.jl:49
  [4] vector_mode_dual_eval!
    @ ~/.julia/packages/ForwardDiff/onEFt/src/apiutils.jl:24 [inlined]
  [5] vector_mode_gradient(f::typeof(ForDiffSL), x::Matrix{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:91
  [6] gradient
    @ ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:20 [inlined]
  [7] gradient(f::typeof(ForDiffSL), x::Matrix{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:17
  [8] gradient(f::typeof(ForDiffSL), x::Matrix{Float64})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:17
  [9] macro expansion
    @ ~/nuclear-diffprog/MWEs/mwe_coreloop.jl:81 [inlined]
 [10] top-level scope
    @ ./timing.jl:273
 [11] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [12] top-level scope
    @ REPL[1]:1
in expression starting at mwe_coreloop.jl:80

Welcome! Unfortunately, you ran into a common problem when using auto-diff. To run ForwardDiff on your function, its types need to be general enough, because ForwardDiff works by pushing a special dual number type through your code.
Here, zeros(len) allocates a Vector{Float64}. Instead, use zeros(eltype(x), len) so the buffers match the element type of the input that ForwardDiff passes in during differentiation.

Further, I would suggest passing the globals r and U1 as explicit arguments to ForDiffSL. Since they should not be included in the gradient calculation, you can wrap the call in an anonymous function, i.e., ForwardDiff.gradient(x -> ForDiffSL(x, r, U1), x).
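For illustration, here is a minimal toy example (separate from your function) of why a Float64 buffer breaks ForwardDiff and how eltype(x) fixes it:

using ForwardDiff

# Fails under ForwardDiff: the buffer is a Vector{Float64}, but x holds Dual numbers
f_bad(x) = (v = zeros(length(x)); v .= 2 .* x; sum(v))

# Works: the buffer's element type follows the element type of x
f_good(x) = (v = zeros(eltype(x), length(x)); v .= 2 .* x; sum(v))

ForwardDiff.gradient(f_good, rand(3))    # [2.0, 2.0, 2.0]
# ForwardDiff.gradient(f_bad, rand(3))   # MethodError: no method matching Float64(::ForwardDiff.Dual{...})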

3 Likes

Welcome to the forum!
To make life easier for people trying to help you, it’s very useful to format the code with backticks, like so:

```julia
f(x) = 1
```

It seems that you tried, but perhaps you used the wrong character? See this post for other tips on how to use the forum:

1 Like

(I edited the post to replace the wrong character ''' with the correct character ```. Note that the julia suffix is not necessary in this forum, where it is the default.)

2 Likes

This will probably speed up your function and its gradient by quite a lot. If you want even more speed, you can probably avoid allocating ui and ur inside the function: all you need are the last values anyway, so it’s enough to store a single number for each and update it iteratively.
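A rough sketch of that idea, keeping your recurrence and initial values as they are and assuming U1 and r are passed in as arguments (as suggested above):

function ForDiffSL_scalar(x, U1, r)
    U = U1 * x
    L = 1
    k = 0.25
    len = length(r) - 1
    # ur[1] and ui[1] are never read in the original loop, so only the
    # i = 2 values matter; carry them as scalars instead of full vectors.
    ur = 1e-2
    ui = 1e-2
    for i in 3:len
        uval = (L * (L + 1) * ur + ui * k) * U[i]
        ur, ui = uval / ur, uval / ui   # right-hand sides use the old values
    end
    return ur * ui
end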

2 Likes

I really appreciate the help. I got ForwardDiff working with your example, and it’s much faster than Zygote for my problem!

However, I tried two of the suggestions here for speeding up the code: passing an anonymous function to FD.gradient (avoiding global variables) and replacing the long vectors ur and ui with short ones that I overwrite. Both of these actually slowed my code down, especially passing the anonymous function (20x slower). Could you help me understand why that might be? I notice the global version only pulls ahead of the anonymous one on the second and subsequent compilations/executions of the code; does this explain the difference?

In my real code I’m passing an anonymous version of the loss function to Zygote.gradient(), and I suspect this is slowing me down a lot.

function ForDiffSL(U1, x, r)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

function ForDiffSL_global(x)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

Results for first call:

ForwardDiff gradient anonymous call:
  1.525661 seconds (3.71 M allocations: 246.695 MiB, 4.84% gc time, 99.49% compilation time)

ForwardDiff gradient global:
  0.967172 seconds (2.52 M allocations: 161.932 MiB, 7.33% gc time, 98.78% compilation time)

Results for second call:

ForwardDiff gradient anonymous call:
  0.920046 seconds (2.18 M allocations: 150.192 MiB, 5.94% gc time, 99.71% compilation time)

ForwardDiff gradient global:
  0.054598 seconds (391.37 k allocations: 17.675 MiB, 81.62% compilation time: 100% of which was recompilation)

I also noticed I didn’t have to declare the variables as global for ForDiffSL_global to work…

If you could take a look at my reply (I accidentally replied to myself), I have some follow-up questions. Appreciate it!

Possibly you are benchmarking incorrectly. Impossible to tell since you didn’t post runnable code for anyone to reproduce your results.

Good point. Here’s the full code:

begin # Packages
    using Dates
    using BenchmarkTools
    using ForwardDiff
end

function ForDiffSL(U1, x, r)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

function ForDiffSL_global(x)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

r = Vector(LinRange(0, 20, 20000))
U1 = rand(size(r,1), 2)
x = rand(2,2)


println("ForwardDiff gradient anonymous call:")
@time begin
    dSL_fd = ForwardDiff.gradient(x -> ForDiffSL(U1, x, r), x)
end
println()

println("ForwardDiff gradient global:")
@time begin
    dSL_fd = ForwardDiff.gradient(ForDiffSL_global, x)
end

You’ll note I don’t have an implementation of the second speedup suggestion (eliminating unneeded vectors) here; I actually implemented that in my main code instead of the MWE. I’ll add it shortly.

Don’t use @time. You are including compilation time and other noise. Use BenchmarkTools.jl or similar:

@btime ForwardDiff.gradient(x -> ForDiffSL($U1, x, $r), $x)

where the $ (a feature of @btime) ensures that you aren’t measuring the penalty of global variables. (In a real performance-critical application, presumably all of this code would be inside a function.)
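For example (just one way to arrange it, with compute_gradient an illustrative name), the whole computation can live inside a function, so no non-constant globals are involved and no $ interpolation is needed outside of benchmarking:

function compute_gradient(U1, r, x)
    return ForwardDiff.gradient(y -> ForDiffSL(U1, y, r), x)
end

dSL_fd = compute_gradient(U1, r, x)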

That did it. Now the anonymous call is about 10x faster. Thanks again!

ForwardDiff gradient anonymous call:
  915.815 μs (18 allocations: 3.07 MiB)

ForwardDiff gradient global:
  7.548 ms (356917 allocations: 15.54 MiB)

You can get another 4x speedup by using StaticArrays (because you’re working with small vectors/matrices whose size you know at compile time) and avoiding allocations altogether. Here’s an example:

using BenchmarkTools
using LinearAlgebra
using ForwardDiff
using StaticArrays

function ForDiffSL(x, (U1, r))
    U = U1 * x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1] - 1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end - 2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L * (L + 1) * ur[i - 1] + ui[i - 1] * k) * U[i]  # TODO: fix?
        ur[i] = uval / ur[i - 1]
        ui[i] = uval / ui[i - 1]
    end
    SL = ur[len] * ui[len]
    return SL
end

function ForDiffSL_fast(x_static, (U1_static, r))
    L = 1
    k = 0.25
    len = length(r) - 1
    ur = 1e-2
    ui = 1e-2
    for i in 3:len
        uval = (L * (L + 1) * ur + ui * k) * dot(U1_static[i], x_static[:, 1])  # TODO: fix?
        ur = uval / ur
        ui = uval / ui
    end
    SL = ur * ui
    return SL
end

r = Vector(LinRange(0, 20, 20000))
U1 = rand(size(r, 1), 2)
x = rand(2, 2)

U1_static = [SVector(U1[i, 1], U1[i, 2]) for i in axes(U1, 1)]
x_static = SMatrix{2,2}(x[1, 1], x[2, 1], x[1, 2], x[2, 2])

f = Base.Fix2(ForDiffSL, (U1, r))
f_fast = Base.Fix2(ForDiffSL_fast, (U1_static, r))

@assert f(x) == f_fast(x)
julia> @btime ForwardDiff.gradient($f, $x) seconds=1;
  1.397 ms (17 allocations: 3.05 MiB)

julia> @btime ForwardDiff.gradient($f_fast, $x_static) seconds=1;
  401.960 μs (0 allocations: 0 bytes)

I’m not sure about the dot product I used to replace U[i]. It is semantically equivalent and gives the same results as your original function, but I think you may have made an indexing mistake: here, U is an n×2 matrix, and U[i] is not a row but the i-th coefficient of the flattened matrix (columnwise). Is that what you meant to use there?
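For reference, a small standalone example of how linear indexing flattens a matrix column by column:

U = reshape(1:8, 4, 2)   # 4×2 matrix with columns [1, 2, 3, 4] and [5, 6, 7, 8]
U[3]       # 3, the same as U[3, 1] while the linear index stays within column 1
U[6]       # 6, which is U[2, 2], not anything in "row 6"
U[3, :]    # [3, 7], the actual third row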

2 Likes