Need help using ForwardDiff gradient with numerical loop

I’m testing ways to get the gradient of a function whose core is a numerical loop. I’m currently trying ForwardDiff to see if it’s any faster than Zygote’s reverse mode. I can’t get ForwardDiff to perform AD on this MWE; could someone help me figure out what’s wrong with my implementation?

function ForDiffSL(x)
        U = U1*x
        L = 1
        k = 0.25
        dr = r[2] - r[1]

        len = size(r)[1]-1
        ur = zeros(len)
        ui = zeros(len)
        a = r[end-2]
        ur[2] = 1e-2
        ui[1] = 1e-3
        ui[2] = 1e-2
        for i in 3:len
            uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
            ur[i] = uval/ur[i-1]
            ui[i] = uval/ui[i-1]
        end
        SL = ur[len] * ui[len]
        return SL
end

global r = Vector(LinRange(0, 20, 20000))

global U1 = rand(size(r,1), 2)

x = rand(2,2)

dSL_fd = ForwardDiff.gradient(ForDiffSL, x)

I get the following error message when running this code:

ERROR: LoadError: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4})

Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat
   @ Base rounding.jl:207
  (::Type{T})(::T) where T<:Number
   @ Core boot.jl:792
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number}
   @ Base char.jl:50
  ...

Stacktrace:
  [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4})
    @ Base ./number.jl:7
  [2] setindex!(A::Vector{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}, i1::Int64)
    @ Base ./array.jl:969
  [3] ForDiffSL(x::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}})
    @ Main ~/nuclear-diffprog/MWEs/mwe_coreloop.jl:49
  [4] vector_mode_dual_eval!
    @ ~/.julia/packages/ForwardDiff/onEFt/src/apiutils.jl:24 [inlined]
  [5] vector_mode_gradient(f::typeof(ForDiffSL), x::Matrix{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:91
  [6] gradient
    @ ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:20 [inlined]
  [7] gradient(f::typeof(ForDiffSL), x::Matrix{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ForDiffSL), Float64}, Float64, 4}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:17
  [8] gradient(f::typeof(ForDiffSL), x::Matrix{Float64})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/onEFt/src/gradient.jl:17
  [9] macro expansion
    @ ~/nuclear-diffprog/MWEs/mwe_coreloop.jl:81 [inlined]
 [10] top-level scope
    @ ./timing.jl:273
 [11] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [12] top-level scope
    @ REPL[1]:1
in expression starting at mwe_coreloop.jl:80

Welcome! Unfortunately, you ran into a common problem when using auto-diff. To run ForwardDiff on your function, its types need to be general enough, because ForwardDiff works by pushing a special dual number type through your code.
Here, zeros(len) allocates a Vector{Float64}. Instead, use zeros(eltype(x), len) so the buffers match the element type of the input that ForwardDiff passes in during differentiation.

Further, I would suggest passing the globals r and U1 as explicit arguments to ForDiffSL. Since they should not be included in the gradient calculation, you can wrap the call in an anonymous function, i.e., ForwardDiff.gradient(x -> ForDiffSL(x, r, U1), x).
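For illustration, here is a minimal toy example (separate from your function) of why a Float64 buffer breaks ForwardDiff and how eltype(x) fixes it:

using ForwardDiff

# Fails under ForwardDiff: the buffer is a Vector{Float64}, but x holds Dual numbers
f_bad(x) = (v = zeros(length(x)); v .= 2 .* x; sum(v))

# Works: the buffer's element type follows the element type of x
f_good(x) = (v = zeros(eltype(x), length(x)); v .= 2 .* x; sum(v))

ForwardDiff.gradient(f_good, rand(3))    # [2.0, 2.0, 2.0]
# ForwardDiff.gradient(f_bad, rand(3))   # MethodError: no method matching Float64(::ForwardDiff.Dual{...})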

3 Likes

Welcome to the forum!
To make life easier for people trying to help you, it’s very useful to format the code with backticks, like so:

```julia
f(x) = 1
```

It seems that you tried, but perhaps you used the wrong character? See this post for other tips on how to use the forum:

1 Like

(I edited the post to replace the wrong character ''' with the correct character ```. Note that the julia suffix is not necessary in this forum, where it is the default.)

2 Likes

This will probably speed up your function and its gradient by quite a lot. If you want even more speed, you can probably avoid allocating ui and ur inside the function: all you need are the last values anyway, so it’s enough to store a single number for each and update it iteratively.
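A rough sketch of that idea, keeping your recurrence and initial values as they are and assuming U1 and r are passed in as arguments (as suggested above):

function ForDiffSL_scalar(x, U1, r)
    U = U1 * x
    L = 1
    k = 0.25
    len = length(r) - 1
    # ur[1] and ui[1] are never read in the original loop, so only the
    # i = 2 values matter; carry them as scalars instead of full vectors.
    ur = 1e-2
    ui = 1e-2
    for i in 3:len
        uval = (L * (L + 1) * ur + ui * k) * U[i]
        ur, ui = uval / ur, uval / ui   # right-hand sides use the old values
    end
    return ur * ui
end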

2 Likes

I really appreciate the help. I got ForwardDiff working with your example, and it’s much faster than Zygote for my problem!

However, I tried two of the suggestions here for speeding up the code: passing an anonymous function to FD.gradient (avoiding global variables) and replacing the long vectors ur and ui with short ones that I overwrite. Both of these actually slowed my code down, especially passing the anonymous function (20x slower). Could you help me understand why that might be? I notice the global version only pulls ahead of the anonymous one on the second and subsequent compilations/executions of the code; does this explain the difference?

In my real code I’m passing an anonymous version of the loss function to Zygote.gradient(), and I suspect this is slowing me down a lot.

function ForDiffSL(U1, x, r)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

function ForDiffSL_global(x)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

Results for first call:

ForwardDiff gradient anonymous call:
  1.525661 seconds (3.71 M allocations: 246.695 MiB, 4.84% gc time, 99.49% compilation time)

ForwardDiff gradient global:
  0.967172 seconds (2.52 M allocations: 161.932 MiB, 7.33% gc time, 98.78% compilation time)

Results for second call:

ForwardDiff gradient anonymous call:
  0.920046 seconds (2.18 M allocations: 150.192 MiB, 5.94% gc time, 99.71% compilation time)

ForwardDiff gradient global:
  0.054598 seconds (391.37 k allocations: 17.675 MiB, 81.62% compilation time: 100% of which was recompilation)

I also noticed I didn’t have to declare the variables as global for ForDiffSL_global to work…

If you could take a look at my reply (I accidentally replied to myself), I have some follow-up questions. Appreciate it!

Possibly you are benchmarking incorrectly. Impossible to tell since you didn’t post runnable code for anyone to reproduce your results.

Good point. Here’s the full code:

begin # Packages
    using Dates
    using BenchmarkTools
    using ForwardDiff
end

function ForDiffSL(U1, x, r)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

function ForDiffSL_global(x)
    U = U1*x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1]-1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end-2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L*(L+1)*ur[i-1]+ui[i-1]*k)*U[i]
        ur[i] = uval/ur[i-1]
        ui[i] = uval/ui[i-1]
    end
    SL = ur[len] * ui[len]
    return SL
end

r = Vector(LinRange(0, 20, 20000))
U1 = rand(size(r,1), 2)
x = rand(2,2)


println("ForwardDiff gradient anonymous call:")
@time begin
    dSL_fd = ForwardDiff.gradient(x -> ForDiffSL(U1, x, r), x)
end
println()

println("ForwardDiff gradient global:")
@time begin
    dSL_fd = ForwardDiff.gradient(ForDiffSL_global, x)
end

You’ll note I don’t have an implementation of the second speedup suggestion (eliminating unneeded vectors) here; I actually implemented that in my main code instead of the MWE. I’ll add it shortly.

Don’t use @time. You are including compilation time and other noise. Use BenchmarkTools.jl or similar:

@btime ForwardDiff.gradient(x -> ForDiffSL($U1, x, $r), $x)

where the $ (a feature of @btime) ensures that you aren’t measuring the penalty of global variables. (In a real performance-critical application, presumably all of this code would be inside a function.)
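For example (just one way to arrange it, with compute_gradient an illustrative name), the whole computation can live inside a function, so no non-constant globals are involved and no $ interpolation is needed outside of benchmarking:

function compute_gradient(U1, r, x)
    return ForwardDiff.gradient(y -> ForDiffSL(U1, y, r), x)
end

dSL_fd = compute_gradient(U1, r, x)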

That did it. Now the anonymous call is about 10x faster. Thanks again!

ForwardDiff gradient anonymous call:
  915.815 μs (18 allocations: 3.07 MiB)

ForwardDiff gradient global:
  7.548 ms (356917 allocations: 15.54 MiB)

You can get another 4x speedup by using StaticArrays (because you’re working with small vectors/matrices whose size you know at compile time) and avoiding allocations altogether. Here’s an example:

using BenchmarkTools
using LinearAlgebra
using ForwardDiff
using StaticArrays

function ForDiffSL(x, (U1, r))
    U = U1 * x
    L = 1
    k = 0.25
    dr = r[2] - r[1]

    len = size(r)[1] - 1
    ur = zeros(eltype(x), len)
    ui = zeros(eltype(x), len)
    a = r[end - 2]
    ur[2] = 1e-2
    ui[1] = 1e-3
    ui[2] = 1e-2
    for i in 3:len
        uval = (L * (L + 1) * ur[i - 1] + ui[i - 1] * k) * U[i]  # TODO: fix?
        ur[i] = uval / ur[i - 1]
        ui[i] = uval / ui[i - 1]
    end
    SL = ur[len] * ui[len]
    return SL
end

function ForDiffSL_fast(x_static, (U1_static, r))
    L = 1
    k = 0.25
    len = length(r) - 1
    ur = 1e-2
    ui = 1e-2
    for i in 3:len
        uval = (L * (L + 1) * ur + ui * k) * dot(U1_static[i], x_static[:, 1])  # TODO: fix?
        ur = uval / ur
        ui = uval / ui
    end
    SL = ur * ui
    return SL
end

r = Vector(LinRange(0, 20, 20000))
U1 = rand(size(r, 1), 2)
x = rand(2, 2)

U1_static = [SVector(U1[i, 1], U1[i, 2]) for i in axes(U1, 1)]
x_static = SMatrix{2,2}(x[1, 1], x[2, 1], x[1, 2], x[2, 2])

f = Base.Fix2(ForDiffSL, (U1, r))
f_fast = Base.Fix2(ForDiffSL_fast, (U1_static, r))

@assert f(x) == f_fast(x)
julia> @btime ForwardDiff.gradient($f, $x) seconds=1;
  1.397 ms (17 allocations: 3.05 MiB)

julia> @btime ForwardDiff.gradient($f_fast, $x_static) seconds=1;
  401.960 μs (0 allocations: 0 bytes)

I’m not sure about the dot product I used to replace U[i]. It is semantically equivalent and gives the same results as your original function, but I think you may have made an indexing mistake: here, U is an n×2 matrix, and U[i] is not a row but the i-th coefficient of the flattened matrix (columnwise). Is that what you meant to use there?
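For reference, a small standalone example of how linear indexing flattens a matrix column by column:

U = reshape(1:8, 4, 2)   # 4×2 matrix with columns [1, 2, 3, 4] and [5, 6, 7, 8]
U[3]       # 3, the same as U[3, 1] while the linear index stays within column 1
U[6]       # 6, which is U[2, 2], not anything in "row 6"
U[3, :]    # [3, 7], the actual third row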

2 Likes