Using Gradient Function from ForwardDiff with Structures

I am trying to use the gradient function from the ForwardDiff package with a function whose input is a structure. I cannot figure out how to make my function handle the data types that the gradient function passes in. The code I am working with is the following:


using DifferentialEquations
using ForwardDiff  # needed for the ForwardDiff.gradient call below

# Sparse Multiplication Structure

mutable struct MULTIBODYSTRUCTURE
    REALTIME::Array{Float64,1}
end

times = [
    1.79166673784026e-05,
    0.0632679166673785,
    0.127267916667379,
    0.191267916667379,
    0.255267916667379,
    0.319267916667379,
    0.383267916667379,
    0.447267916667379,
    0.511267916667379,
    0.575267916667379,
    0.639267916667379,
    0.703267916667379,
    0.767267916667379,
    0.831267916667379,
    0.895267916667379,
    0.961267916667379,
    0.993267916667379,
]




M = MULTIBODYSTRUCTURE(
    times,
)


function CostFunction(x::AbstractVector{T}) where T
    M.REALTIME = x

    totalCost = M.REALTIME .+ 5.0 .* M.REALTIME
    return totalCost
end

xin = [0.78, 0.56]


g = ForwardDiff.gradient(CostFunction, xin)

There are two issues:

  1. ForwardDiff evaluates your function with its own Dual number type instead of Float64. To allow that, you could construct M inside CostFunction and make MULTIBODYSTRUCTURE a parametric type, i.e.
mutable struct MULTIBODYSTRUCTURE{T}
    REALTIME::Array{T,1}
end
function CostFunction(x::AbstractVector{T}) where T
    M = MULTIBODYSTRUCTURE(x)

    totalCost = M.REALTIME .+ 5.0 .* M.REALTIME
    return totalCost
end
  2. Your CostFunction is vector-valued, but the gradient is only defined for real-valued functions. Did you want to use ForwardDiff.jacobian instead?
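
Here is a minimal sketch of both points, assuming ForwardDiff is installed. The names FixedBody, ParamBody and cost_vec are hypothetical and only there for illustration. It tries the Float64-typed struct first (which should throw once ForwardDiff hands it Dual numbers) and then computes the Jacobian of the same vector-valued cost using a parametric struct.

```julia
using ForwardDiff

# Field type fixed to Float64, as in the original post (hypothetical name).
mutable struct FixedBody
    realtime::Vector{Float64}
end

# Parametric version: the element type follows the input (hypothetical name).
mutable struct ParamBody{T}
    realtime::Vector{T}
end

# Same vector-valued cost as in the question, built on the parametric struct.
function cost_vec(x::AbstractVector)
    body = ParamBody(x)
    return body.realtime .+ 5.0 .* body.realtime
end

xin = [0.78, 0.56]

# ForwardDiff calls the function with a vector of Dual numbers. Constructing
# FixedBody from it throws, because a Dual cannot be converted to Float64.
fixed_fails = try
    ForwardDiff.jacobian(x -> FixedBody(x).realtime .* 6.0, xin)
    false
catch
    true
end

# The cost is vector-valued, so the Jacobian is the derivative that applies.
J = ForwardDiff.jacobian(cost_vec, xin)
```

Since each output only depends on the matching input, J should come out diagonal here.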

By the way, I generally find it useful to follow Julia's naming conventions, but that's of course totally up to you 😉.


Would this work on a structure that has multiple kinds of datatypes like the one shown below?

mutable struct OTHER
    VAL::Array{Float64,1}
    OTHER::String
end
mutable struct MULTIBODYSTRUCTURE
    REALTIME::Array{Float64,1}
    OTH::OTHER
    CONST::Int64
end

Another wrinkle is that I only want the gradient with respect to certain elements of the structure. For example, I want the gradient with respect to just VAL and REALTIME, but not the rest of the variables. To me this does not seem possible with the call M = MULTIBODYSTRUCTURE(x).

I am a little confused by what you meant in your second point by CostFunction being vector-valued. I have to use the gradient function because I am using the gradient for several gradient based optimization techniques.

Also, sorry about the weird formatting. I was translating someone else's code and kept the formatting they had used. And thank you so much for your help.

"Would this work on a structure that has multiple kinds of datatypes like the one shown below?"

Yes, e.g.

mutable struct OTHER{T}
    VAL::Array{T,1}
    OTHER::String
end
mutable struct MULTIBODYSTRUCTURE{T}
    REALTIME::Array{T,1}
    OTH::OTHER{T}
    CONST::Int64
end
function MULTIBODYSTRUCTURE(x::AbstractVector{T}) where T
    MULTIBODYSTRUCTURE{T}(x[1:4], OTHER{T}(x[5:end], "some string"), 12345)
end
function CostFunction(x::AbstractVector{T}) where T
    M = MULTIBODYSTRUCTURE(x)

    totalCost = sum(6 * M.REALTIME) + sum(abs2, M.OTH.VAL)
    return totalCost
end
using ForwardDiff
ForwardDiff.gradient(CostFunction, rand(12))
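
On differentiating only with respect to some fields: one possible approach (a sketch, not the only way) is to pack just those fields into the input vector and copy everything else from an existing instance. The names Other, MultiBody, rebuild, cost and template below are hypothetical; the cost itself is made up for illustration.

```julia
using ForwardDiff

struct Other{T}
    val::Vector{T}
    label::String
end

struct MultiBody{T}
    realtime::Vector{T}
    oth::Other{T}
    constant::Int
end

# Build a new struct whose differentiable fields come from x, while the
# remaining fields (string, integer) are copied from a template instance.
function rebuild(template::MultiBody, x::AbstractVector{T}) where T
    n = length(template.realtime)
    return MultiBody{T}(x[1:n],
                        Other{T}(x[n+1:end], template.oth.label),
                        template.constant)
end

# Scalar-valued cost that uses all fields, including the constant one.
cost(m::MultiBody) = sum(6 .* m.realtime) + sum(abs2, m.oth.val) + m.constant

template = MultiBody{Float64}([1.0, 2.0], Other{Float64}([3.0, 4.0], "some string"), 12345)

# Pack only the fields to differentiate into one flat vector.
x0 = vcat(template.realtime, template.oth.val)

# Gradient with respect to realtime and val only.
g = ForwardDiff.gradient(x -> cost(rebuild(template, x)), x0)
```

The gradient has one entry per element of realtime and val. The string and the integer never enter the differentiated vector, so they can keep their concrete types.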

"I am a little confused by what you meant in your second point by CostFunction being vector-valued. I have to use the gradient function because I am using the gradient for several gradient based optimization techniques."

I guess the example in your original post was not the real cost function you wanted to use. At the risk of stating the obvious, in optimization you typically use a scalar-valued function f, because it makes sense to say f([1, 2]) > f([2, 3]) if f([1, 2]) = 2 and f([2, 3]) = 1, but it is unclear how to order the values of a vector-valued function; e.g. which value is larger: f([1, 2]) = [2, 0] or f([2, 3]) = [1, 4]?
Gradients are only defined for scalar-valued functions. But the cost function in your original post is vector-valued, because CostFunction([0.78, 0.56]) = [4.68, 3.36].
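
To make the distinction concrete, here is a small sketch assuming ForwardDiff is installed. cost_vec mirrors the function from the original post; cost_scalar is a hypothetical scalar version obtained by summing the entries, which is just one possible way to reduce the vector to a single number.

```julia
using ForwardDiff

# Vector-valued function from the original post: one output per input.
cost_vec(x) = x .+ 5.0 .* x

# A scalar-valued cost obtained by summing the entries (one possible choice).
cost_scalar(x) = sum(cost_vec(x))

xin = [0.78, 0.56]

v = cost_vec(xin)                           # a Vector, so there is no gradient
s = cost_scalar(xin)                        # a single Float64
g = ForwardDiff.gradient(cost_scalar, xin)  # one partial derivative per input
```

With a scalar cost like this, the result of ForwardDiff.gradient is a vector of the same length as the input, which is what gradient-based optimizers expect.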
