I am trying to use the gradient function from the ForwardDiff package on a function that stores its input in a structure. The issue is that I cannot figure out how to make my function handle the data types that the gradient function uses. The code that I am working with is the following:
using ForwardDiff
# Sparse Multiplication Structure
mutable struct MULTIBODYSTRUCTURE
    REALTIME::Array{Float64,1}
end
times = [
1.79166673784026e-05,
0.0632679166673785,
0.127267916667379,
0.191267916667379,
0.255267916667379,
0.319267916667379,
0.383267916667379,
0.447267916667379,
0.511267916667379,
0.575267916667379,
0.639267916667379,
0.703267916667379,
0.767267916667379,
0.831267916667379,
0.895267916667379,
0.961267916667379,
0.993267916667379,
]
M = MULTIBODYSTRUCTURE(times)
function CostFunction(x::AbstractVector{T}) where T
    M.REALTIME = x
    totalCost = M.REALTIME .+ 5.0 .* M.REALTIME
    return totalCost
end
xin = [0.78, 0.56]
g = ForwardDiff.gradient(CostFunction, xin)
To allow ForwardDiff to use Duals, you could construct M inside CostFunction and make MULTIBODYSTRUCTURE parametric, i.e.
mutable struct MULTIBODYSTRUCTURE{T}
    REALTIME::Array{T,1}
end
function CostFunction(x::AbstractVector{T}) where T
    M = MULTIBODYSTRUCTURE(x)
    totalCost = M.REALTIME .+ 5.0 .* M.REALTIME
    return totalCost
end
Your CostFunction is vector-valued, but the gradient is only defined for real-valued functions. Did you want to use ForwardDiff.jacobian instead?
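To make the difference concrete, here is a minimal sketch of the Jacobian route. It assumes the parametric struct from above and a plain Vector as input; nothing else is taken from your real code.

```julia
using ForwardDiff

# Parametric struct, so the field can hold ForwardDiff's Dual numbers as well as Float64.
mutable struct MULTIBODYSTRUCTURE{T}
    REALTIME::Array{T,1}
end

# Vector-valued function as in the snippet above: returns x .+ 5.0 .* x, i.e. 6 .* x.
function CostFunction(x::AbstractVector{T}) where T
    M = MULTIBODYSTRUCTURE(x)
    return M.REALTIME .+ 5.0 .* M.REALTIME
end

xin = [0.78, 0.56]

# One row per output, one column per input; here a 2x2 matrix with 6 on the diagonal.
J = ForwardDiff.jacobian(CostFunction, xin)
```

Each output depends only on the matching input, which is why the off-diagonal entries of J are zero.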
Btw, I generally find it useful to follow Julia's naming conventions, but that is of course totally up to you.
Would this work on a structure that has multiple kinds of datatypes like the one shown below?
mutable struct OTHER
    VAL::Array{Float64,1}
    OTHER::String
end
mutable struct MULTIBODYSTRUCTURE
    REALTIME::Array{Float64,1}
    OTH::OTHER
    CONST::Int64
end
Another wrinkle is that I only want the gradient to be evaluated with respect to certain elements of the structure. For example, I want the gradient with respect to the variables VAL and REALTIME, but not the rest of the variables. To me this does not seem possible when using the call M = MULTIBODYSTRUCTURE(x).
I am a little confused by what you meant in your second point by CostFunction being vector-valued. I have to use the gradient function because I am using the gradient for several gradient-based optimization techniques.
Also, sorry about the weird formatting. I was translating someone else's code and used the formatting that they had used. Thank you so much for your help.
Would this work on a structure that has multiple kinds of datatypes like the one shown below?
Yes, e.g.
mutable struct OTHER{T}
    VAL::Array{T,1}
    OTHER::String
end
mutable struct MULTIBODYSTRUCTURE{T}
    REALTIME::Array{T,1}
    OTH::OTHER{T}
    CONST::Int64
end
function MULTIBODYSTRUCTURE(x::AbstractVector{T}) where T
    MULTIBODYSTRUCTURE{T}(x[1:4], OTHER{T}(x[5:end], "some string"), 12345)
end
function CostFunction(x::AbstractVector{T}) where T
    M = MULTIBODYSTRUCTURE(x)
    totalCost = sum(6 * M.REALTIME) + sum(abs2, M.OTH.VAL)
    return totalCost
end
using ForwardDiff
ForwardDiff.gradient(CostFunction, rand(12))
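As a sanity check on what that call should return: the constructor above puts x[1:4] into REALTIME and x[5:end] into VAL, so, writing n for length(x) (12 in the call above), the cost and its partial derivatives work out to:

```latex
f(x) = 6 \sum_{i=1}^{4} x_i + \sum_{i=5}^{n} x_i^2,
\qquad
\frac{\partial f}{\partial x_i} =
\begin{cases}
6, & 1 \le i \le 4, \\
2 x_i, & 5 \le i \le n.
\end{cases}
```

The string field and CONST are not built from x, so they contribute no entries to the gradient; only REALTIME and VAL are differentiated.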
I am a little confused by what you meant in your second point by CostFunction being vector-valued. I have to use the gradient function because I am using the gradient for several gradient-based optimization techniques.
I guess the example in your original post was not the real cost function you wanted to use. At the risk of stating the obvious, in optimization you typically use a scalar-valued function f, because it makes sense to say f([1, 2]) > f([2, 3]) if f([1, 2]) = 2 and f([2, 3]) = 1, but it is unclear how to order the values of vector-valued functions; e.g. which value is larger: f([1, 2]) = [2, 0] or f([2, 3]) = [1, 4]?
Gradients are only defined for scalar-valued functions. But your cost function in the original post is vector-valued, because CostFunction([0.78, 0.56]) = [4.68, 3.36].
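If a single number is what the optimizer should see, one option is to reduce the vector to a scalar before differentiating. The sketch below uses sum as the reduction purely as an assumption, and the names vector_cost and scalar_cost are made up for illustration; your real objective may combine the entries differently.

```julia
using ForwardDiff

# Vector-valued map from the original post: x .+ 5.0 .* x, i.e. 6 .* x.
vector_cost(x) = x .+ 5.0 .* x

# Hypothetical scalar objective: collapse the vector to one number by summing.
scalar_cost(x) = sum(vector_cost(x))

xin = [0.78, 0.56]

y = vector_cost(xin)                          # two outputs, so not usable as a scalar cost
g = ForwardDiff.gradient(scalar_cost, xin)    # one partial derivative per input
```

With this reduction every input enters the cost with weight 6, so both entries of g are 6.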