Optimization with AD of complex struct-based architecture

Hi everyone,

I’m working on adding automatic differentiation (AD) to an optimization framework and running into some fundamental architectural issues. The framework uses a Directed Acyclic Graph (DAG) modeled after Fig. 7 in this reference:

Each node in the graph contains models (spacecraft, maneuvers), optimization variables (maneuver values, etc.), and functions (constraints, cost functions). Edges connect nodes by modeling the physics between them. I have this implemented in Julia and it converges with SNOW using FiniteDifferences for partials.

Now I’m trying to use Julia’s AD to avoid the inaccurate partials that come from finite differencing. This has proved very challenging, though I’m still relatively new to Julia.

Here’s the overall code flow, simplified to focus on the core issue:

1. Configure the problem:

1.1 Create models (spacecraft, maneuvers, etc.)
1.2 Define the DAG: each node is an event; assign models to nodes and define optimization variables and functions
1.3 Build the DAG from the nodes and initialize it for execution

2. Run the problem (optimizer function call):

2.1 Decompose the decision vector and map it to DAG nodes (modifying model fields)
2.2 Execute the simulation: walk the DAG, execute the models, and compute cost/constraints
2.3 Assemble the function vector and return it to the optimizer
2.4 Repeat until converged
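To make step 2 concrete, here is a heavily simplified, self-contained sketch of the optimizer callback; the names (Node, decompose!, evaluate) are hypothetical stand-ins, not my actual API:

# Hypothetical toy version of steps 2.1-2.3: map X into model fields,
# "execute" the models, and assemble the function vector.
mutable struct Node
    model::Float64          # e.g. a maneuver magnitude held by this event
end

function decompose!(nodes, X)        # 2.1: write decision variables into fields
    for (node, x) in zip(nodes, X)
        node.model = x
    end
end

function evaluate(nodes)             # 2.2 + 2.3: walk nodes, compute cost/constraints
    cost = sum(n.model^2 for n in nodes)
    con  = sum(n.model for n in nodes) - 1.0
    return [cost, con]
end

nodes = [Node(0.0), Node(0.0)]
decompose!(nodes, [0.3, 0.7])
F = evaluate(nodes)                  # function vector handed back to the optimizer

Note the model::Float64 field: that concrete type is exactly what breaks once the optimizer starts handing me Duals instead of Float64s.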

To use AD in this workflow, I’ve tried several approaches. What I hoped would work was to promote the optimization variable fields to ForwardDiff.Dual in a new step (after users define their models but before building the DAG). This way, when I map decision vector variables that are Duals to the corresponding struct fields, those fields are already Duals and I maintain type stability.

For this approach, I get this error:

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 3})
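As far as I can tell, this is the classic symptom of a concretely typed field: assigning a Dual to a Float64 slot forces a conversion that does not exist. A minimal reproduction (Probe is an illustrative stand-in, not my real struct):

using ForwardDiff

struct Probe
    x::Float64    # concrete field type: a Dual cannot be stored here
end

# Constructing Probe from a Dual triggers convert(Float64, ::Dual), which
# fails with the same MethodError as above:
ForwardDiff.derivative(v -> Probe(v).x, 1.0)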

I’ve tried several other approaches using Accessors.jl and other packages. Some won’t run without errors; others just return zero partials.

My question is architectural rather than debugging-focused: How do people handle AD with complex struct-based models? Are there established architecture patterns that work well for this type of problem?

Thanks for any inputs. I have worked hard on this (and I think I broke Claude-Sonnet LOL) but still don’t have a working solution.

With AD in mind, we try to make every struct accept Number instead of Float64 for the differentiated variables:

struct Data{T<:Number}
    x::Vector{T}
end

That way ForwardDiff.jl’s Dual numbers can pass in, and we use PreallocationTools.jl to make a Dual-aware cache:

using PreallocationTools

function f(x, cache)
    y = get_tmp(cache, x)   # returns a buffer matching eltype(x), Dual-safe
    y[1, :] .= x
    y[2, :] .= x .^ 2
    return sum(y)
end

x = rand(100)
cache = DiffCache(zeros(2, 100))
f(x, cache)
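For instance (a minimal check, assuming the definitions above), the same cache then works transparently under ForwardDiff:

using ForwardDiff
ForwardDiff.gradient(x -> f(x, cache), x)   # get_tmp hands back a Dual-sized buffer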

We also now have Enzyme.jl and Mooncake.jl, which can sometimes bypass those issues, but they can also randomly crash (and do not work on Julia 1.12 yet), so be careful.


@yolhan_mannes Thanks, I am using Real but may change to Number. Real supports Dual and BigFloat, which work for ForwardDiff and Zygote respectively. I prefer to test any type I allow, and Number also admits complex numbers etc., which I don’t have time to test yet.

Interestingly, after spending a few months tinkering with this off and on with no working solution, and posting just earlier today in exasperation, I finally got it working this afternoon! A long car drive gave me time to just think.

The problem contained a subtle challenge. I have a struct like this:

struct Sat{T<:Number}
    x::T
    y::T
    z::T
    vx::T
    vy::T
    vz::T
    m::T
end

The optimization problem varies vx, vy, and vz directly, but m changes indirectly. So even though the problem variables and constraints do not contain m, m is changed under the hood (a maneuver changes the mass). I was aware of that, and I was trying to promote the field m “on my own” by casting it to a ForwardDiff.Dual at construction. This does not work: the tags in my manual cast are not consistent with the tags ForwardDiff uses on the other variables.

What does work is to augment the simulation state under the hood (kinda like Lagrange multipliers in classical optimal control) to contain all variables that are directly or indirectly affected by the optimizer decision variables: X_ad = [X_opt; X_indirect], where X_opt is the decision vector in the optimization problem and X_indirect holds states or quantities that are indirectly affected by X_opt (in this case, spacecraft mass). Then calling ForwardDiff.jacobian(myfunc, X_ad) works beautifully. I have some work to do to automatically bookkeep this augmented state under the hood and extract only the Jacobian elements the optimization problem needs to return to the optimizer, but that is not too bad.
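A minimal sketch of the idea (the dynamics and numbers here are made up, and myfunc is a toy stand-in):

using ForwardDiff

# Toy "simulation": the optimizer varies (vx, vy, vz); mass m is only
# affected indirectly (the maneuver burns propellant).
function myfunc(X_ad)
    vx, vy, vz, m = X_ad                 # augmented state [X_opt; X_indirect]
    dv = sqrt(vx^2 + vy^2 + vz^2)        # maneuver magnitude
    m_after = m * exp(-dv / 3.0)         # toy rocket equation
    return [vx + vy + vz, m_after]       # toy cost/constraint vector
end

X_opt      = [0.1, 0.2, 0.3]             # decision variables
X_indirect = [1000.0]                    # spacecraft mass, changed under the hood
X_ad       = [X_opt; X_indirect]

J = ForwardDiff.jacobian(myfunc, X_ad)
J_opt = J[:, 1:length(X_opt)]            # only these columns go back to the optimizer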


You could also give your struct different type parameters for the fields that appear at different spots in your routine (e.g. m::T2).
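For example, a sketch of that applied to the Sat struct above:

struct Sat{T1<:Number, T2<:Number}
    x::T1
    y::T1
    z::T1
    vx::T1
    vy::T1
    vz::T1
    m::T2    # mass may carry a different (Dual or plain) type than the rest
end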