Hi everyone,
I’m working on adding automatic differentiation (AD) to an optimization framework and running into some fundamental architectural issues. The framework uses a Directed Acyclic Graph (DAG) modeled after Fig. 7 in this reference:
Each node in the graph contains models (spacecraft, maneuvers), optimization variables (maneuver values, etc.), and functions (constraints, cost functions). Edges connect nodes by modeling the physics between them. I have this implemented in Julia and it converges with SNOW using FiniteDifferences for partials.
Now I’m trying to use Julia’s AD to avoid the inaccurate partials that come from finite differencing. This has proved very challenging, though I’m still relatively new to Julia.
Here’s the overall code flow, simplified to focus on the core issue:
Configure the problem:
1. Create models (spacecraft, maneuvers, etc.)
1.1 Define DAG: each node is an event, assign models to nodes, define optimization variables and functions
1.2 Build the DAG from nodes and initialize for execution
1.3 Run the problem (optimizer function call):
2. Run the optimization
2.1 Decompose decision vector and map to DAG nodes (modifying model fields)
2.2 Execute simulation: walk the DAG and execute models, compute cost/constraints
2.3 Assemble function vector and return to optimizer
2.4 Repeat until converged
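To make steps 2.1 to 2.3 concrete, here is a stripped-down sketch of the evaluation path. All names (`Maneuver`, `Node`, `scatter!`, `simulate`, `evaluate!`) are hypothetical stand-ins rather than my actual framework code, the cost and constraint are toy functions, and the node list is assumed to be already sorted in execution order.

```julia
# Hypothetical stand-in for a real model; the field is concretely typed.
mutable struct Maneuver
    dv::Vector{Float64}       # optimization variable: delta-v components
end

# Hypothetical DAG node: one model plus its slice of the decision vector.
struct Node
    name::String
    maneuver::Maneuver
    var_idx::UnitRange{Int}
end

# Step 2.1: copy each node's slice of the decision vector into its model fields.
function scatter!(nodes, x)
    for n in nodes
        n.maneuver.dv .= x[n.var_idx]
    end
    return nodes
end

# Step 2.2: walk the nodes in order and evaluate a toy cost and constraint.
function simulate(nodes)
    cost = sum(sum(abs2, n.maneuver.dv) for n in nodes)   # sum of squared delta-v
    con  = sum(n.maneuver.dv[1] for n in nodes) - 1.0     # toy equality constraint
    return cost, con
end

# Steps 2.1 to 2.3 combined: the function handed to the optimizer.
function evaluate!(nodes, x)
    scatter!(nodes, x)
    cost, con = simulate(nodes)
    return [cost, con]
end

nodes = [Node("burn1", Maneuver(zeros(3)), 1:3),
         Node("burn2", Maneuver(zeros(3)), 4:6)]
f = evaluate!(nodes, [0.5, 0.0, 0.0, 0.5, 0.0, 0.0])
```

In this sketch the write in `scatter!` is the point where decision-vector values enter the model structs, so it is also the place where the element type of `x` has to be compatible with the field types.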
To use AD in this workflow, I’ve tried several approaches. What I hoped would work was to promote the optimization variable fields to ForwardDiff.Dual in a new step (after users define their models but before building the DAG). This way, when I map decision vector variables that are Duals to the corresponding struct fields, those fields are already Duals and I maintain type stability.
For this approach, I get this error:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 3})
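A minimal sketch that produces the same kind of `MethodError` is below. It is not my framework code: `FixedManeuver`, `GenericManeuver`, `cost_fixed`, and `cost_generic` are hypothetical names, and I am assuming the real failure has the same cause, namely a `Dual` being assigned to a field declared as `Float64`. The parametric struct is included only for contrast.

```julia
using ForwardDiff

# Hypothetical model with a concretely typed field.
mutable struct FixedManeuver
    dv::Float64
end

# The same hypothetical model, parametric in the number type.
struct GenericManeuver{T<:Real}
    dv::T
end

function cost_fixed(x)
    m = FixedManeuver(0.0)
    m.dv = x[1]                 # assignment converts to Float64; fails for a Dual
    return m.dv^2
end

function cost_generic(x)
    m = GenericManeuver(x[1])   # T becomes Float64 or Dual, depending on x
    return m.dv^2
end

# The parametric version differentiates without complaint.
g_generic = ForwardDiff.gradient(cost_generic, [3.0])

# The concretely typed version throws when ForwardDiff passes in Duals.
fixed_failed = try
    ForwardDiff.gradient(cost_fixed, [3.0])
    false
catch err
    err isa MethodError
end
```

Here `g_generic` is the derivative of `x^2` at 3, and `fixed_failed` records that the `Float64` field rejected the `Dual`.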
I’ve tried several other approaches using Accessors and other packages. Some fail with errors; others run but return all-zero partials.
My question is architectural rather than debugging-focused: How do people handle AD with complex struct-based models? Are there established architecture patterns that work well for this type of problem?
Thanks for any input. I have worked hard on this (and I think I broke Claude-Sonnet LOL) but still don’t have a working solution.