Minimisation of a Bolza form loss function in DiffEqFlux framework

Brief Summary

  • Problem at hand:
    minimise J \left( p \right) = \phi \left( x \left(t_{f} \right) \right) + \int_{t_{0}}^{t_{f}} L \left( \tau, x\left(\tau\right), p \right) d\tau + R\left( p \right)
    subject to \dot{x}\left(t\right) = f\left( t, x\left(t\right), p\right)
    with x\left(t_{0}\right) = x_{0}, t_{0}, t_{f} fixed,
    where p is the parameter for a neural network.

  • How can I compute the correct gradient for the above Bolza form loss function (= continuous functional (Lagrange term) + terminal cost (Mayer term) + neural network parameter-dependent function ) through continuous adjoint sensitivity analysis using DiffEqSensitivity or DiffEqFlux?

  • Is there any recommendation about the way of constructing the Bolza form loss function, e.g., including \dot{J}_{cont}\left( t \right) = L\left( t, x\left(t\right), p \right) in the forward pass?

More Details

Hello. My name is Namhoon Cho.

I was trying to solve a nonlinear deterministic optimal control problem in continuous-time domain by optimising a neural feedback policy. The context of my application requires specifying the terminal cost to be a function of final state in the form \phi \left( x \left(t_{f}\right) \right) besides the continuous functional.

I have been following the codebase developed for models combining ODE and NNs.
At the first glance, I tried to just specify the loss function in such mixed form and apply DiffEqFlux.sciml_train (or others) for training.

However, I soon found out that the function adjoint_sensitivities() sitting inside DiffEqSensitivity.jl may not handle the loss function of a mixed type (continuous functional + terminal cost + parameter-only function,…) natively, which was also pointed out in samuela/ctpg. I read the description in Mathematics of Sensitivity Analysis · DifferentialEquations.jl (, and the continuous adjoint sensitivity analysis described in CVODES document, but it is still not clear from my side how the current SciML packages handle the loss function term given by an evaluation at the final time.

Then, I became afraid that the result I obtain might not be based on the mathematically correct gradient \frac{dJ}{dp} if I naively apply the existing package without knowing the details.

Could anyone please clarify the thing whether I can apply the tools with no worries about correct \frac{dJ}{dp} calculation?

Thank you.


Open an issue. That would be easy to add.

Thank you Christopher for your prompt response! I will open up a new issue soon.

For convenience, I leave a backlink for the issue.
Also, I’ll add an answer to the issue, not here.