Minimisation of a Bolza form loss function in DiffEqFlux framework

nhcho91 · November 22, 2021, 10:51pm

Brief Summary

Problem at hand:
minimise $J(p) = \phi\left(x(t_f)\right) + \int_{t_0}^{t_f} L\left(\tau, x(\tau), p\right) d\tau + R(p)$
subject to $\dot{x}(t) = f\left(t, x(t), p\right)$
with $x(t_0) = x_0$, where $t_0$ and $t_f$ are fixed,
and $p$ is the parameter vector of a neural network.
How can I compute the correct gradient of the Bolza-form loss above (a running cost, i.e. the Lagrange term, plus a terminal cost, i.e. the Mayer term, plus a function of the neural network parameters only) through continuous adjoint sensitivity analysis with DiffEqSensitivity or DiffEqFlux?
Is there a recommended way to construct the Bolza-form loss, e.g. by augmenting the forward pass with a state $J_{cont}$ satisfying $\dot{J}_{cont}(t) = L\left(t, x(t), p\right)$?
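A minimal sketch of that augmentation, written in plain Python rather than Julia/DiffEqFlux so it runs without any packages: the running cost is appended to the state as $J_{cont}$ with $J_{cont}(t_0) = 0$, so the whole Bolza cost becomes a terminal (Mayer) evaluation of the augmented state plus $R(p)$. The dynamics, the three cost terms, the horizon $[0, 2]$, $x_0 = 1$ and the helper names (`rk4`, `bolza_cost`, `running_cost`) are hypothetical toy choices, not taken from the post.

```python
import math

# Hypothetical toy problem (an assumption for illustration, not from the post):
#   f(t, x, p) = -p0*x + p1*sin(t)          (dynamics)
#   L(t, x, p) = x^2 + 0.1*(p1*sin(t))^2    (running / Lagrange cost)
#   phi(x)     = 2*x^2                      (terminal / Mayer cost)
#   R(p)       = 0.01*(p0^2 + p1^2)         (parameter-only term)

def f(t, x, p):
    return -p[0] * x + p[1] * math.sin(t)

def running_cost(t, x, p):
    return x ** 2 + 0.1 * (p[1] * math.sin(t)) ** 2

def phi(x):
    return 2.0 * x ** 2

def R(p):
    return 0.01 * (p[0] ** 2 + p[1] ** 2)

def rk4(rhs, y0, t0, tf, n):
    """Fixed-step classical RK4 on a list-valued state."""
    h = (tf - t0) / n
    t, y = t0, list(y0)
    for _ in range(n):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def bolza_cost(p, x0=1.0, t0=0.0, tf=2.0, n=2000):
    # Augmented state z = [x, J_cont], with dJ_cont/dt = L(t, x, p) and
    # J_cont(t0) = 0, so the integral term is simply read off at t_f.
    def rhs(t, z):
        return [f(t, z[0], p), running_cost(t, z[0], p)]
    x_f, j_cont = rk4(rhs, [x0, 0.0], t0, tf, n)
    return phi(x_f) + j_cont + R(p)

print(bolza_cost([1.0, 0.0]))
```

For $p = (1, 0)$ the toy state is $x(t) = e^{-t}$, so the printed value should match $0.51 + 1.5e^{-4} \approx 0.5375$. In this form the adjoint of the extra state $J_{cont}$ is constant and equal to 1, because no right-hand side depends on $J_{cont}$; only the terminal condition of the augmented adjoint has to be set correctly.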

More Details

Hello. My name is Namhoon Cho.

I am trying to solve a nonlinear, deterministic, continuous-time optimal control problem by optimising a neural feedback policy. My application requires a terminal cost on the final state, $\phi\left(x(t_f)\right)$, in addition to the running cost (the integral term).

I have been following the codebase developed for models that combine ODEs and NNs.
At first, I simply specified the loss function in this mixed form and trained with DiffEqFlux.sciml_train (or similar routines).

However, I soon found that adjoint_sensitivities() in DiffEqSensitivity.jl may not natively handle a loss of this mixed type (continuous functional + terminal cost + parameter-only function,…), a limitation also pointed out in samuela/ctpg. I have read "Mathematics of Sensitivity Analysis" in the DifferentialEquations.jl documentation (sciml.ai) and the description of continuous adjoint sensitivity analysis in the CVODES documentation, but it is still unclear to me how the current SciML packages treat a loss term evaluated at the final time.
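For reference, the standard continuous adjoint for the Bolza cost above follows from adjoining the dynamics with a multiplier $\lambda(t)$, integrating the $-\lambda^{\top}\dot{x}$ term by parts, and using that $x(t_0) = x_0$ does not depend on $p$. This is a textbook derivation, not a description of what DiffEqSensitivity does internally; all partial derivatives are evaluated along $\left(\tau, x(\tau), p\right)$:

$$
\begin{aligned}
J(p) &= \phi\left(x(t_f)\right) + \int_{t_0}^{t_f} \left[ L\left(\tau, x(\tau), p\right) + \lambda(\tau)^{\top} \left( f\left(\tau, x(\tau), p\right) - \dot{x}(\tau) \right) \right] d\tau + R(p), \\
\lambda(t_f) &= \left( \frac{\partial \phi}{\partial x}\left(x(t_f)\right) \right)^{\top}, \\
\dot{\lambda}(t) &= -\left( \frac{\partial f}{\partial x} \right)^{\top} \lambda(t) - \left( \frac{\partial L}{\partial x} \right)^{\top}, \\
\frac{dJ}{dp} &= \frac{\partial R}{\partial p} + \int_{t_0}^{t_f} \left( \lambda(\tau)^{\top} \frac{\partial f}{\partial p} + \frac{\partial L}{\partial p} \right) d\tau .
\end{aligned}
$$

The terminal cost therefore enters only through the terminal condition on $\lambda$, the running cost adds a forcing term to the adjoint ODE plus an explicit $\partial L / \partial p$ term, and $R$ contributes its partial derivative directly.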

I am therefore worried that, if I apply the existing package naively without knowing these details, my results may not be based on the mathematically correct gradient $\frac{dJ}{dp}$.
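One way to gain confidence in $\frac{dJ}{dp}$ is to implement the continuous adjoint for a small problem and compare it with central finite differences of the forward cost. The sketch below uses plain Python and a hypothetical toy problem (the dynamics, costs, horizon and helper names `rk4`, `bolza_cost`, `adjoint_gradient`, `fd_gradient` are assumptions, not SciML code). It applies the standard Bolza adjoint, with terminal condition $\lambda(t_f) = \partial\phi/\partial x$ and running-cost forcing in the adjoint ODE, and integrates $x$, $\lambda$ and the gradient integral backward from $t_f$ to $t_0$.

```python
import math

# Hypothetical toy problem (an assumption for illustration, not from the post):
#   f(t, x, p) = -p0*x + p1*sin(t),  L(t, x, p) = x^2 + 0.1*(p1*sin(t))^2
#   phi(x) = 2*x^2,  R(p) = 0.01*(p0^2 + p1^2),  x(0) = 1,  t in [0, 2]
X0, T0, TF, N = 1.0, 0.0, 2.0, 2000

def f(t, x, p):
    return -p[0] * x + p[1] * math.sin(t)

def rk4(rhs, y0, t0, tf, n):
    """Fixed-step classical RK4 on a list-valued state; also works for tf < t0."""
    h = (tf - t0) / n
    t, y = t0, list(y0)
    for _ in range(n):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def bolza_cost(p):
    # Forward pass on [x, J_cont] with dJ_cont/dt = L(t, x, p).
    def rhs(t, z):
        return [f(t, z[0], p), z[0] ** 2 + 0.1 * (p[1] * math.sin(t)) ** 2]
    x_f, j_cont = rk4(rhs, [X0, 0.0], T0, TF, N)
    return 2.0 * x_f ** 2 + j_cont + 0.01 * (p[0] ** 2 + p[1] ** 2)

def adjoint_gradient(p):
    # 1) Forward solve for x(tf) only; the trajectory is not stored.
    (x_f,) = rk4(lambda t, y: [f(t, y[0], p)], [X0], T0, TF, N)

    # 2) Backward solve of z = [x, lam, G0, G1] from tf to t0, where
    #    G(t) = integral_t^tf (lam * df/dp + dL/dp) ds, hence dG/dt = -(...).
    def rhs(t, z):
        x, lam = z[0], z[1]
        s = math.sin(t)
        df_dx, dL_dx = -p[0], 2.0 * x
        df_dp = (-x, s)
        dL_dp = (0.0, 0.2 * p[1] * s * s)
        return [f(t, x, p),
                -df_dx * lam - dL_dx,  # adjoint ODE with running-cost forcing
                -(lam * df_dp[0] + dL_dp[0]),
                -(lam * df_dp[1] + dL_dp[1])]

    lam_f = 4.0 * x_f  # terminal condition: dphi/dx evaluated at x(tf)
    _, _, g0, g1 = rk4(rhs, [x_f, lam_f, 0.0, 0.0], TF, T0, N)
    return [0.02 * p[0] + g0, 0.02 * p[1] + g1]  # add the direct dR/dp term

def fd_gradient(p, eps=1e-5):
    # Central finite differences of the forward cost, one parameter at a time.
    grad = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += eps
        dn[i] -= eps
        grad.append((bolza_cost(up) - bolza_cost(dn)) / (2 * eps))
    return grad

p = [1.5, 0.7]
print(adjoint_gradient(p))
print(fd_gradient(p))
```

At $p = (1, 0)$ the toy problem also has a closed form, $\partial J / \partial p_0 = -0.48 - 5.5e^{-4}$, which the adjoint result should reproduce. Re-integrating $x$ backward avoids storing the forward trajectory but can lose accuracy for stiff or unstable dynamics; storing or checkpointing the forward solution is the usual alternative.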

Could anyone clarify whether I can use these tools and trust that $\frac{dJ}{dp}$ is computed correctly?

Thank you.

ChrisRackauckas · November 22, 2021, 11:06pm

Open an issue. That would be easy to add.

nhcho91 · November 22, 2021, 11:17pm

Thank you, Christopher, for your prompt response! I will open a new issue soon.

iHany · November 23, 2021, 8:07am

For convenience, here is a backlink to the issue.
I'll post my answer in the issue rather than here.
