Brief Summary

Problem at hand:
minimise J \left( p \right) = \phi \left( x \left(t_{f} \right) \right) + \int_{t_{0}}^{t_{f}} L \left( \tau, x\left(\tau\right), p \right) d\tau + R\left( p \right)
subject to \dot{x}\left(t\right) = f\left( t, x\left(t\right), p\right)
with x\left(t_{0}\right) = x_{0}, t_{0}, t_{f} fixed,
where p is the parameter for a neural network. 
How can I compute the correct gradient of the above Bolza-form loss function (continuous functional (Lagrange term) + terminal cost (Mayer term) + parameter-dependent regularisation term) through continuous adjoint sensitivity analysis using DiffEqSensitivity or DiffEqFlux?
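For reference, the adjoint conditions I would expect for the Bolza form above, following the standard continuous adjoint derivation (please correct me if any term is wrong), are:

\lambda^{\top}\left(t_{f}\right) = \frac{\partial \phi}{\partial x}\left( x\left(t_{f}\right) \right)

-\dot{\lambda}^{\top}\left(t\right) = \lambda^{\top}\left(t\right) \frac{\partial f}{\partial x} + \frac{\partial L}{\partial x}, integrated backward from t_{f} to t_{0},

\frac{dJ}{dp} = \frac{\partial R}{\partial p} + \int_{t_{0}}^{t_{f}} \left( \frac{\partial L}{\partial p} + \lambda^{\top} \frac{\partial f}{\partial p} \right) d\tau,

i.e., the terminal cost \phi should only change the terminal condition on \lambda, while the Lagrange term L and the regulariser R enter through the quadrature. My question is whether the SciML tooling realises exactly this.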

Is there any recommendation on how to construct the Bolza-form loss function, e.g., by including \dot{J}_{cont}\left( t \right) = L\left( t, x\left(t\right), p \right) as an augmented state in the forward pass?
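To make the question concrete, here is a minimal sketch of what I mean by carrying \dot{J}_{cont} = L as an augmented state, written in Python/SciPy on a scalar toy problem (the dynamics, costs, and all names are illustrative stand-ins for the neural policy, not my actual setup). It runs the augmented forward pass, the backward adjoint pass, and a finite-difference cross-check of dJ/dp:

```python
from scipy.integrate import solve_ivp

# Toy stand-in for the neural policy problem: xdot = -p*x,
# L = x^2, phi = x(tf)^2, R = 0.01*p^2 (all choices illustrative).
p, x0, tf = 1.0, 1.0, 1.0

def f(t, x, p):        # dynamics
    return -p * x

def L(t, x, p):        # running (Lagrange) cost
    return x ** 2

# Forward pass with augmented state z = [x, J_cont], where J_cont' = L,
# so the Lagrange term becomes a Mayer term on the augmented state.
def aug_rhs(t, z):
    x = z[0]
    return [f(t, x, p), L(t, x, p)]

fwd = solve_ivp(aug_rhs, (0.0, tf), [x0, 0.0],
                rtol=1e-10, atol=1e-12, dense_output=True)
xT, J_cont = fwd.y[:, -1]
J = xT ** 2 + J_cont + 0.01 * p ** 2      # phi + Lagrange + R

# Backward (adjoint) pass: lam' = -(df/dx)^T lam - (dL/dx)^T,
# lam(tf) = dphi/dx(tf); q accumulates dL/dp + lam^T df/dp.
def adj_rhs(t, w):
    lam = w[0]
    x = fwd.sol(t)[0]                     # interpolated forward state
    dlam = -(-p) * lam - 2.0 * x
    dq = 0.0 + lam * (-x)
    return [dlam, dq]

bwd = solve_ivp(adj_rhs, (tf, 0.0), [2.0 * xT, 0.0],
                rtol=1e-10, atol=1e-12)
# q was integrated in reversed time, hence the minus sign.
dJdp = 0.02 * p - bwd.y[1, -1]            # dR/dp + quadrature term

# Finite-difference cross-check of dJ/dp.
def J_of(pv):
    s = solve_ivp(lambda t, z: [f(t, z[0], pv), L(t, z[0], pv)],
                  (0.0, tf), [x0, 0.0], rtol=1e-10, atol=1e-12)
    return s.y[0, -1] ** 2 + s.y[1, -1] + 0.01 * pv ** 2

eps = 1e-6
fd = (J_of(p + eps) - J_of(p - eps)) / (2.0 * eps)
```

On this toy problem the adjoint gradient and the finite difference agree, which is the behaviour I would like to be guaranteed when the loss is assembled this way in the SciML packages.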
More Details
Hello. My name is Namhoon Cho.
I have been trying to solve a nonlinear deterministic optimal control problem in the continuous-time domain by optimising a neural feedback policy. The context of my application requires the terminal cost to be a function of the final state, \phi \left( x \left(t_{f}\right) \right), in addition to the continuous functional.
I have been following the codebase developed for models combining ODE and NNs.
At first glance, I tried simply specifying the loss function in this mixed form and applying DiffEqFlux.sciml_train (or others) for training.
However, I soon found that the function adjoint_sensitivities() inside DiffEqSensitivity.jl may not natively handle a loss function of this mixed type (continuous functional + terminal cost + parameter-only function, …), which was also pointed out in samuela/ctpg. I have read the description in Mathematics of Sensitivity Analysis · DifferentialEquations.jl (sciml.ai) and the continuous adjoint sensitivity analysis described in the CVODES documentation, but it is still not clear to me how the current SciML packages handle a loss term given by an evaluation at the final time.
I then became concerned that the result I obtain might not be based on the mathematically correct gradient \frac{dJ}{dp} if I naively apply the existing packages without knowing the details.
Could anyone please clarify whether I can apply these tools without worrying about the correctness of the \frac{dJ}{dp} calculation?
Thank you.