The way I see it, image acquisition and processing may be part of a real-time control loop, so you need to be able to treat them like the rest of the control system. We are usually trying to solve scientific problems like: given that we see data X, fit a neural ODE to uncover the dynamics, or fit a neural ODE mixed with a mechanistic ODE to find a nonlinear optimal control strategy… but this assumes we already have the data X.
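As a minimal sketch of what "fit a neural ODE to data X" means mechanically, here is a toy version in JAX. Everything here is a stand-in assumption for brevity: the "network" is a single parameter `theta` in `dx/dt = -theta * x`, and the solver is a hand-rolled Euler loop rather than a real adaptive integrator.

```python
import jax
import jax.numpy as jnp

def rollout(theta, x0, dt, n):
    # Fixed-step Euler integration of dx/dt = -theta * x.
    # jax.lax.scan keeps the whole loop differentiable end to end.
    def step(x, _):
        x_next = x + dt * (-theta * x)
        return x_next, x_next
    _, xs = jax.lax.scan(step, x0, None, length=n)
    return xs

dt, n = 0.05, 20
data = rollout(2.0, 1.0, dt, n)  # synthetic "data X", true theta = 2

def loss(theta):
    # Trajectory-fitting loss: mean squared error against the observed data.
    return jnp.mean((rollout(theta, 1.0, dt, n) - data) ** 2)

grad = jax.jit(jax.grad(loss))
theta = 0.5
for _ in range(300):  # plain gradient descent on the fit
    theta = theta - 0.5 * grad(theta)
```

The same structure carries over when `theta` is the weight vector of an actual neural network and the integrator is a real ODE solver: the gradient flows through the solve.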

Let’s take it one step further back. Say we have an autonomous vehicle with a few cameras and some laser sensors, and we want it to follow a target. You have the image processing to pinpoint the `(x,y,z)` location of the target, then the control loop to track the target, and you want to optimize your strategy for getting there — but the generation of `(x,y,z)` is itself a problem of rendering a 3D image from the cameras and lasers. So that image generation needs to be a first-class differentiable portion of the training loop. Then you can handle the thing as a whole, backpropagating through the dynamical control problem (the neural ODE) and through the image-processing / raytracing portion, all the way from the prediction back to the input data.
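To make that concrete, here is a hypothetical 1-D toy of such an end-to-end pipeline in JAX — not a real renderer or vehicle model, just a sketch of the differentiability claim. Pixel intensities are formed as a Gaussian blob (the "rendering"), a soft-argmax localizes the target (the "image processing"), a proportional step moves the vehicle (the "control"), and `jax.grad` backpropagates the tracking error through all three stages at once:

```python
import jax
import jax.numpy as jnp

def render(target_x, pixels):
    # Toy differentiable "renderer": a 1-D camera whose pixel
    # intensities form a Gaussian blob centered on the target.
    return jnp.exp(-(pixels - target_x) ** 2)

def localize(image, pixels):
    # Differentiable localization (soft-argmax):
    # intensity-weighted mean pixel position.
    w = image / jnp.sum(image)
    return jnp.sum(w * pixels)

def step(x, estimate, gain):
    # One proportional control step toward the estimated position.
    return x + gain * (estimate - x)

def tracking_loss(gain, x0, target_x, pixels):
    # End-to-end pipeline: render -> localize -> control -> error.
    image = render(target_x, pixels)
    est = localize(image, pixels)
    x1 = step(x0, est, gain)
    return (x1 - target_x) ** 2

pixels = jnp.linspace(-5.0, 5.0, 101)
# Gradient of the tracking error w.r.t. the controller gain,
# backpropagated through the image formation and processing stages.
g = jax.grad(tracking_loss)(0.5, 0.0, 2.0, pixels)
```

In the real problem the scalar `gain` becomes the parameters of the neural ODE controller, and `render`/`localize` become the actual raytracer and vision stack — but the gradient path is the same.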