However, those model rely on delay equations which are often ignored for simplicity but clearly distort the actual value of parameters. I have not been able to find good examples of delay equations.
You may know about this already, but here is a common trick for converting delay differential equations to ODEs for the purposes of numerical approximation/simulation.
Suppose that some agents convert from state A to B after a delay of T. Introduce interim states I_1, I_2, \dots, I_n, with Poisson transitions rates \lambda_0 from A to I_1, \lambda_k from I_k to I_{k+1}, and finally \lambda_n from I_n to B.
Calibrate your parameters so that \sum 1/\lambda_i = T, so the expected time it takes to get from A to B matches. The higher n is, the smaller the variance of the expected time will be (this is easy to check and is a good exercise).
In practice, for a lot of processes, you can get away with n = 2, \dots, 5 and get a good approximation. For a lot of things transitions are not deterministic anyway, so this may be a better model to start with. Eg for COVID-19, you can assume transmission rates increasing in k for the I_k.
The “problem” here is \frac{dx}{dt}. If x is a signal, you need to differentiate it. If x is the state of another ODE, you just eliminate it by replacing it with the right hand side of that ODE.
Interesting to compare those two. A compartment fed with poisson rates acts a bit like an damped integrator, do you typically get these additional derivative terms dx/dt (making x an integrator of that new coordinate as well)?
the problem of differentiating wrt to the delay D in DDE is not well posed mathematically as it affects the state space. By scaling time, you can assume D=1 and multiply your vector field by D. It becomes then possible to differentiate.
Yes, indeed, but if there are multiple delays present, then differentiation of the right-hand side with respect to the remaining (after scaling) delay parameters is still - in general - not possible.
(This actually had me wondering a bit while following this topic. It may be that (perhaps implicitly) the problem is circumvented by using that the history segments gain one degree of smoothness after each time step equal to the maximum delay.)
I think that a finer grid of interim transitions (again, 2–5 as I suggested above) may lead to a better approximation than the one you suggested while remaining very tractable (but this is just intuition, I did not do the math).
Also, I find it intuitive to think about people being in “stages” of infection, between which they transition stochastically. I think that a fixed lag \tau for the population is a convenience assumption; I don’t think the process is deterministic so I would not work very hard to maintain this assumption.
My feeling is that you can do it. The state space is something like C([-D, 0], R) where D is the maximum delay. If the rescaled delays stay below 1, it should be fine for this part.
Then the question of mild/strong solution comes in. Regularity of C0-semigroups wrt to parameters, I have never thought much about this.
I had not thought about it much either, until relatively recently. There is a comment about it in this article in Remark 21, but the issue was well-known before that, see the references in that remark and also the work by Krisztin and Walther on state-dependent DDEs. There the problem becomes even more serious.
It is not my intention to take this too far off-topic, but if you would like to, we can continue the discussion elsewhere.
Note: I don’t intend to suggest using first order approximation for the exponential functions; that’s just an example. Anyway, would your proposal be similar to setting (\tau\ge 0, \tau_i \ge 0):
\exp(\tau s) = \prod_{i=1}^n\exp(\tau_i s)
where \sum_{i=1}^n\tau_i = \tau (e.g., n=4 or n=5), using only first order approximations of \exp(\tau_i s), and then tune the \tau_i to get the best approximation?