Hi!
I’m not an ML / NN expert, so your mileage may vary with my advice (and I am speaking under correction), but since no one has replied yet, I thought I’d make a few suggestions that many would consider “low-hanging fruit”.
- Does the original ODE that you solved contain any time-dependent function (i.e., a function that depends explicitly on time, such as an event)? If that’s the case, I’m in the same boat as you – the topic is here. However, if that’s not the case, there are still a few things you can try:
- Try a lower learning rate for ADAM once you get to the point you’re at now. I’ve had quite a bit of success doing the initial training pass with the default learning rate; after that initial convergence, I can often eke out a substantial amount of improvement by dropping the learning rate by a factor of 10 or so. This will take a bit of experimentation.
- Use a different activation function. I’ve found that `swish` and `σ` work better than `relu` and `tanh` in the ODEs that I’ve personally tried to solve.
- Try a different model architecture. From personal experience, bigger isn’t always better. Your model is pretty large for an ODE (at least from what I’ve seen). How complicated is the original system model? That should help inform how expressive your model needs to be. Also consider playing around with tapering layer sizes, e.g. 32 → 16 → 8 → 2. This has achieved good results in certain cases for me, where a model with more parameters struggled somewhat to train.
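To make the learning-rate suggestion concrete, here is a minimal sketch of the two-stage idea in plain Python. Everything in it is an assumption for illustration: the hand-written `adam_minimize` helper, the toy quadratic `loss` standing in for your actual ODE-fitting loss, and the specific rates (0.1, then 0.01). In practice you would use the ADAM optimizer from whatever framework you are training with and simply restart it from the current parameters with a smaller rate.

```python
import math

def adam_minimize(grad, x0, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam on a list of floats; returns the final parameters."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimates
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimates
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical stand-in for the real training loss (minimum at [3, -1]).
def loss(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

start = [0.0, 0.0]
# Stage 1: initial pass with a larger learning rate.
stage1 = adam_minimize(grad, start, lr=0.1, steps=200)
# Stage 2: continue from the stage-1 parameters with the rate cut by 10x.
stage2 = adam_minimize(grad, stage1, lr=0.01, steps=200)

print("loss after stage 1:", loss(stage1))
print("loss after stage 2:", loss(stage2))
```

Note that this sketch resets the optimizer's moment estimates between stages; whether to reset or carry them over is another thing you may need to experiment with.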
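For the activation and architecture suggestions, the following sketch writes out the activations mentioned above and compares parameter counts for a tapered network against a wider one. It assumes fully connected layers with biases and a 2-dimensional input; the input size, the `[2, 64, 64, 64, 2]` comparison network, and the helper name `dense_param_count` are made up for illustration, so substitute your own layer sizes.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, often written as sigma."""
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    """swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)

def relu(x):
    return max(0.0, x)

def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Assumed 2-dimensional state going in; 32 -> 16 -> 8 -> 2 as suggested above.
tapered = [2, 32, 16, 8, 2]
# Hypothetical wider network for comparison.
wide = [2, 64, 64, 64, 2]

print("tapered parameters:", dense_param_count(tapered))
print("wide parameters:   ", dense_param_count(wide))
```

The count only tells you how much there is to train, not how well it will fit, but it is a quick way to see how far apart two candidate architectures really are.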
Hope this helps!