I am interested in implementing and training universal partial differential equation (UPDE) systems.
I understand from @ChrisRackauckas that the SciML ecosystem is increasingly moving toward Lux and Enzyme. (And which optimizers are recommended for training? A mix of ADAM and BFGS, as in the SciML “missing physics” tutorial?)
I am not sure what the current best practices are for implementing even universal ordinary differential equations (UDEs) using these packages. Some existing tutorials I’ve found have gaps (they don’t apply straightforwardly to UDEs, or they use Flux, or Zygote, or …).
Does anyone have a minimal working example of a UDE using modern Julia SciML best practices (preferably with mutating state vectors on the right-hand side)? Ideally this would show how to implement (1) the gradient of a loss (e.g. MSE on the final state) with respect to the Lux network parameters (or, for that matter, with respect to initial conditions or forcings), and (2) a full training loop.
Furthermore, is it currently possible to do this both by (1) differentiating through an explicit for-loop timestepper (e.g. a naive Euler scheme) and by (2) using continuous adjoints from SciMLSensitivity?
(If any of this will also run on the GPU that would be even better, but I’m currently prototyping on the CPU.)
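For concreteness on point (2) and the ADAM/BFGS question: the pattern I have in mind is the two-stage optimization from the missing-physics tutorial, roughly like the untested sketch below. The loss here is just a stand-in quadratic so the snippet runs on its own; in the real problem it would be the UDE loss (MSE on the final state), and the step counts and learning rate are arbitrary.

```julia
using Optimization, OptimizationOptimisers, OptimizationOptimJL, Zygote

# Stand-in loss just to show the two-stage pattern; in the real problem this
# would solve the UDE and return the MSE on the final state.
dummy_loss(θ, _) = sum(abs2, θ .- 3.0)

optf = OptimizationFunction(dummy_loss, Optimization.AutoZygote())
optprob = OptimizationProblem(optf, zeros(4))

# Stage 1: Adam to get into a reasonable basin
res1 = solve(optprob, OptimizationOptimisers.Adam(0.05); maxiters = 500)

# Stage 2: (L)BFGS to polish, restarted from the Adam result
optprob2 = OptimizationProblem(optf, res1.u)
res2 = solve(optprob2, OptimizationOptimJL.LBFGS(); maxiters = 200)
```

Is that still the recommended shape of the training loop, and if so, which AD backend should back the `OptimizationFunction` these days?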
If I can get these running, I will likely have followup questions about UPDEs.
Isn’t the first example in the showcase an example of this?
Let me know if anything is missing. @avikpal recently updated that example so it should be fine?
If you write your own explicit non-adaptive method, it should work fine with Enzyme. Note that Julia v1.10 works with Enzyme, but there is something with the v1.11 update I’m still working through.
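Roughly, that pattern looks like the following (an untested sketch: the toy dynamics, network sizes, and step count are just illustrative, and the exact Enzyme annotations may need tweaking for your problem):

```julia
using Lux, Enzyme, Random, ComponentArrays

rng = Random.default_rng()
model = Chain(Dense(2 => 16, tanh), Dense(16 => 2))
ps, st = Lux.setup(rng, model)
ps = ComponentArray(ps)

# In-place UDE right-hand side: toy linear decay plus a neural correction
function rhs!(du, u, p, st, model)
    nn_out, _ = model(u, p, st)
    @. du = -0.1f0 * u + nn_out
    return nothing
end

# Hand-written explicit Euler loop; loss is MSE on the final state
function loss(p, u0, target, st, model)
    dt, nsteps = 0.01f0, 100
    u = copy(u0)
    du = similar(u)
    for _ in 1:nsteps
        rhs!(du, u, p, st, model)
        @. u += dt * du
    end
    return sum(abs2, u .- target)
end

u0 = Float32[1.0, 0.5]
target = Float32[0.0, 0.0]
dps = Enzyme.make_zero(ps)
Enzyme.autodiff(Reverse, loss, Active,
    Duplicated(ps, dps), Const(u0), Const(target), Const(st), Const(model))
# dps now holds the gradient of the loss with respect to the network parameters
```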
You can also differentiate through the solvers using ReverseDiff.jl or Tracker.jl.
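That route goes through the `sensealg` keyword in SciMLSensitivity, e.g. something like the sketch below (untested; swapping in `TrackerAdjoint()` should be analogous, though I’d expect it to prefer an out-of-place RHS):

```julia
using OrdinaryDiffEq, SciMLSensitivity, Zygote

# In-place Lotka-Volterra as a stand-in for the UDE right-hand side
function f!(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
    return nothing
end

u0 = [1.0, 1.0]
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(f!, u0, (0.0, 5.0), p)

# Discrete sensitivities: the whole solve is traced by ReverseDiff
function loss(p)
    sol = solve(prob, Tsit5(); p = p, saveat = 0.5,
                sensealg = ReverseDiffAdjoint())
    return sum(abs2, Array(sol))
end

grad, = Zygote.gradient(loss, p)
```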
To clarify, I was looking for examples of Lux and Enzyme working together with an optimizer to train a UDE system. I don’t think any of the existing examples have that combination?
It would also be useful to see examples of how to do it both with hand-written for loops and with the sensitivity algorithm of choice from SciMLSensitivity, as the latter can be a little automagical.
If some other combination (Lux and Zygote, or Flux and Enzyme, etc.) would be better, I could go in a different direction, but I’m looking for the most future-proof solution, one that isn’t going to leave a codebase obsolete.
Most of the documentation examples are already using that combination; otherwise the mutation in the in-place ODE definition wouldn’t be supported. The VJP choice is just automatic, so it’s not seen by the user.
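If you want to see or control that choice, you can pass the adjoint and VJP explicitly instead of relying on the automatic default. A minimal sketch (untested; the automatic default may pick a different adjoint than the one shown here):

```julia
using OrdinaryDiffEq, SciMLSensitivity, Zygote

# Small in-place system standing in for the UDE right-hand side
function f!(du, u, p, t)
    du[1] = -p[1] * u[1]
    du[2] = p[1] * u[1] - p[2] * u[2]
    return nothing
end

prob = ODEProblem(f!, [1.0, 0.0], (0.0, 1.0), [2.0, 1.0])

# Continuous interpolating adjoint with Enzyme doing the internal
# vector-Jacobian products, made explicit via sensealg
function loss(p)
    sol = solve(prob, Tsit5(); p = p, saveat = 0.1,
                sensealg = InterpolatingAdjoint(autojacvec = EnzymeVJP()))
    return sum(abs2, Array(sol))
end

grad, = Zygote.gradient(loss, [2.0, 1.0])
```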
The newest docs are now all on Optimization v4, Lux v1, Julia v1.11, etc. Enzyme is now Julia v1.11 compatible, though I found that it was missing a rule on one of the new Memory-handling intrinsics (Inactive declaration ignored · Issue #2016 · EnzymeAD/Enzyme.jl · GitHub). I have Billy working on that with the compiler crew, and that should be all that is left there.
[Sorry, the v1.11 bumps were a bit delayed: this time around Julia changed its internal memory handling (https://www.youtube.com/watch?v=L6BFQ1d8xNs), which required quite a lot more of an Enzyme update than usual.]
With all of that together, I’ll write a neural-network-in-a-PDE tutorial. It will basically be an update of the one that’s already in the UDE paper, but the paper code is from 2020, so I’ll create a new version that uses all of the latest tooling and add it to SciMLSensitivity.jl’s docs. Expect that by mid next week?
Once that’s together, I think you’re off to the races?