Developing annotation standards for SciML to support reproducibility of published work

We are building ModelingToolkit towards exactly this. It’s an implementation of a symbolic modeling language with parsers from BioNetGen and CellML (SBML is coming very soon, along with integration with Modia so that Modelica models can be used as well). It extends these systems to other model types and allows for automatic combination of models and compiler transformations on the model form. The open compiler design essentially requires a spec: we follow an LLVM style where valid transformations are functions from a valid ODESystem to a valid ODESystem, so “valid ODESystem” needs to be well-defined for all of the tools to compose (https://mtk.sciml.ai/dev/tutorials/higher_order/ is an example of such a pass). This is still somewhat early in the process, but it is something that we are standardizing, and we have libraries being written in many domains:

  • Power systems
  • Quantitative systems pharmacology
  • Systems biology
  • Pharmacokinetics/pharmacodynamics
  • Systems neuroscience
  • HVAC and building simulations
  • Electrical circuits (coming soon)
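
To make the "valid ODESystem to valid ODESystem" contract mentioned above concrete, here is a toy, language-agnostic sketch in Python. It is not ModelingToolkit's API: the `Equation` and `ODESystem` classes, the `is_valid` rule, and the `lower_order` and `run_passes` helpers are all hypothetical names invented for illustration, and right-hand sides are kept as opaque strings rather than symbolic expressions. The idea it shows is that once validity is well-defined, every pass can be checked against the same contract and passes compose freely.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Equation:
    var: str    # state variable on the left-hand side
    order: int  # derivative order applied to `var`
    rhs: str    # right-hand side, kept as an opaque expression string


@dataclass(frozen=True)
class ODESystem:
    equations: Tuple[Equation, ...]


def is_valid(system: ODESystem) -> bool:
    """Toy validity rule: each state is defined exactly once, with order >= 1."""
    names = [eq.var for eq in system.equations]
    return len(names) == len(set(names)) and all(eq.order >= 1 for eq in system.equations)


Pass = Callable[[ODESystem], ODESystem]


def lower_order(system: ODESystem) -> ODESystem:
    """Rewrite each n-th order equation as n first-order equations."""
    out = []
    for eq in system.equations:
        # Chain of states: x, x_t1, ..., x_t(n-1)
        chain = [eq.var] + [f"{eq.var}_t{k}" for k in range(1, eq.order)]
        for lower, higher in zip(chain, chain[1:]):
            out.append(Equation(lower, 1, higher))  # D(x_tk) = x_t(k+1)
        out.append(Equation(chain[-1], 1, eq.rhs))  # last state gets the original rhs
    return ODESystem(tuple(out))


def run_passes(system: ODESystem, *passes: Pass) -> ODESystem:
    """Apply passes in order, checking the shared contract at every step."""
    assert is_valid(system), "input is not a valid system"
    for p in passes:
        system = p(system)
        assert is_valid(system), f"{p.__name__} produced an invalid system"
    return system


# A Lorenz-like system with a second-order equation for x.
lorenz_like = ODESystem((
    Equation("x", 2, "sigma*(y - x)"),
    Equation("y", 1, "x*(rho - z) - y"),
    Equation("z", 1, "x*y - beta*z"),
))
lowered = run_passes(lorenz_like, lower_order)
```

Because `lower_order` maps valid systems to valid systems, it can be chained with any other pass that honors the same rule, and applying it to an already first-order system leaves that system unchanged.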

Recent developments also include nonlinear optimization and nonlinear (stochastic) optimal control in this representation, with all of the free performance improvements and parallelism coming from its compiler.

Julia Computing is building acceleration tools on top of this stack as premium accelerator passes.

Pumas.ai's Pumas.jl is built on this modeling system. And lastly, we can apply this symbolic system to many pure Julia codes.

To finalize all of this, we need ways to represent full machine learning models inside of this system. That is really just covered by the ability to register arbitrary Julia functions as nodes in the computational graph, plus support for array variables (instead of just the arrays of variables we have now, so struct of arrays instead of array of structs in the symbolic sense), which is something @shashi is working on now.

The final product is both a pure Julia Computer Algebra System (CAS) and a modeling system. The two go together: making it easy for users to transform their models into numerically better forms (for example, log transforming variables) means writing compiler passes on ODE representations, and the easiest way to write such passes is to have everything built on a good, robust, fast, and expressive symbolic system.
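
As a small illustration of the log transform mentioned above, here is a numerical sketch in Python rather than a symbolic pass. For dx/dt = f(x) with x > 0, substituting u = log(x) gives du/dt = f(exp(u)) / exp(u) by the chain rule. The helper names `log_transform` and `rk4`, the logistic test model, and its parameter values are assumptions chosen for the example, not anything taken from ModelingToolkit.

```python
import math
from typing import Callable


def log_transform(f: Callable[[float], float]) -> Callable[[float], float]:
    """Given dx/dt = f(x) with x > 0, return g such that du/dt = g(u) for u = log(x)."""
    def g(u: float) -> float:
        x = math.exp(u)
        return f(x) / x  # chain rule: du/dt = (1/x) * dx/dt
    return g


def rk4(rhs: Callable[[float], float], y0: float, t_end: float, n_steps: int) -> float:
    """Fixed-step classical Runge-Kutta for an autonomous scalar ODE."""
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


def logistic(x: float) -> float:
    # Logistic growth with rate 2 and carrying capacity 10 (illustrative values).
    return 2.0 * x * (1.0 - x / 10.0)


# Solve the same model in the original and in the log-transformed variable.
x_direct = rk4(logistic, 0.01, 5.0, 1000)
u_final = rk4(log_transform(logistic), math.log(0.01), 5.0, 1000)
x_via_log = math.exp(u_final)
```

Both routes approximate the same trajectory, but the transformed state can never leave the positive domain because x is recovered as exp(u). A symbolic system can derive the transformed equations automatically instead of wrapping the right-hand side by hand as done here.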

It may sound like a lot, but there’s buy-in from many different academic groups and companies, and we’ve already demonstrated enough progress that I am confident in saying this will be a reality in the next 2 years. A lot of it is already available at https://mtk.sciml.ai/dev/

Finally, to answer the “standards” or “specification” question: I think it would be easiest to just say that the Julia symbolic script is the specification, since JSON/XML/etc. always tend to have limitations and issues. That said, if we find that another representation of models makes it easier to represent and save subcomponents (especially for use with GUI tools), then we will definitely create one. Right now I think getting the full system off the ground is the main priority.
