Today Andreas Fehlner, Adam Pocock, and I presented to the ONNX steering committee a proposal for new working groups around ONNX support for Bayesian models from various frameworks. The working groups are not yet approved, but we would like to gauge community feedback on this type of work: do people want more ONNX support for Bayesian models and inference from frameworks like Stan, PyMC, Turing.jl, Pyro, NumPyro, TensorFlow Probability, and many others? Please drop a comment on what you would want from ONNX in terms of support for these frameworks, and on how ONNX could be more useful for deploying these models.
That’s a fantastic proposal.
ONNX is a great platform for deploying neural networks, and this would be a great opportunity for Julia to be involved right from the beginning.
Did you consider ExecuTorch? AFAIK, ONNX Runtime may allocate during inference and is thus not realtime-safe. This is problematic for some applications (e.g. audio).
No, I haven't considered ExecuTorch; it looks interesting, though. I chose ONNX partly because the ecosystem is mature and agnostic of how the model was developed. ORT does not make guarantees like that, perhaps because it has to deal with such a wide variety of frameworks, so I'll take what I can get as long as we can easily deploy the model; everything else I'll deal with as it comes up. ExecuTorch appears to work only with PyTorch, which is good, but we're trying to support multiple platforms in this space.
I understand the choice to go with the mainstream option, but realtime safety isn't something you can retrofit. The only way to proceed is a ground-up rewrite with those principles in mind. That leads to solutions like RTNeural, which works in realtime but supports only a couple of NN architectures. [1][2] This defeats the point of having an ecosystem, since you'll have to rewrite the inference anyway.
This brings us to a meta discussion. I've seen this before w.r.t. deployment and static compilation (at least before JuliaC.jl and PackageCompiler.jl). Julia is the perfect language for numerics… but it cannot be deployed as a standalone app. Other languages (e.g. C++) are better suited in that regard… but you'll have to rewrite everything yourself. Neither case gives you what you want: something that is both easy and deployable. The tension is similar to, but not identical to, the 1.5 Language Problem. We can break this off into another thread if appropriate.
If you think ONNX is the right choice that’s fine, but these topics should be considered before committing to a path.
Understood. I still think ONNX is the better option in terms of longer-term support; the community is simply bigger, which would make deploying these models easier across a wider variety of cloud providers and make reproducibility easier too.
Thanks for the interesting pointers: both the paper and the video were new to me, and I appreciate you sharing them. For readers who may not be familiar with the ONNX community and ecosystem, it's helpful to distinguish between ONNX, the open standard under the Linux Foundation, and ONNX Runtime, the Microsoft product. ONNX itself is only
a) the file format,
b) the operator/function specification, and
c) the computational graph definition.
This distinction matters when discussing realtime constraints, deployment, and broader ecosystem implications, since choosing ONNX doesn’t inherently lock you into any particular runtime or architecture. It’s true that ONNX currently supports only a limited set of operators. Adding new ones requires aligning on a shared definition and following the process for extending the specification (see: onnx.ai/onnx/repo-docs/AddNewOp.html).
For use cases with strict realtime or deployment requirements, you still need an inference engine that provides the corresponding guarantees. Even within ONNX Runtime, the capabilities vary significantly depending on the selected execution provider (onnxruntime.ai/docs/execution-providers/).