A lot of people ask, what does JuliaHub do and what does this mean for open source? We took the time to detail how JuliaHub is a commercial entity that produces products like Dyad, but importantly how this connects to and strengthens the open source community. I think open source sustainability models are a deep topic that we need to discuss from time to time, so I hope you enjoy!
I watched the OSA Community event video a week ago (and skimmed this article now). The point about standards/certification made a lot of sense, and the part about managing maintenance in an academic setting - with student maintainers (hopefully) graduating and moving away - resonated a lot, having seen the difficulties of that firsthand.
(btw, small typo: the first header is “Introduct” instead of “Introduction”)
One fundamental thesis here is that the FOSS parts make up the building blocks for the commercial side of things, and so it makes commercial sense to continue work on them and improve them.
This employment model creates a virtuous cycle: pharmaceutical companies pay for specialized domain expertise and regulatory-compliant tools, which funds the continued development of the open source foundations that enable even better tools in the future.
Which makes sense generally, but foundations vs specialized domain expertise often isn’t a black and white distinction. Taking the points under “Addressing Real Industrial Requirements” as a rough guide for the scope of Dyad:
- “Strict code generation requirements meeting [aerospace] standards” and “Drag-and-drop model development for engineers” fall clearly in the commercial domain-specialized side
- “Validated library of engineering models” is slightly further along the spectrum but still largely on the specialized side
- “Accessible SciML capabilities” and “Handling complex real-world scenarios” are much less clear cut, and I can see there being a lot of ambiguity about whether certain functionality and code belong in the FOSS side of things or on the commercial end.
So how do you see JuliaHub resolving this kind of ambiguity moving forward - are there (or will there be) any internal processes or guidelines to decide what is in scope for Dyad vs what goes into the FOSS ecosystem? The argument that Dyad benefits from the open source core makes sense, but beyond that core, the incentive is for ease-of-use developments (“Accessible SciML”) and rich complex features (“Handling complex real-world scenarios”) to go into Dyad instead of the FOSS side of things. How do you draw the cut-off line so that Dyad remains sustainable without sacrificing the future ease-of-use improvements and richness of the FOSS ecosystem?
There is definitely some ambiguity in the boundaries. There are a few tools we use to address that. One is to make things at the ambiguous boundary source available and free for academics. So if it’s not clear whether something can/should be open sourced, because it’s the kind of thing that makes new deals possible or is ultimately more of an evolving science, we can be a little more conservative at first while still being somewhat public about the methods, and then that can shift more open after some review. This has happened on multiple occasions.
But secondly, what usually happens is that we develop a stronger understanding of where the boundaries are. For example, with Pumas we found that anything valuable in the space of nonlinear mixed effects (NLME) models depended on its ability to work with the NMTRAN file format, the standard format that all clinical trial data comes in. So while someone could, for example, use DifferentialEquations.jl directly and then Optimization.jl / Turing.jl to define some NLME fitting routines, the reality was that without the interface to NMTRAN it would be difficult for anyone in pharma to actually get any work done. That made it clearer that we could, for example, work on things like global sensitivity analysis in the open: yes, a pharmacometrician could write the ODEs and callbacks manually and stick them into GlobalSensitivity.jl, but it really wouldn’t make sense time-wise to do that on any large-scale basis, so you’d want to use the Pumas interface for any real work. That created a clear boundary.
In the space of Dyad, we believe a lot of that will come from the model libraries. Acausal model libraries allow for building very complex systems, such as realistic HVAC systems, buildings, batteries, etc. As the model library system builds out, there comes a point where you can say “use 20 lines of code in Dyad to get a realistic car, and then apply this SciML method on it” vs “construct a semi-realistic 100,000 ODE car model by building out the battery equations, the drive train, etc., and now do the open source SciML on it”, and at that point it’s not really a realistic comparison for anyone who has time constraints (i.e. not PhD students). That then makes it possible to say, yeah, we can develop some of these methods in the open so PhD students can try things, publish modifications, etc., but it won’t impact whether industrial users will just use the open source, because it’s simply infeasible to leave behind all of the modeling infrastructure and rebuild/maintain every model from scratch.
This is generally why the numerics and SciML have stayed open. That, plus their being the central focus of the MIT grants, means the development of those parts has generally been more MIT led, while the modeling and the industrial interfaces / requirements are more of the company focus.