In this thread @cscherrer was looking for examples of where composable probabilistic models would be an advantage:
The first case I can see is a library of model components that can be composed. This would be particularly useful for models with complex dependence structures, e.g. spatiotemporal models. It would be interesting to be able to swap out, say, an AR(1) process for a random walk with a sum-to-zero constraint.
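To make that concrete, here is a minimal sketch in plain Python with numpy/scipy (nothing here is tied to a particular PPL). The names `ar1_logpdf`, `rw_sum_zero_logpdf`, and `spatiotemporal_model` are hypothetical, and the sum-to-zero constraint is implemented as a soft penalty purely for illustration:

```python
import numpy as np
from scipy import stats

def ar1_logpdf(x, rho, sigma):
    """Log-density of an AR(1) process: x[t] = rho * x[t-1] + eps_t."""
    lp = stats.norm.logpdf(x[0], scale=sigma / np.sqrt(1.0 - rho**2))  # stationary start
    lp += stats.norm.logpdf(x[1:], loc=rho * x[:-1], scale=sigma).sum()
    return lp

def rw_sum_zero_logpdf(x, sigma):
    """Log-density of a random walk with a soft sum-to-zero constraint."""
    lp = stats.norm.logpdf(np.diff(x), scale=sigma).sum()
    lp += stats.norm.logpdf(x.sum(), scale=1e-3)  # soft constraint: sum(x) ~ 0
    return lp

def spatiotemporal_model(y, latent, latent_logpdf, obs_sigma):
    """Observation model stays fixed; the latent-process prior is swappable."""
    lp = latent_logpdf(latent)
    lp += stats.norm.logpdf(y, loc=latent, scale=obs_sigma).sum()
    return lp

# Swap an AR(1) prior for a sum-to-zero random walk without touching the rest:
y = np.random.default_rng(0).normal(size=20)
latent = np.zeros(20)
lp_ar1 = spatiotemporal_model(y, latent, lambda x: ar1_logpdf(x, 0.8, 0.5), 1.0)
lp_rw  = spatiotemporal_model(y, latent, lambda x: rw_sum_zero_logpdf(x, 0.5), 1.0)
```

The point is only that the observation model never needs to know which latent process it was composed with.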
The second case is domain-specific. I work in fisheries, and our largest, most complicated models are stock assessments. The goal of a stock assessment is to estimate the abundance of a fish population so that it can be managed appropriately. Stock assessment models link a population dynamics model to several observation types, including estimates of relative abundance, age and/or size composition, fishery catches, etc.

Depending on how much data are available, and of which types, the population dynamics model may be based on biomass dynamics (e.g. density dependence through something like a logistic map or Beverton-Holt model), or it may be age- or size-structured, tracking cohorts through the years. Environmental conditions may or may not be included, and fish may migrate between regions, sometimes depending on age. Observations come primarily from fishing (including non-commercial surveys), which means they are noisy and can be affected by things like tides, moon phase, and vessel size. A lot of research effort goes into determining which parameters can and should be estimated to get reliable abundance estimates.

I can imagine a library that provides a DSL for specifying the following (a rough sketch comes after the list):
- a population dynamics model,
- which parameters are fixed vs. estimated, and
- the statistical model for estimated parameters and observations.
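Here is a very rough sketch of what the ingredients of such a DSL might look like, again in plain Python rather than any real stock-assessment package. The Beverton-Holt step, the split into `fixed` vs. estimated `params`, and the lognormal index likelihood are illustrative assumptions, not a description of an existing library:

```python
import numpy as np
from scipy import stats

def beverton_holt(b, r, k):
    """One year of Beverton-Holt biomass dynamics (before catch is removed)."""
    return r * b / (1.0 + (r - 1.0) * b / k)

def simulate_biomass(b0, r, k, catches):
    """Project biomass forward, subtracting the annual catch each year."""
    b = [b0]
    for c in catches:
        b.append(max(beverton_holt(b[-1], r, k) - c, 1e-6))
    return np.array(b)

def assessment_logpdf(params, fixed, catches, index, index_sd):
    """Joint log-density for a toy assessment: priors on the *estimated*
    parameters plus a lognormal likelihood for a relative-abundance index."""
    pars = {**fixed, **params}  # fixed values fill in whatever is not estimated
    lp = 0.0
    # priors only on the parameters that are being estimated
    lp += stats.lognorm.logpdf(params["r"], s=0.2, scale=1.2)    # productivity
    lp += stats.lognorm.logpdf(params["q"], s=0.5, scale=1e-3)   # catchability
    # observation model: the index is proportional to biomass, with lognormal error
    biomass = simulate_biomass(pars["b0"], pars["r"], pars["k"], catches)
    expected = pars["q"] * biomass[1:]
    lp += stats.norm.logpdf(np.log(index), loc=np.log(expected), scale=index_sd).sum()
    return lp

# Estimate r and q; hold initial biomass b0 and carrying capacity k fixed.
catches = np.array([50.0, 60.0, 55.0])
index = np.array([0.95, 0.90, 0.85])
lp = assessment_logpdf(params={"r": 1.2, "q": 1e-3},
                       fixed={"b0": 1000.0, "k": 1000.0},
                       catches=catches, index=index, index_sd=0.2)
```

The three list items map onto the sketch directly: the population dynamics model is `simulate_biomass`, fixed vs. estimated is the `fixed`/`params` split, and the statistical model is the priors plus the index likelihood.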
Besides providing a convenient way to specify models, a composable set of statistical models would allow for careful testing of the code, which matters because people’s livelihoods depend on these models.
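To illustrate the testing point: because each component exists on its own, it can be unit-tested in isolation. Reusing the (hypothetical) `rw_sum_zero_logpdf` sketch from above, a test could be as simple as:

```python
import numpy as np

def test_sum_to_zero_penalty():
    # Same increments, but one series is shifted so its sum is far from zero;
    # the sum-to-zero prior should prefer the centred series.
    x_centered = np.array([-1.0, 0.5, 0.5])
    x_shifted = x_centered + 10.0
    assert rw_sum_zero_logpdf(x_centered, sigma=1.0) > rw_sum_zero_logpdf(x_shifted, sigma=1.0)
```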