Soss relating to the broader PPL ecosystem

Hello Julia PPL,

I’ve been thinking some more about how Soss might relate to other PPL work. Hope I might get your thoughts on this, and how it might benefit the community as a whole.

The big idea is that Soss is probabilistic glue.

Say we have a relatively generic model like

m = @model x, d1, d2 begin
    z ~ d1(x)
    y ~ d2(x,z)
end

This acts like a parameterized family of distributions, similar to Normal, etc. And like Normal, specifying parameters produces a distribution, in this case a Soss JointDistribution. These are handy, because it’s very easy to call rand on the whole thing or build predictive distributions that remove ancestors of given variables (there’s really more to it than that, but that’s the idea).

Like a distribution, you can (not always, but often) call rand, logpdf, and some other things. In Soss we do this by passing responsibility to the components, and generating code to connect the pieces in the right way. We will be considering special cases where code can be rewritten, but we can always fall back on this per-component approach. At inference time, you can specify any variables you want and reason about the rest.
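To make the per-component idea concrete, here is a minimal sketch in plain Julia of what the connecting code for m could amount to. This is not Soss's actual generated code: ToyNormal, rand_joint, and logpdf_joint are hypothetical names made up for illustration, and ToyNormal is only a stand-in for a real distribution type.

```julia
using Random

# Hypothetical stand-in for a Distributions-style component (not a Soss type).
struct ToyNormal
    μ::Float64
    σ::Float64
end

# The two primitives each component is assumed to provide.
Base.rand(rng::AbstractRNG, d::ToyNormal) = d.μ + d.σ * randn(rng)
logpdf(d::ToyNormal, v) = -0.5 * ((v - d.μ) / d.σ)^2 - log(d.σ) - 0.5 * log(2π)

# Sampling follows the model statements in order, deferring to each component.
function rand_joint(rng, x, d1, d2)
    z = rand(rng, d1(x))
    y = rand(rng, d2(x, z))
    return (z = z, y = y)
end

# The joint log-density is the sum of the component log-densities.
function logpdf_joint(x, d1, d2, vals)
    return logpdf(d1(x), vals.z) + logpdf(d2(x, vals.z), vals.y)
end

# Example components: any callables returning something with rand and logpdf.
d1(x) = ToyNormal(x, 1.0)
d2(x, z) = ToyNormal(x + z, 0.5)

draw = rand_joint(MersenneTwister(1), 0.0, d1, d2)
lp = logpdf_joint(0.0, d1, d2, draw)
```

Nothing in rand_joint or logpdf_joint depends on ToyNormal itself, only on the two methods being defined for whatever d1 and d2 return.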

Typically, d1 and d2 would be Distributions. But they can really be anything where the required methods are available: Soss models, or even, in principle, Turing or Gen models.

It can go the other way as well; any PPL that requires a logpdf can call a Soss model. This could make it easy, for example, to wrap a Gen model in Turing, or vice-versa, by using Soss as the glue.
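As a rough illustration of the "other way", the sketch below shows an inference routine that needs nothing but a callable log-density. It is plain Julia, not Turing, Gen, or Soss code: rw_metropolis and joint_logpdf are hypothetical names, and joint_logpdf is a hand-written stand-in for whatever logpdf a model would expose (normalizing constants dropped).

```julia
using Random

# Generic random-walk Metropolis: the only thing it asks of the model
# is a function returning a log-density at a point.
function rw_metropolis(rng, logdensity, init, nsteps; stepsize = 0.5)
    current, lp = init, logdensity(init)
    chain = Vector{Float64}(undef, nsteps)
    for i in 1:nsteps
        proposal = current + stepsize * randn(rng)
        lp_prop = logdensity(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if log(rand(rng)) < lp_prop - lp
            current, lp = proposal, lp_prop
        end
        chain[i] = current
    end
    return chain
end

# Hypothetical joint log-density for z ~ Normal(x, 1), y ~ Normal(x + z, 0.5).
joint_logpdf(x, z, y) = -0.5 * (z - x)^2 - 0.5 * ((y - (x + z)) / 0.5)^2

# Fix the observed variables in a closure and reason about the rest (here, z).
x, y_obs = 0.0, 1.0
chain = rw_metropolis(MersenneTwister(42), z -> joint_logpdf(x, z, y_obs), 0.0, 5_000)
```

The closure is the glue: the sampler never learns where the log-density came from.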

This opens up some interesting possibilities. For example, the design of Turing makes it difficult to connect to MLJ, but it seems this may not be a problem for Soss. So a solution could be to solve it for Soss, then connect Soss with Turing. I’d expect the benefits to be the same as those of any glue code.

I’m still feeling this out. What do you think?

One issue I think with this is that Turing doesn’t really directly support the Distributions API to the same extent as Soss, so it would be very difficult to have a Turing-within-Soss model. It would be moderately trivial I think to do it the other way for a couple of our samplers, though I don’t know how Soss would work in the case of parameter transforms.

Really? I wasn’t aware of this. I’ve had some trouble with Distributions, and Gen doesn’t seem to use it at all. This makes me think we need to either build a separate library specializing in PPL needs, or determine what needs to change for Distributions to be a better fit.

What parts of this cause problems for you? For me, it’s mostly the way the types are set up. I’d love to have an alternative where distribution structs are relatively unconstrained (IMO constraints belong in methods, not structs), new methods are very easy to implement, and the types are set up in a way that’s optimized for PPL.

Still, I don’t see this as a limitation. Inference in Soss works by calling primitives (logpdf, rand, xform, etc.) on the components, though this can be specialized as needed.

For parameter transforms, do you mean like the ones needed for HMC? I’m using TransformVariables.jl because of the support for named tuples, but I have a function xform that takes a distribution and returns the appropriate transformation. I think Bijectors.jl has a similar approach.
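For readers less familiar with what such a transformation does, here is a hand-rolled sketch for a single positive-valued parameter. It deliberately uses neither the TransformVariables.jl nor the Bijectors.jl API; to_positive, logjac, logpdf_sigma, and logpdf_unconstrained are hypothetical names, and the Exponential(1) density is just an example target.

```julia
# Map an unconstrained real u to a positive value.
to_positive(u) = exp(u)

# Log-Jacobian of the map: log |d exp(u) / du| = u.
logjac(u) = u

# Example constrained log-density: Exponential(1) on σ > 0.
logpdf_sigma(σ) = -σ

# Log-density on the unconstrained scale, which is what HMC would work with:
# the constrained log-density at the transformed point plus the log-Jacobian.
logpdf_unconstrained(u) = logpdf_sigma(to_positive(u)) + logjac(u)

# Sanity check: the transformed density should still integrate to about 1.
du = 0.001
total_mass = sum(exp(logpdf_unconstrained(u)) for u in -30:du:10) * du
```

A per-distribution function like xform would then amount to choosing the right map of this kind for each component's support.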

I suppose by “Distributions API” I mostly mean the suite of rand, logpdf, and a couple others you might need. In Turing’s current iteration, the model typically needs a sampler to access a parameter, and so you can’t just call rand(model) or logpdf(model, some_parameters). My understanding was that Soss did that as kind of a first order thing.

Yeah, I think we need to have a unified interface on the model side, or at the very least an agreement on what you should be able to call on a model. I think rand and logpdf are the two that should definitely be included.
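One way to picture what such an agreement would buy: code written once against rand and logpdf that works for any model providing them. The sketch below is plain Julia under that assumption; ToyModel and entropy_estimate are hypothetical names, not part of Soss, Turing, or Gen.

```julia
using Random

# Hypothetical model type; any PPL's model could play this role
# as long as it supports the two calls below.
struct ToyModel
    μ::Float64
end

# rand returns a named tuple of parameters; logpdf scores such a tuple.
Base.rand(rng::AbstractRNG, m::ToyModel) = (z = m.μ + randn(rng),)
logpdf(m::ToyModel, params) = -0.5 * (params.z - m.μ)^2 - 0.5 * log(2π)

# Generic code that relies only on the agreed interface:
# a Monte Carlo estimate of the model's entropy, -E[logpdf(model, draw)].
function entropy_estimate(rng, model, n)
    return -sum(logpdf(model, rand(rng, model)) for _ in 1:n) / n
end

est = entropy_estimate(MersenneTwister(0), ToyModel(1.0), 10_000)
```

For this toy model the estimate can be compared against the closed-form entropy of a unit-variance normal, 0.5 * (log(2π) + 1).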