Shared base library for PPL

Hong Ge (are you on Discourse?) recently posted this comment in the probprog channel on Slack:

…I think it is quite helpful to consider carefully how we, as a community, can work together to build a better platform for probabilistic programming and (Bayesian) inference. For example, within the Turing team, we made an effort to create a shared interface for MCMC sampling methods. It allowed us to decouple inference algorithms from the Turing DSL. I found this approach very helpful since it compartmentalises complexity between various components such as the DSL and HMC implementation – separating Turing’s HMC code into its own package made maintaining and improving HMC algorithms much more comfortable. Extrapolating from these experiences, what might be interesting to consider as a community is to co-develop a shared base library AbstractPPL for various approaches to probabilistic programming, such as tracing and source rewriting. It would allow different PPLs to talk to each other, and share some code in some instances.

I think there’s a lot of potential in this idea, but it will take lots of discussion to work through.

The comment will scroll away soon, so I’m capturing it here so we don’t lose it.


To me the biggest challenge here is that the internal representation of a model is very different from one PPL to the next. In a sense, it’s the defining characteristic of a PPL.

How could something like this be scoped in order to be as broadly useful as possible, while minimizing constraints imposed on future development?

I’m kinda new to this, of course, but I have formed an opinion through my metaprogramming work with Turing and IRTracker, as I have already insinuated in some other discussions.

As far as I see it, the lowest common denominator of all embedded PPLs consists of

  1. General deterministic Julia code: this can most conveniently be represented like normal Julia IR with SSA statements, branches, and blocks
  2. “Sampling statements” for stochastic code: tildes, or assumptions/observations in Turing parlance, which relate names or values to distributions or other models
  3. Variable names, which may be “complex”, like containing indexing, fields, link functions, etc.

So my idea was to just combine all that into an IR-like syntax. In essence, this works like a structural equation model in SSA form, extended with control flow.

1:
  goto 2 (1)
2 (%1):
  %2 = <z[%1]>
  %3 = <mu[%2]>
  <x[%1]> ~ Normal(%3)
  %4 = %1 < %N
  br 4 unless %4
  br 3
3:
  %5 = %1 + 1
  br 2 (%5)
4:
  <y> ~ MvNormal(<x>, %sigma)

written in “pseudo-IRTools formatting”; the same model in Turing might look like

x = zeros(N)
n = 1
while n <= N
    x[n] ~ Normal(mu[z[n]])
    n += 1
end
y ~ MvNormal(x, sigma)

The good thing is that “sampling” is already inherently immutable: you don’t overwrite x[n], logically – you define it. (In Gen, IIRC, that is already implemented in this way; in Turing, it is not.)

How this is then interpreted is left abstract. At least (almost) trivially, you can just sample from the prior by putting the variable names into a dictionary, and replacing the tildes with assignments to rand calls. (The complication is that one may have to infer shapes and “unify” variables, like the x[i] and x above. But there must be at least one general solution to that.)
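To make the “(almost) trivial” case concrete, the prior interpretation of the example model might come out roughly like the sketch below (sample_prior is a hypothetical function, not an existing API), with every tilde replaced by an assignment of a rand call into a dictionary keyed by name and index; the comprehension does the “unification” of the x[i] into x by hand.

using Distributions

# Sketch: tildes become assignments of rand calls into a name-keyed dictionary.
function sample_prior(N, mu, z, sigma)
    trace = Dict{Any,Any}()
    n = 1
    while n <= N
        trace[(:x, n)] = rand(Normal(mu[z[n]]))   # <x[n]> ~ Normal(mu[z[n]])
        n += 1
    end
    x = [trace[(:x, i)] for i in 1:N]             # "unify" the x[i] into a vector x
    trace[(:y,)] = rand(MvNormal(x, sigma))       # <y> ~ MvNormal(x, sigma)
    return trace
end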

A specific system can then define its own interpretation or even generate code out of this form. Everything is open to static or dynamic analysis. And an implementation might accept just a fragment of the general variant, e.g., one free of stochastic control flow, or with restrictions on the indexing of variable names.

Oh, and while I’m in the Turing team, these are just my personal views! No promises made in any direction.


Thanks @phg, this is really interesting. In the pseudo-IR, would x need to be declared somehow before its use?

@McCoy and @MikeInnes, you’ve both worked in the intersection of IR and PPL, any thoughts on this?

I discovered something today which is possibly a very powerful technique. Previously, I wasn’t sure if you could do this in Julia – but now, thanks to the help of George Matheos, I know you can (whether it’s too rough on the compiler is tbd):

https://github.com/femtomc/Jaynes.jl/blob/43fb0c323be4bbd933dadbf9b862a84f5700d924/src/contexts/update.jl#L2-L43

As you know, I’ve been working on this experimental project called Jaynes which implements the generative function interface of Gen as closure dynamos (execution contexts). Here is one of the contexts, the one which is responsible for updating choices in a recorded version of a model call and then adjusting the log weights appropriately.

This context is also a closure, but it’s a generated function closure (see dynamos in IRTools) – so you have access to the lowered code at call time, and you can choose to compile a specialized version of the original method. The core dynamo transform wraps itself around calls, so it recursively calls itself down the stack, and that’s how dynamos work.
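For anyone who hasn’t seen dynamos: the recursive-wrapping pattern looks roughly like the sketch below, following the documented IRTools API (this is not Jaynes’s actual context code, which carries more state).

using IRTools: @dynamo, IR, recurse!

# A dynamo receives the lowered IR of the called method and can rewrite it
# before it runs.
@dynamo function trace(args...)
    ir = IR(args...)
    ir === nothing && return      # no IR available (e.g. an intrinsic): plain call
    recurse!(ir)                  # rewrite every call f(x...) into trace(f, x...)
    return ir
end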

But you can also do other things inside the IR (if you like) before you compile the new method body. One of the most powerful techniques of Gen is the ability to specialize the generative function interface methods to a restricted language (called the static DSL in Gen). The specialization is performed by generated functions when given a tuple (static_dsl_function, static_choicemap, arg_diffs).

By moving the generative function interface to dynamo closures, you actually have the potential to specialize any call with the same information if you can configure a specialized IR pass through type information. I previously thought you could not pass this sort of information into generated functions from the outside - but I found out that Gen accomplishes this in static specialization through the keys (which is type information!) of NamedTuple instances.
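The mechanism, as I understand it, is that the keys of a NamedTuple live in its type, so a generated function sees them before any value exists. A minimal sketch with made-up names (not Gen’s actual code):

# The keys K are a type parameter, so the generated body can use them
# while the specialized method is being compiled.
@generated function constrained_addresses(::NamedTuple{K}) where {K}
    # build the literal expression  [:k1, :k2, ...]  from the type information
    return Expr(:vect, map(QuoteNode, K)...)
end

constrained_addresses((mu = 0.5, sigma = 1.2))   # returns [:mu, :sigma]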

Now, the code linked above shows the dynamo update context – the @dynamo transform itself (when called) can be configured by the key type information that the context type parameters keep. So you can control an IR transformation at runtime (i.e. dynamically), which means you should be able to generalize the static DSL specialization to any program call, if the space of IR passes permits it.

This is long-winded - but that being said, the IRTools IR will make writing these passes complicated. I need to compute the Markov blanket of addresses given the constraint map which you want to use to update your (call) sample. If we constructed a probabilistic IR which facilitated the writing of these passes, it would likely make this sort of technique more feasible, and likely enable more complex passes.

On top of this, this sort of IR representation would also allow the synthesis of custom “address map” types for specific calls. Right now, that’s one of the main downsides to using the full dynamic language in Gen – the address map is just a hierarchical dictionary structure, without much specialization for performance. That dictionary structure isn’t strictly required, because you should always be able to morph a model with runtime randomness (e.g. stochastic control flow) into a set of static fields with fixed types, plus some fields with heap-allocated structures. Plain dictionaries just aren’t a very optimized representation.
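To sketch what I mean (purely illustrative types, nothing like this exists in Gen or Jaynes today): addresses that exist on every execution path can become concrete, typed fields, and only the truly dynamic part falls back to a dictionary.

# Hypothetical specialized address map for one particular model call.
struct SpecializedAddressMap
    mu::Float64                    # present on every path: concrete field
    sigma::Float64                 # present on every path: concrete field
    x::Vector{Float64}             # fixed element type, dynamic length
    dynamic::Dict{Symbol,Any}      # choices created under stochastic control flow
end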

One question for @phg - is it possible to construct this IR to be compatible with IRTools? Because, broadly, you might want the same metaprogramming tools to be available there? Or at least a way to transform between representations, so that you can move back to the normal compiler representation.

I’m not sure – ideally not, although there might not be a way around it in general, with a strongly typed interpretation. It’s part of the “unification” process I am still thinking about.

But if we go back to the example above: supposedly, N, the size of x, depends on the size of the input. So if we partially evaluate the IR on constant input, Mjolnir style, the loop vanishes and a series of x[1] = ...; ...; x[N] = ... remains. In that case, “shape” inference should be easy.
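For a hypothetical constant N = 3, the partially evaluated program would reduce to a straight-line sequence of sampling statements, so the shape of x can be read off directly:

x[1] ~ Normal(mu[z[1]])
x[2] ~ Normal(mu[z[2]])
x[3] ~ Normal(mu[z[3]])
y ~ MvNormal(x, sigma)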

If, however, the shape does not become constant given the input, like in a nonparametric model, other means would be required – something like what McCoy describes above, if I understand him correctly.

You pretty much answered that yourself.

But certainly it would be very similar. I don’t think the IRTools representation lets you insert arbitrary values in all places, though, like Expr does – which would be needed to embed variable names and tildes. On the other hand, you’re of course right in that a lot of the static analysis facilities would have to be duplicated in an “extended rewrite”.

I imagine using this kind of system in a way that the PPL IR would then be lowered very easily to IRTools IR or CodeInfo, but with the variable names removed and the tildes converted to assignments plus a couple of calls. The other direction of conversion I guess would only be necessary to insert SSA statements, and that should be doable.
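To be concrete about “assignments plus a couple of calls”: one tilde from the example could come out roughly like this after lowering (VarName and tilde_assume are made-up names here, not an existing API; a specific PPL would substitute its own calls).

vn = VarName(:x, (n,))                            # the name x[n], reified as a value
x[n] = tilde_assume(ctx, Normal(mu[z[n]]), vn)    # was:  x[n] ~ Normal(mu[z[n]])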

Hi @phg, another question on this…

The effect of ~ will change across different inference methods. In Soss, I’ve been making this substitution as part of the transition from the PPL-specific representation to Julia code. If I’m understanding this right, the IR you propose delays this substitution until much later in the compilation pipeline. How do you see this affecting the available compiler optimizations? My intuition is that it would limit them, but I’m not sure.

No, I don’t think I imagined the tildes to be preserved until later, but maybe that was not expressed clearly. As you say, I expect a specific PPL to take the IR, and either interpret it, or spit out normal Julia code by transforming the IR to whatever it needs. But it could also apply transformations internally, from the IR to itself, and those could of course keep the tildes.