Working with ASTs

I would do something like this: a model, which is basically the DAG, is represented as the edges of the DAG in some flat structure (eg a tuple). Let’s call an edge DistAs(lhs, distribution), where distribution can refer to other variables (eg from a lookup table).

Then

function generic_model_for_μ(μ_distribution)
    (DistAs(:μ, μ_distribution), DistAs(:x, Normal(:μ, 1)))
end

In the end, the “compiler” takes this tuple, verifies that the DAG is complete, and transforms it to a composite type that has the bells & whistles.
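Very roughly, and only to sketch the idea (SymbolicDist, check_complete, and this little symbolic Normal are placeholder names for illustration, not an existing package), the edge type and the completeness check could look something like this:

struct SymbolicDist          # e.g. SymbolicDist(:Normal, (:μ, 1))
    name::Symbol
    args::Tuple
end

struct DistAs{D}
    lhs::Symbol
    distribution::D
end

Normal(args...) = SymbolicDist(:Normal, args)   # symbolic stand-in, not Distributions.Normal

# Which other variables does a distribution refer to?
referenced_vars(d::SymbolicDist) = [a for a in d.args if a isa Symbol]

# The "compiler" step: check that every referenced variable is defined by some edge
function check_complete(edges)
    defined = Set(e.lhs for e in edges)
    for e in edges, v in referenced_vars(e.distribution)
        v in defined || error("undefined variable $v in the distribution for $(e.lhs)")
    end
    return edges
end

check_complete((DistAs(:μ, Normal(0, 1)), DistAs(:x, Normal(:μ, 1))))   # passes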

1 Like

There are some generic modeling, symbolic, and AST manipulation libs that might help.

@ChrisRackauckas's https://github.com/JuliaDiffEq/ModelingToolkit.jl

It uses GitHub - chakravala/Reduce.jl: Symbolic parser generator for Julia language expressions using REDUCE algebra term rewriter.

There’s also GitHub - chakravala/SyntaxTree.jl: Toolset for modifying Julia AST and characteristic values

Also GitHub - HarrisonGrodin/Rewrite.jl: An efficient symbolic term rewriting engine by @HarrisonGrodin

4 Likes

Also GitHub - JuliaDebug/ASTInterpreter2.jl: Re-write of ASTInterpreter for julia 0.6+

Those stated goals seem doable to me; the trick is in choosing a lowering for @model which expresses a rich enough structure. Tamas’ suggestion could work for simple declarative models, though I’m not sure whether your @model is meant to be a purely declarative mathematical definition, or whether it has imperative sequential structure like normal Julia code?

Is the ideal that models are represented by a normal imperative Julia function which samples from the model, the task of the framework then being to reinterpret this generative model in ways which allow it to perform inference on the parameters? I did try making this kind of thing work and it seems that, after all, contextual dispatch might be just the right thing. But I got stuck with not really understanding the domain-specific stuff, especially the various functionality which the model must supply.

Reading the Soss code for NUTS, it seems sufficient in that particular case to be able to extract a function which returns the domain of the parameters (ie, getTransform) as a named tuple and a function to compute the logdensity. But I assume other samplers require other (arbitrary?) transformations of the model?

ModelingToolkit looks rather promising as one possible approach.

1 Like

This might be useful (or might not):

Creating domain-specific languages in Julia using macros

I’ll check these out, thank you!

I really like @Tamas_Papp’s approach for a restricted-enough class of models. Even for declarative models, a wrinkle that causes some potential issues is composability. A Model generalizes a Distribution, so in particular we should be able to pass in one Model as an argument to another, or inline one inside another, etc. In the current approach (which I like so far) a Model is a Distribution over NamedTuples. The wrinkle is the question of whether DAG “dependency” should be in terms of one of these “inner” models or its components (I’m leaning toward the latter).
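As a rough illustration of the “Distribution over NamedTuples” idea and the composability question (a hypothetical sketch, not Soss’s actual API):

using Distributions

inner() = (μ = rand(Normal(0, 1)),)              # a "model" over (μ,)

function outer()
    nt = inner()                                 # use one model inside another
    merge(nt, (x = rand(Normal(nt.μ, 1)),))      # x depends on the inner model's μ
end

outer()   # e.g. (μ = 0.31, x = 1.12)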

Eventually, I think it makes sense to allow arbitrary Julia code, with declarative models as a special case. But this is a good place to start.

Eventually, yes. This is more general, so I expect we’ll have less information to pull out, and fewer transformations available. The most important aspect in this case is to still allow composability, and to provide a nice way to pass to Turing.jl (this is their specialty). There’s some potential for semantics mismatch between this and Turing, but I’m putting off worrying too much about that for now :slight_smile:

Nice! I’m not sure what you mean by “contextual dispatch”, maybe representing context as a type and using multiple dispatch in the usual sense? Do you have a simple example you could share?

Yes! NUTS (or technically “dynamic HMC with dual averaging etc”) has some strict limitations: parameters must be fixed-dimensionality and continuous. It also doesn’t scale, so people do lots of approximations (“variational inference”). And there’s a ton of other things that are useful :slight_smile:

Yes! I’ve read through this before, but it’s been a while. Great stuff, good to have another go at it. Thanks @dpsanders!

1 Like

I’ve been getting confused about this expression

Is this really a model? You can’t sample from it. (Is μ real? Is it even a scalar?) It seems more like a model construction function. The same could be said for any model with free parameters, I suppose.

This is exactly what I mean. And of course, I’m referring obliquely to Cassette.jl: what I’m wondering is whether contextual dispatch would allow programs with stochastic variables to be interpreted in some of the different ways required for parameter inference vs sampling. For non-composable models you don’t need Cassette to do the contextual dispatch part though — you’ve got the @model macro to do the required lowering instead and that would be fine to figure out whether this makes any sense or not.
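A hedged sketch of the non-Cassette version of that idea (context as a type plus ordinary multiple dispatch); the names SampleCtx, LogDensityCtx, and tilde are made up for illustration and aren’t anyone’s actual machinery:

using Distributions

struct SampleCtx end
struct LogDensityCtx
    θ::NamedTuple
    logp::Base.RefValue{Float64}
end

# The context decides how each stochastic statement is interpreted
tilde(::SampleCtx, name, dist) = rand(dist)

function tilde(ctx::LogDensityCtx, name, dist)
    x = getproperty(ctx.θ, name)
    ctx.logp[] += logpdf(dist, x)
    return x
end

# The same generative program, written once, interpreted two ways
function model(ctx)
    μ = tilde(ctx, :μ, Normal(0, 1))
    x = tilde(ctx, :x, Normal(μ, 1))
    (μ = μ, x = x)
end

model(SampleCtx())                                 # forward sampling
ctx = LogDensityCtx((μ = 0.2, x = 1.5), Ref(0.0))
model(ctx); ctx.logp[]                             # log density at that point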

The reason I’m thinking about Cassette here is that AD has several similar features: there is a DAG of dependencies between variables which is relevant to the computation; you’d like to compose functions and have differentials flow through the whole chain; something being a differential is orthogonal to its type in the standard interpretation (cf. the mess with Dual{Complex} <: Real); epsilon confusion seems similar to confusing stochastic variables from different contexts, etc etc.

Yep, that’s the idea. I expect this kind of construction would usually be turned into a “proper” model (with only numerical free variables) before inference, but I’m not sure that’s always the case. Maybe in some cases it could be given a type with some other structure? Still exploring the design space for some things like this. BTW, a lot of these ideas are from my little Passage workshop paper from a few years ago. That was limited to inlining data, in addition to some other restrictions. But a lot of the combinator ideas were there.

I’d have to dig in a bit more, but I think this may be similar to what the Turing folks are doing (maybe @mohamed82008 or @trappmartin can clarify?). Not that that would mean there’s anything wrong with the approach, but they’re probably way ahead on this front, and calling to Turing is probably sensible when the abstractions line up :slight_smile:

I am not sure about this. I think it scales better than any alternative — currently I am estimating a model with 100k variables using DynamicHMC (which is prompting a rewrite, but not because of the algorithm, just to eliminate some allocations).

1 Like

That’s an impressively large number of variables :slight_smile: “Does it scale” is a multidimensional question isn’t it? Precision and bias of parameter estimates for a given compute/memory budget.

Presumably it’s the usual story, monte carlo being very general and quick to get unbiased approximate parameter estimates in absurdly high dimensions… but also very slow if you’re looking for precision. Versus some other methods which compute highly precise but biased estimates relatively fast?

This is off-topic here, but my experience is that if you get good mixing and avoid the usual pathologies, you get a very precise posterior (in the sense that expected values of functions are reasonably accurate). The trick is to get good mixing :wink: This may require iteratively re-thinking your parametrization multiple times.

But yes, variational inference and eg INLA have complementary advantages, but I have no first-hand experience of how they scale to many variables.

The current weak point of gradient-based methods in Julia is AD: ReverseDiff works but becomes slow for large models and you have to use tricks, Zygote and Capstan are promising but WIP. ATM for large models I am coding derivatives manually.

2 Likes

@cscherrer Turing doesn’t do contextual dispatch per se, as that seems like overkill for Turing’s use cases so far and Turing predates Cassette. It could be possible to use Cassette though once production-ready to improve certain aspects in Turing. Turing however takes a very concrete approach to models as opposed to Soss’s somewhat abstract approach. This means that the structure of the generated code is really important to be able to flexibly support different inference algorithms, distributions, missing data, mixed samplers including particle and Hamiltonian samplers etc, all with a single (or possibly chained) macro transformation. Any after the fact transformations would only be possible with the help of Cassette. But so far most use cases don’t seem to require that.

3 Likes

Wow! I’d love to hear more some time about this - easily the highest-dimensional parameter space I’ve heard of with this approach.

I guess I was thinking more of scaling in the other direction - millions or billions of observations. MCMC gives the best possible precision, but (in the typical absence of sufficient statistics) requires looking at all of the data for each step. Stochastic variational inference seems to be just the thing in this case. David Blei has some great talks going into this - LDA for text analysis of large corpora for example. The approach in Uber’s Pyro is a really interesting probprog approach for this: The actual “model”, plus a “guide” that’s used as a form for the parameter distributions.

It’s really funny, in a Bayesian context these roles actually reverse. The goal isn’t a “point estimate” (single parameter value to serve as the solution) but a sample from the posterior. MCMC is then exact but slow, while approximations like variational inference are much faster (since they take the form of an optimization) at the sacrifice of some fidelity.

This is really helpful, thank you. Concrete vs abstract is one way to look at it, but when it comes to inference time, everything needs to be concrete :wink:

I guess in Turing a model is something that’s ready to be passed to an inference routine, whereas a model can (but doesn’t have to be) more abstract than that in Soss. And a lot of the point of Soss is to easily jump from one model to another. Maybe loosely we could think of this as a preprocessor for Turing.

2 Likes

I am not sure about this — if you read the Stan forums, you will see examples of people estimating models with a high number of parameters. The key is really how “difficult” the posterior is, not the number of parameters (which are latent variables in my case — once you figure out the reparametrizations, you make it really efficient).

I would try to find summary statistics, or use a subset and then adjust by importance sampling. There are other techniques.

Yes, I’ve seen the “funnel” example with very few parameters that can still be really difficult. One of the things I’d like to explore with Soss is to what extent model combinators can help get around parameterization difficulties, maybe even eventually automating them to some extent. This could make Bayesian analysis easier for beginners to get into, and could hopefully be useful for experts as well.
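For concreteness, the classic instance is Neal’s funnel; a minimal sketch of the centered vs non-centered parametrization in terms of plain log densities (not any particular @model syntax):

using Distributions

# Centered form: x | τ ~ Normal(0, exp(τ/2)). The geometry as τ gets small
# is what makes HMC struggle.
logp_centered(τ, x) = logpdf(Normal(0, 3), τ) + logpdf(Normal(0, exp(τ / 2)), x)

# Non-centered form: sample z ~ Normal(0, 1) and recover x = exp(τ/2) * z.
# Same model, much friendlier posterior geometry.
logp_noncentered(τ, z) = logpdf(Normal(0, 3), τ) + logpdf(Normal(0, 1), z)
x_from(τ, z) = exp(τ / 2) * z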

Sure, but importance sampling still requires evaluation of the log-likelihood, which means stepping through all of the data. I guess you could do tricks like they use in Bayesian quadrature and use a Gaussian process in place of the exact log-likelihood, but then you’re into approximation land anyway :wink:

1 Like

@c42f we drifted a bit off the original topic, but I’d still like to better understand your suggested macro guidelines. It sounds like you’re suggesting only macros should be calling eval. Is that right?

One difficulty I’ve had with this is that it seems there must always be at least as many quotes as interpolations. I guess this is an execution safety measure of some sort, but it makes it tough to pass around an AST which is later evaluated.
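A tiny example of what I mean, using plain @eval, whose argument is implicitly quoted (g here is just a throwaway name):

ex = :(1 + 2)        # quoted once: this stays an Expr we can pass around
@eval g() = $ex      # interpolated inside @eval's quoted argument, then evaluated
g()                  # 3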

Do you see a way to do this? Any MWE you could provide?

Well I don’t have a clear suggestion for lowering because I’m still having trouble grasping the scope of the design problem and the semantics of the code inside @model. Perhaps because those aren’t fully defined yet for Soss, or perhaps because I’m just not getting it.

Some things we have:

  1. (Concrete) models are distributions over named tuples (cool idea!)
  2. We have free parameters which allow us to specify parametric families of models
  3. Free parameters can be fixed to create a newly concrete model.
  4. We have stochastic variables which are either parameters or observations
  5. The distribution of a stochastic variable can depend on another stochastic variable.
  6. Stochastic variables are introduced and their priors declared using var ~ prior
  7. Data for observations is given by model(var=data)
  8. Models declare their data and free parameters explicitly; parameters are inferred from the syntax.
  9. Models can be composed by calling submodels in their definition

The rough goal:

  • Soss should be able to use julia’s type system and compiler to good effect rather than having its own type system and compiler.

Please correct or add to the list :slight_smile:

I’m still confused as to whether the stuff inside @model is meant to be something you can run as an imperative program (slightly modified by the macro), or whether you intend it to be an abstract declaration. Having written down the list above it seems declarative, but the way you write the examples in Soss/src/examples.jl looks largely imperative.

If it’s a more or less imperative structure, we can emit a set of functions at macro expansion time which can later be called to perform certain actions on the model. This seems to be basically what Turing does. If it’s a largely declarative structure, generating an expression tree might make more sense, a bit more like what you have now. But it also depends on what kind of transformations should be possible on the models, and that seems to be a bit open ended.

Ideally, you never call eval explicitly, except at top level. For example, your use of eval inside Soss/src/bijections.jl is a good example of how it should be used. Sometimes this ideal can be violated to good effect, but always keeping in mind the global module scope in which it will execute. So macros shouldn’t be calling eval either if they can help it! Instead they should return some AST which is then spliced into the surrounding code by the macro expander, and later eval’d when the compiler is ready.
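For example, a minimal toy sketch of a macro that defines something without ever calling eval itself (@define_square is a made-up name, just for illustration):

# The macro only builds and returns an expression; the macro expander splices
# it into the caller, and it is evaluated when the surrounding code is.
macro define_square(name)
    quote
        $(esc(name))(x) = x^2
    end
end

@define_square sq
sq(3)   # 9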

I don’t understand this question, perhaps you could restate it :slight_smile:

2 Likes

Yeah, I think there’s a mix of ideas in flux (lower-case f) and sparse documentation

These are all in line with the current approach

These two are not quite right, and seem to have been a point of confusion for others as well - need better docs.

There’s no concept of a “concrete” model. A model is an abstraction of a function, and the free parameters are its arguments. The model(var=data) transformation inlines the corresponding argument, which may or may not be a good idea.

But I think common usage will be to keep at least a “data” argument as a free parameter in most cases. “Inference” typically turns this into a function of the data, and may add other arguments as well. In principle there could be other kinds of inference; the main idea is to provide building blocks to express lots of kinds of models and inference, and to make all of this as modular as possible :slight_smile:

This is a lot more concise than I’ve been able to make it, and is very nearly what I’m after. One minor point is that it’s not really about making sure to use Julia’s type system and compiler, but to avoid poorly duplicating parts of it. I think types will stay as Julia types, but compiler-like transformations will be a big part of Soss. But whenever there’s a chance to leverage existing stuff, we should.

Great point, that aspect is still uncertain. I’ve tried to make things declarative-ish without sacrificing flexibility. Avoid mutation when possible, try to keep ~s at top level, etc. Fully declarative would be great, but I’d like to better understand what expressiveness I’m sacrificing before committing to that.

Right. I think it would be pointless to re-implement Turing, and duplication between the two efforts should be minimized to the extent that’s possible.

Maybe the ideal would be something along the lines of Haskell’s ST monad. This encapsulates imperative code and allows it to play nicely in a declarative context. But ST as-is wouldn’t quite fit, because we need things like x ~ dist to be considered declarative. I’ve also fallen into a few too many Haskell type-theory rabbit holes, and I’m not good enough at it for that to be productive :slight_smile:

All of this makes sense, until we come upon the problem of carrying around expressions to be somehow turned into executable code at a later time.

Macros don’t seem to solve this problem. For example, say we have fExpr = :(f(x) = x^2). Is it possible to write a macro @foo without using eval so that @foo fExpr evaluates in local scope and defines the function? I haven’t been able to do it.
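For what it’s worth, here’s the kind of thing I keep running into (a throwaway @showexpr macro, just to show what the macro actually receives):

fExpr = :(f(x) = x^2)

macro showexpr(ex)
    # A macro sees syntax, not values: `ex` here is the Symbol :fExpr,
    # not the Expr bound to fExpr, so there's nothing to splice in.
    @show ex
    nothing
end

@showexpr fExpr   # prints: ex = :fExpr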

I’ve been working at the AST level, but wouldn’t this also be a problem with any lowering? You still have something like a CodeInfo object, right? Wouldn’t turning that into executable code still require eval? There must be deeper levels of “pure Julia” than I’ve seen to this point.

The best way I have so far to make this safe is to create an Empty module,

julia> module Empty end
Main.Empty

Then we can ask @eval to use this scope:

julia> x = 3
3

julia> @eval Empty x
ERROR: UndefVarError: x not defined
Stacktrace:
 [1] top-level scope
 [2] eval(::Module, ::Any) at ./boot.jl:319
 [3] top-level scope at none:0

julia> @eval Empty $x
3

This helps catch mistakes like leaving off $ and referring to the wrong variable. Since interpolation uses whatever’s in scope, it should work anywhere. The Empty namespace can still be polluted if we evaluate any assignments, so maybe we could check for that first or somehow clear the namespace after each step. Or maybe there’s a more principled way of doing all of this anyway :slight_smile:

1 Like

One way could be to define a complete module (with a gensymed name) and overwrite the module every time you call eval. So your model manipulation code, which doesn’t need to be high performance, can be done on the syntax tree or a typed version of it, like what Tamas suggested above with DistAs. Then when you want to run inference, you generate the code for it in the module (which you can optionally write to a file for visual inspection), and then call the main function inside the module. This feels a bit like what Stan does but using modules in place of files. I have never done this before though, so not sure if I am missing any gotchas here.

Edit: the “code” for inference could also be just as well a call to Turing.jl with your model expression spliced in.

1 Like

Something like this?
(EDIT: Changed $ex to $(Expr(:quote,ex)) to allow @eval-like interpolation.)

macro cleaneval(ex)
    # Create a fresh module with a gensym'd name at macro expansion time
    M = gensym()
    @eval module $M end

    # Return code that quotes `ex` and evaluates it inside that fresh module
    quote
        Core.eval($M, $(Expr(:quote, ex)))
    end
end

So far this seems to work, and is safer than any other eval-based approach I’ve seen. Still not sure if this is the “right” way to do it - this kind of thing must come up in core development in Julia, can’t believe this could be the first time :slight_smile: