I am following your project with interest, but I have to admit that the mental model that works best for me is that models are (log) density functions that map a bunch of numbers to a single number.
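Roughly what I mean, as a throwaway sketch (the Normal model and the priors below are just made up for illustration):

```julia
using Distributions

# The "model" is nothing more than this map from a vector of numbers to one number.
function logdensity(θ::AbstractVector, y::AbstractVector)
    μ, logσ = θ
    σ = exp(logσ)
    logprior = logpdf(Normal(0, 10), μ) + logpdf(Normal(0, 1), logσ)
    loglik = sum(logpdf.(Normal(μ, σ), y))
    return logprior + loglik
end

logdensity([0.5, 0.0], randn(20))   # -> a single Float64
```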
I recognize that there are benefits to the DAG representation (e.g., easily getting predicted quantities), but at the same time I find DAGs constraining and difficult to optimize, and the benefits rather trivial compared to the other programming costs of PPC.
Also, I think it is a misconception that DynamicHMC (or NUTS, or MCMC in general) provides a rand. Perhaps if you run it for infinite time, but the key aspect of rand for me is getting IID samples in O(1) time.
That’s great when you can get it, and it’s my usual focus as well. But there are plenty of other areas people are working in: likelihood-free methods like ABC, particle filter algorithms like the ones Turing uses, and so on. Also, online algorithms like Kalman filtering have to maintain and update state, which is very different from function and gradient evaluation.
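For instance, even a toy scalar Kalman filter (all the values below are made up) is organized around carrying and mutating state rather than evaluating a fixed log density and its gradient:

```julia
mutable struct KalmanState
    μ::Float64   # current mean estimate
    P::Float64   # current variance estimate
end

# One online update for a random-walk state with process noise Q and
# observation noise R; the whole point is that the state gets updated in place.
function update!(s::KalmanState, y; Q = 0.01, R = 0.5)
    P_pred = s.P + Q
    K = P_pred / (P_pred + R)        # Kalman gain
    s.μ += K * (y - s.μ)
    s.P = (1 - K) * P_pred
    return s
end

s = KalmanState(0.0, 1.0)
foreach(y -> update!(s, y), randn(100) .+ 2.0)
```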
I’m not married to DAGs, but being able to get at one can be helpful for some algorithms. There are also tricks you can play to get at the Markov blanket, like expanding the functional form of the log-likelihood and checking variable co-occurrence within terms. Right now the biggest benefit I see of the dependency graph stuff is to understand the sparsity structure of the Hessian.
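Something along these lines, with a hand-written term structure just for illustration:

```julia
# If the log density expands into a sum of terms, two variables are neighbors
# whenever they show up in the same term; the Markov blanket of v is then the
# union of its neighbors. Variables that never co-occur give structural zeros
# in the Hessian.
terms = [
    [:μ],           # logpdf(prior, μ)
    [:σ],           # logpdf(prior, σ)
    [:μ, :σ, :y],   # sum(logpdf.(Normal(μ, σ), y))
]

function markov_blanket(v::Symbol, terms)
    blanket = Set{Symbol}()
    for t in terms
        v in t && union!(blanket, t)
    end
    delete!(blanket, v)
    return blanket
end

markov_blanket(:μ, terms)   # Set([:σ, :y])
```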
Once the data are fixed, everything is O(1)
Seriously though, I see what you’re getting at, and I tend to agree. I would never advise “take this strongly autocorrelated sampler and pretend it’s iid”. But that caveat is important to keep in mind for any sampling method.
Still, sometimes we need a sample as part of an algorithm. For example, each iteration of SVI (stochastic variational inference) samples from a given variational distribution. Maybe this is a proper Distribution, but it could also be a Model. It’s just a useful abstraction. This is exactly the approach Uber’s Pyro language takes.
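A stripped-down sketch of the kind of thing I mean (the target and the variational family here are made up; all that matters is that q supports rand and logpdf):

```julia
using Distributions

# One naive ELBO estimate: draw from the current variational distribution q
# and average logjoint(z) - logpdf(q, z) over the draws.
function elbo_estimate(logjoint, q; nsamples = 10)
    est = 0.0
    for _ in 1:nsamples
        z = rand(q)                       # the sampling step in question
        est += logjoint(z) - logpdf(q, z)
    end
    return est / nsamples
end

logjoint(z) = logpdf(Normal(0, 1), z) + logpdf(Normal(z, 1), 2.3)
elbo_estimate(logjoint, Normal(1.0, 0.5))
```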
Anyway, to be really pedantic about it, there’s no such thing as iid. Any PRNG has state, and the implications of that always require care for more complex or high-dimensional sampling procedures.
Thanks, I hadn’t seen that one. I don’t know Zenna well, but I’ve really enjoyed the ideas of his that I’ve come across, I think at a poster session or two. I’ll check it out.
This sounds fairly sensible to me. We didn’t resolve your original question, but at this stage it might be best to try prototyping several of these compilation steps in something like your existing AST-based style. Then there will be a bunch of examples of the required source transformations, and the semantics of the stuff inside @model might become clear by example.
I do think an AST-based approach will run into various eval scoping-related composability issues as soon as you start using Soss from other modules and with custom Distributions. As discussed earlier in the thread, there are some ways to mitigate this by carefully choosing the context for the eval.
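The kind of thing I mean, as a contrived example (nothing here is Soss’s actual code):

```julia
# eval always runs in a particular module, so an expression built at runtime
# won't see a custom Distribution defined in the calling module unless that
# module is chosen explicitly.
module User
    struct MyDist end   # stands in for a user-defined Distribution
end

expr = :(MyDist())

# Base.eval(Main, expr)   # errors: MyDist is not defined in Main
Base.eval(User, expr)     # works: the caller's module was picked on purpose

# In a macro, __module__ names the module where the macro is being expanded,
# which is one way of "carefully choosing the context for the eval".
macro eval_here(ex)
    :(Base.eval($__module__, $(QuoteNode(ex))))
end
```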