Working with ASTs

Well, this is the idea, yes. I guess we can call this a “module barrier”. Not sure what kind of trouble you might run into though; nothing comes to mind, but I am far from an expert.

Edit: it’s kind of surprising this even works to be honest!

Looks like we’re not out of the woods yet. If I do this…

fpre = @cleaneval $(logdensity(model))
f(par) = Base.invokelatest(fpre,par,data)

it complains that it doesn’t know what Normal is. Come to think of it, this will come up in lots of other places, and require either inlining or copying everything into the new module.

Doesn’t all of this seem a bit silly? Why can’t we express the concept “please evaluate this in the current scope”?

If I understood @ChrisRackauckas right, something like this may work:

struct Model{T} <: AbstractModel
    args :: Vector{Symbol}
    body :: Expr
    meta :: Dict{Symbol, Any}

    function Model(args::Vector{Symbol}, body::Expr, meta::Dict{Symbol, Any})
        T = gensym()
        new{T}(args, body, meta)
    end

    function Model(args::Vector{Symbol}, body::Expr)
        meta = Dict{Symbol, Any}()
        Model(args, body, meta)
    end

    Model(; args, body, meta) = Model(args, body, meta)
end

Inference methods can then be generated functions. Since these dispatch on type and the type parameter is gensym’d, this could get us around the eval issues.


Yes! This is exactly the flip side of the problem with scope and eval: The user will expect their model to refer to the local scope in which @model was invoked, and will probably expect to refer to local / global variables within that scope. But calling eval at a later point messes that up.

If you really want to do source-to-source transformations after @model is declared and evaluate them later, there seems to be no easy way to completely avoid these problems. Generated functions could possibly help, though they come with several tricky limitations, and it would be hard to get the model AST in there because you only have the type of the model (not the value of the model, where you have stashed the model body). You might be able to get around this with a nasty hack involving a global cache of ASTs, but I’m not sure that’s a good idea.

Here are some ways to mitigate the surprises you will get from using eval inside your model manipulation functions:

  • Capture the user’s module scope with the implicit __module__ argument to @model and use that as the scope for evaluating code later (see the sketch after this list). This will let them pull custom types into their module with a normal using statement and use them in their models.
  • Avoid evaling assignment statements in the user’s module (or in Soss itself); only eval expressions and capture the returned value for your own use. Exceptions might be made for extending existing generic functions; just think about it :wink:
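Here’s a minimal sketch of that first bullet, assuming a wrapper type that stores the model body as an Expr. All the names (ScopedModel, interpret) are hypothetical, not Soss’s actual API; the point is just to stash __module__ so later evaluation happens in the caller’s scope:

struct ScopedModel
    body::Expr
    mod::Module   # the module in which @model was invoked
end

macro model(ex)
    # __module__ is the implicit argument holding the macro caller's module
    :(ScopedModel($(QuoteNode(ex)), $__module__))
end

# Evaluate derived expressions in the user's module later, so their
# `using Distributions` etc. (and hence Normal) is visible:
interpret(m::ScopedModel, ex) = Base.eval(m.mod, ex)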

Yep, I was just bumping into this difficulty. It seemed like it was going to work out ok, but then I started getting this error:

ERROR: generated function body is not pure. this likely means it contains a closure or comprehension.

There’s no documentation I can find beyond this of what is meant by “pure”, and there are several examples involving external state. I also tried a simple generated function for random number generation and it seemed to work fine. Think I’ll abandon that approach until it’s better documented.

Yes, these are some great points! I agree your first point is a better context for evaluation. And for the second bullet, we could even wrap eval in something to check for assignments and disallow them. To be really elaborate we could have a “whitelist” of allowed operations, or allow the user to specify this in a config file or something. This may be over-engineering it at this point, but down the road it could be nice to keep it from becoming a vector for exploits :wink:

Why do you have to use eval at all? What problems does that solve?

My suggestion would be to have your @model macro define a generated function.
That generated function can then assemble appropriate AST based on type information of input arguments. By making distributions/functions into arguments as well as parameters you have a lot of flexibility.
A generated function seems to fit the idea of a “model is an abstraction of a function”.

My approach to hygiene for macros has been escaping the output, and manually calling gensym for all the variables I define.


There’s lots of discussion above, if more context is helpful. The goal is to represent a Model as a first-class value that can be passed around, inspected, manipulated, etc. For this to work it needs to be close to the AST level; lowered code is too awkward to manipulate.

The trouble with this approach is it requires everything to happen at once; transformations would have to be determined a priori, or the entire pipeline would have to be run in order to allow inspecting a given step in the chain.

Generated functions are also very constrained in what they can do, and this is not well documented.

Escape the whole thing and interpolate a few gensym’d values as needed, right? I’ve not done this exact thing, but it’s in a similar spirit. Sometimes the most sensible solution is to use things in unintended ways :man_shrugging:
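For concreteness, here’s a tiny sketch of that style with a made-up macro: escape the whole output, and manually gensym the variables the macro itself introduces.

macro twice(ex)
    tmp = gensym("tmp")       # manual gensym for the variable we introduce
    esc(quote                 # escape everything...
        $tmp = $ex            # ...so user symbols resolve in the caller's scope
        ($tmp, $tmp)
    end)
end

@twice rand()   # evaluates rand() once and returns the pair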

@ChrisRackauckas suggested I post my generated function attempt. I’ve tried a few variations, but here’s one:

@generated function nuts(model)
    quote
        function(data,numSamples) 
            result = NUTS_result{}
            t = getTransform(model)
            f = logdensity(model)

            P = TransformedLogDensity(t,f)
            ∇P = ADgradient(:ForwardDiff,P)
            chain, tuning = NUTS_init_tune_mcmc(∇P, numSamples);
            samples = transform.(Ref(∇P.transformation), get_position.(chain));
            NUTS_result(chain, t, samples, tuning)
        end
    end
end

In this approach, a model is really a Model{T} where the T is gensym’d at construction time. Anything I do still throws an “impure” error.

One thing that’s not at all clear to me is when value should or shouldn’t be interpolated in generated functions. Thought I’d try to fix the bug first, but I fought with this long enough last night I’m leaning heavily toward falling back on eval.

Did you try replacing this line with a for loop?
Broadcasting and generated functions don’t mix (you get the “impure” error).
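For reference, a rough sketch of that line unrolled into a loop (untested, and using the same DynamicHMC-era API as the snippet above), so the generated body contains no broadcast or closure:

samples = Vector{Any}(undef, length(chain))
for i in eachindex(chain)
    samples[i] = transform(∇P.transformation, get_position(chain[i]))
end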

I find it easiest to manually control the hygiene. The less mysterious stuff going on, the easier it is to understand what you’re doing.
I remember running into cases where the hygiene pass would get triggered again and some variable references would get gensym’d twice, causing undefined-variable errors.
It’s not like inserting manual gensyms is any less clean than manual esc.


Is that it? Maybe broadcasting is considered a kind of comprehension? It looks like “pure” is entirely the wrong word for the actual constraint. Thanks, I’ll try that.

Exactly! I’m trying to make this do what I want while avoiding mystery. It’s currently pretty clean, but the eval stuff is less than ideal.

I tried unrolling the broadcast, and it has stopped complaining about purity. Now I’m back to another issue that has come up again and again.

I need to be able to expand logdensity(m) for some model m. If I do this in a generated function, I only have access to the type of m. The system will let me write $(logdensity(m)), which gets confused because m is only a type there, or logdensity($m), which returns the AST but can’t evaluate it. What I really want is $(logdensity($m)), but that’s not allowed.

So the current crazy idea is to not just gensym the model type parameter, but to encode all of the model information in the type as well. Here’s a fun example of the idea abstracted away from the current context:

struct Test{T}
    a
    b
    function Test(a,b)
        v = Val((a=a,b=b))
        new{typeof(v)}(a,b)
    end
end
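As a hypothetical illustration (not from the thread) of why this helps: once the values are baked into the type parameter, a generated function can recover them from the type alone, which is just what the $(logdensity(m)) problem above needs.

@generated function getvals(::Test{Val{nt}}) where {nt}
    # nt is the named tuple (a = ..., b = ...), available at generation time
    quote
        $(QuoteNode(nt))
    end
end

getvals(Test(2, 42))   # (a = 2, b = 42)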

Unfortunately, value types seem to have some constraints, and I’m not sure if this is intentional or a Julia bug.

julia> Val((2,42))
Val{(2, 42)}()

julia> Val(("ok","then"))
ERROR: TypeError: in Type, in parameter, expected Type, got Tuple{String,String}
Stacktrace:
 [1] Val(::Tuple{String,String}) at ./essentials.jl:683
 [2] top-level scope at none:0

Haha, here’s one way to encode the AST of a model in its type. To be clear, I bring this up entirely for amusement value, because I think it’s a hilarious hack rather than a good idea. Please don’t actually use this code :slight_smile:

using Serialization, Base64

struct Model{EncAst}
end

macro model(ex)
    encast = Symbol(base64encode(sprint(serialize, ex)))
    quote
       Model{$(QuoteNode(encast))}()
    end
end

getast(m::Type{Model{EncAst}}) where {EncAst} = deserialize(IOBuffer(base64decode(string(EncAst))))

# Some generated function which has access to the ast inside the *generator*
@generated function exec_model(m::Model)
    getast(m)
end

Thus:

julia> exec_model(@model begin
           a = [1,2,3]
           a .+ 1
       end)
3-element Array{Int64,1}:
 2
 3
 4

I can’t see how generated functions help at all with your scoping issues - the AST they produce is evaluated in the module of the function generator code (ie, inside Soss itself). In this case I think they just make everything more complicated and don’t solve any real issue.


This generator appears to make no sense at all because the returned AST doesn’t depend on the type of m :wink:

The idea of a generated function is to transform types into syntax. They are like runtime macros in some ways, but instead of getting an AST as input and producing an AST as output, they take the runtime types of the input arguments and turn that into AST as an output, which is then compiled by the compiler, with the result of that compilation being the function which is actually executed. Yes, this can be very confusing!

Generated functions are useful because they allow one to implement certain high-level optimizations (eg, loop structure, loop unrolling) where the compiler is unable to do so. For example, in StaticArrays we use them for a lot of loop unrolling, knowing the (type-parameterized) static sizes of the statically sized arrays. So you should think of generated functions as an optimization tool, not a tool for implementing novel semantics. As the compiler improves we generally need fewer @generated functions, and this is a good thing.
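As a toy illustration of the “types in, AST out” idea (a standard unrolling example, not from this thread): the tuple length N is known from the type, so the loop below runs at generation time and the emitted code is a straight-line sum.

@generated function unrolled_sum(t::NTuple{N,Any}) where {N}
    ex = :(t[1])
    for i in 2:N
        ex = :($ex + t[$i])
    end
    return ex
end

unrolled_sum((1, 2, 3, 4))   # 10, computed as t[1] + t[2] + t[3] + t[4]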


@cscherrer Unfortunately this discussion has become very long and unwieldy, so sorry if I’ve missed answering some of your questions or questioning some of your answers :slight_smile:

The thing I keep coming back to is that the exact semantics of the code inside @model seems unclear. The code itself is not a function (what does it return?), and yet it is largely imperative. The way you use it seems to be to select bits and pieces of the code to generate various functions of interest. For example, to generate a logdensity. Doing this for arbitrary model code with arbitrary data dependencies and also generating efficient code is super hard. So there will be restrictions. No loops, I guess.

I’ve also been thinking about the larger backdrop of why I’m interested in this. Frequently in practical programming problems there are adjustable parameters which might be learned from data; these might be many layers deep within a program. If only we could introduce such parameters in an arbitrary function with x ~ SomePrior(), and have a mode in which they can be optimized. I know this is dreaming right now, but there’s so many tantalizing hints that it might be within our grasp in the future… the problem has flavors of mathematical optimization, stochastic programming, reinforcement learning, differentiable programming… many things where julia packages are pushing the state of the art.


Seems possible with Cassette’s contextual dispatch, some macro magic and Turing! Think of all the parameters and the logpdf accumulator as part of the context. The variable can be uniquely identified by its symbol, the module, directory, file and line, etc, which can all be expanded by the macro at parse time. Then all of these can be passed to some innocent-looking function that’s then intercepted by Cassette’s overdub to update the parameter values and logpdf in the Turing context. Of course, there may be a lot of quirks along the way, but the idea seems promising.
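A very rough sketch of that idea, with every name made up (this is not Turing’s or Soss’s actual API): tilde is the “innocent looking function” the macro would insert for x ~ D, and the context metadata carries the parameter store and the logpdf accumulator.

using Cassette, Distributions

Cassette.@context ModelCtx

tilde(name::Symbol, dist) = rand(dist)   # default behaviour outside any context

# Contextual dispatch: inside a ModelCtx, intercept tilde, look the value up
# (or draw it) keyed by its unique name, and accumulate the log density.
function Cassette.overdub(ctx::ModelCtx, ::typeof(tilde), name::Symbol, dist)
    meta = ctx.metadata
    x = get!(() -> rand(dist), meta.params, name)
    meta.logp[] += logpdf(dist, x)
    return x
end

# Usage sketch:
# meta = (params = Dict{Symbol,Any}(), logp = Ref(0.0))
# Cassette.overdub(ModelCtx(metadata = meta), mymodel, data)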

Seems possible with Cassette’s contextual dispatch, some macro magic and Turing

Yep, that’s where I was going with this (see further up the thread :slight_smile: )

Sounds like a good rule of thumb :+1:

No worries! You’ve been really helpful and patient with my ill-posed problem :slight_smile:

Right, pinning down the semantics is part of the difficulty. So far I’ve been thinking about what I want to express, trying to find a way to write it that’s easy to reason about, and then implementing what’s needed.

In addition to the constraints I described above (models should be first class, etc) there are a few things that will be important.

There will need to be a way to represent the kind of data expected. Is it all at once, or one observation at a time? Specifically, for many models there are “global” and “local” (per-observation) parameters. This is an important subclass of models that are suited to stochastic variational inference (SVI) for large datasets.

In the SVI context, the observations are usually conditionally independent. But maybe there is a static dependence structure. For example, maybe we have time-dependent data we’d like to represent in terms of a Markov chain. We should be able to provide init and next distributions so that x[0] ~ init() and x[t] ~ next(x[t-1]).

Or maybe we’re observing per-county data, and the are dependencies based on county adjacency. Observations in the context could be all at once, or one county at a time, or some unit within a county.

More ambitiously, these dependencies may not be known statically, but may be given as data. Things like url hyperlinks, social networks, etc.

I think the code is an abstract representation of a distribution over named tuples, where each value of the tuple could have arbitrary type.

So far I’ve just been trying to write models in SSA style, and this hasna’t been too bad. For general mutation, we’d probably need a compiler hint (“Soss hint”?) and a way to encapsulate the mutation.

Not dreaming so much. Frank Wood had a paper a few years ago where “parameters” were just nodes in an AST. He observed some data and then inferred what code might have produced it. I think Turing uses many of his algorithms, I’d be surprised if they couldn’t reproduce the result.

There were also some papers from Google Research a while back. One observed sorted data and learned an algorithm for sorting it. And there was another, having to do with differentiable programming, that optimized the memory layout of some data structure.

These things are annoyingly hard to search the internet for; hard to distinguish between learning a thing and learning how a computer can learn a thing.

@c42f Thinking about this some more… Are Distributions imperative or declarative? I’d say most are written declaratively, but have imperative methods defined on them. logpdf has a pure interface but is implemented imperatively. rand is fundamentally stateful, and could be implemented in terms of a complex simulation.

This makes me wonder about the potential expressiveness of a fully declarative Model interface. Maybe any imperative aspects can be pushed into a Distribution, giving a simpler interface to reason about.

Also, similar to the tradeoff between inlining data and passing it at execution time, the same choice comes up with Models. Say we have a “model-in-a-model”, as we’ve talked about previously (let me know if this is too vague). In some cases we want the larger model to be able to pick apart the components of the submodel, reason about dependencies, etc. But in other cases, it may be more useful to build the logpdf or rand code for the submodel and then call that much as we’d call a distribution.

I’m sure there are plenty of details to work out, but it seems this could be a more principled way of organizing these concepts.

  • A Distribution has methods logpdf, rand, etc
  • A Model may need additional information (or not) and can then be compiled into a Distribution. There will often be multiple ways of doing this compilation (it’s a form of inference), with corresponding tradeoffs. (A toy sketch of this two-layer interface is below.)
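To make the shape of that interface concrete, here’s a toy sketch in which everything is hypothetical and the “compilation” step happens to be analytic (a conjugate normal model with known observation noise); in general that step is where an inference algorithm like NUTS would run.

using Distributions

abstract type AbstractModel end

struct ConjugateNormalModel <: AbstractModel
    prior::Normal       # prior on the unknown mean
    obs_sigma::Float64  # known observation standard deviation
end

# "Compiling" a Model plus data yields a Distribution, i.e. something with
# rand, logpdf, etc.
function compile(m::ConjugateNormalModel, data::AbstractVector{<:Real})
    μ0, τ0 = mean(m.prior), 1 / var(m.prior)
    τ = 1 / m.obs_sigma^2
    τn = τ0 + length(data) * τ
    μn = (τ0 * μ0 + τ * sum(data)) / τn
    return Normal(μn, sqrt(1 / τn))
end

# posterior = compile(ConjugateNormalModel(Normal(0, 1), 0.5), randn(100))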

What do you think?


Note that having a rand as a requirement for a Distribution and making models Distributions severely limits their usefulness, as algorithms to draw from generic distributions are either very costly, asymptotic, or approximate.

I agree. But the idea isn’t “a Model is a kind of Distribution” but rather “a Distribution is a kind of Model” (namely one that has a rand, and usually a logpdf). An inference algorithm is then something that turns a Model into a Distribution. For example, DynamicHMC essentially provides a rand :slight_smile: