Compiler-like analysis in Julia

cscherrer · December 28, 2018, 8:20pm

Hello all,

For Soss.jl I need to do some static analysis and transformations on expressions. Here’s a simple example of a Model:

myModel = @model x begin
    μ ~ Normal(0, 1.0)
    x ~ Normal(μ, 1.0)
end

Think of this as specifying two dependent random variables μ and x, where x is also observed and passed in as an argument. The @model macro just makes this a little more convenient to write, and produces a Model(args,body). So in this case it produces

julia> myModel.args
1-element Array{Symbol,1}:
 :x

julia> myModel.body
quote
    μ ~ Normal(0, 1.0)
    x ~ Normal(μ, 1.0)
end

~ is undefined, and is just a placeholder. “Inference” in this approach transforms a Model into an Expr that can be evaluated and executed.

This is all well and good, but it gets trickier. Here models are built from distributions, but we should also be able to use models as building blocks for other models. Composability, and all that .

So we can write

nested1 = @model x begin
    m ~ @model z ~ Normal(0, 1)
    x ~ Normal(m.z, 1)
end

But that should give the same results as, for example,

nested2 = @model x begin
    M = @model z ~ Normal(0,1)
    m ~ M
    x ~ Normal(m.z, 1)
end

It’s important to be able to ask simple questions of a model, like “What are the names of assigned variables?” while taking this kind of nesting into account. MacroTools.jl gets us pretty far, and DataFlow.jl is promising but is not documented enough for me to understand how to apply it. Anyway, IIRC dataflow drops the variable names, which is counter to what I’m trying to do.

Any suggestions? Or any compiler-like libraries I should be looking into? Or maybe there’s a way to leverage Julia’s JIT compilation and interrupt it after a step or two?

Thanks!

datnamer · December 28, 2018, 8:50pm

Sounds like @jrevels 's GitHub - JuliaLabs/Cassette.jl: Overdub Your Julia Code can help here.

Edit: Also, how does your package compare to the goals/approach of turing.jl?

cscherrer · December 28, 2018, 9:21pm

That could work, but I had hoped to find something light-weight and do the analysis statically - I think Cassette requires actual execution. After posting this I was looking into code_lowered et al, maybe there’s enough there to solve the issue but I need to understand it better.

Big difference is that this is inference-agnostic. Writing an inference method means building the necessary code transformations. The benefit of this approach is that the intermediate representation can stay very lean while allowing any inference method you like. And there’s zero runtime overhead imposed - in principle you could build an inference method that produces custom LLVM, CUDA, whatever you like.

mohamed82008 · December 28, 2018, 10:03pm

Note that Turing models are inference-agnostic, in that you can pass any inference algorithm after the fact to sample from the model. The model itself is a Turing.Model (in Turing master) which holds the information necessary for sampling, and a runnable form of the “body” that you mention. Turing’s “compiler” design, which is really just a parser, expands the model definition below figuring out the parameters and data variables in the process, and defines the Turing.Model generator function:

julia> using Turing

julia> @model model_gen(x) = begin
           μ ~ Normal(0, 1.0)
           x ~ Normal(μ, 1.0)
       end
model_gen (generic function with 2 methods)

julia> model = model_gen(1.0); # generates a Turing.Model

julia> Turing.pvars(model)
(:μ,)

julia> Turing.dvars(model)
(:x,)

julia> model.data
(x = 1.0,)

To sample from this model using HMC, you do:

sample(model, HMC(1000, 0.1, 5))

cscherrer · December 28, 2018, 11:42pm

What do you mean by “any inference algorithm”? I see a few implemented in Turing, but both your model_gen and model seem to be opaque functions. Are you sure there’s no inference algorithm or optimization that would require hacking on Turing itself to implement?

A lot of PPLs start off with one particular inference method in mind, and have an intermediate representation oriented toward this. Alternative methods tend to require extending the IR, usually resulting in something very complex and/or with huge amounts of overhead.

This has bitten me before in the Haskell-based PPLs I’ve built, and I’ve seen it bite others as well. But Julia has this great metaprogramming support, so we can have the best of both worlds.

So a big goal with Soss was to stay very close to the original model description and provide some combinators for manipulating model source code. There’s no IR focused about log-posterior-density evaluation or continuations. With flexible source code generation, Soss doesn’t play favorites, and imposes zero runtime overhead. It’s not there yet, mostly because so far it’s just me and this isn’t my day job. But I’ve been convinced that code generation is the way to go - that was one of the big draws for starting with Julia.

I’m sure there are some great ideas in Turing, but getting to them seems to require really digging into the details of the implementation. On the surface it seems to be very Anglican- or WebPPL-like, but I’d love to learn more about where you’re going with things.

mohamed82008 · December 28, 2018, 11:49pm

There is an interface which you would have to define for your new inference algorithm. Take a look at importance sampling. These are the main functions that need to be defined to make (or wrap) a new inference algorithm. If you have any questions, feel free to ask on the turing slack channel.

trappmartin · December 29, 2018, 10:31am

I suppose the motivation of Turing and Soss are very similar in this respect. Turing provides only a very thin layer to ensure any inference algorithm can be used. Apart from that we do not favour HMC or particle Gibbs and implementing samplers that work with a Turing model is rather simple. For example we provide an interface for your DynamicHMC implementation.

Tamas_Papp · December 29, 2018, 11:06am

Regarding the original question (which is related to the tangent at DynamicHMC.jl#28), I would build a DSL from objects that are themselves compiler-friendly. Eg with a surface syntax like

struct Var{K}
    name::Symbol
end

Data(name) = Var{:data}(name)

Param(name) = Var{:param}(name)

struct DistributedAs{V <: Var, D}
    var::V
    distribution::D
end

≂(var, distribution) = DistributedAs(var, distribution) # I would not pirate ~

let z = Data(:z), μ = Param(:μ)
    Model(z ≂ Normal(μ, 0.2),
          μ ≂ Normal(0, 5))
end

where Normal etc has methods for Vars, and Model would contain the structure. This you can transform and work with in a type-stable way that the compiler likes; I would leave everything to the latter.

cscherrer · December 29, 2018, 3:19pm

I’m sorry to have misstated this. Looking at the example @mohamed82008 linked to, I get the impression a Turing “Model” is just a function with ~ relations replaced at inference time with assume and observe, which are specialized according to the type of inference algorithm. Is that right?

In most (non-Julia) settings, this would impose some redirection overhead and/or be difficult to specialize. It may be that Julia’s JIT compilation and multiple dispatch make this a non-issue.

Part of the confusion may be that I’ve been lumping more into “inference” than I probably should. Really it’s about model transformations, which may result in executable code, or maybe just more models. We should be able to compose models, and even compose inference algorithms.

As one example of such a transformation, consider centered vs non-centered parameterization. Say we have something like (in Soss syntax)

m = @model x begin
    μ ~ Normal(0,1)
    σ ~ HalfCauchy(3)
    x ~ Normal(μ, σ)
end

Then we should be able to use the (not yet implemented) uncenter combinator to arrive at

uncenter(m,suffix=:Raw) == @model x begin
    μ ~ Normal(0,1)
    σ ~ HalfCauchy(3)
    xRaw ~ Normal(0,1)
    x ~ Delta(σ*xRaw + μ)
end

Models have a meta field that can be used to store things like its entropy, dimensionality, or anything else that’s useful. Have you ever seen a beginner write a model and an expert “fix” it, arriving at something that performs much better, maybe at the cost of a more difficult-to-read implementation? In a sense, the expert is doing a compiler transformation. Maybe “fixing” it can be easier. And maybe some of it can eventually be automated

Not mine - DynamicHMC.jl is due to @Tamas_Papp.

Yes, this is exactly it! I had hoped that since Julia is already doing this kind of thing under the hood, it might be simpler to hook into that directly. I suppose I could roll my own, but I don’t want to reinvent the wheel or end up with an IR for all of Julia, and finding a sensible middle ground can be tricky.

I could do something separating “tilde code” from “base” Julia and only mucking with the former. But the base piece could still have a lot of important information like variable assignments.

Do you mean because it’s “bitwise not”? The overloading bugs me a bit, but most points I see are pro-piracy:

~ is standard notation
Alternatives like ≂ require unicode and are awkward to type
All existing methods of ~ are unary
Infix ~ parses appropriately:

julia> :(x ~ d) |> dump
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol ~
    2: Symbol x
    3: Symbol d

mohamed82008 · December 29, 2018, 5:20pm

Really it’s about model transformations, which may result in executable code, or maybe just more models.

In Turing, these transformations would need to be done at parse-time by chaining macros. A clever use of the Julia type system and if statements can give the right hints to the Julia compiler to eliminate most of the code generated by the “model generator” function, and “specialize” your model for the inputs.

We should be able to compose models, and even compose inference algorithms.

This is possible but not implemented yet in Turing. The idea is to have a Turing.Sampler which is the coupling of a model and a sampling algorithm, and then define the Distribution interface for it. This will make Turing models compose after some minor changes. At least this is the idea, it would be really nice to take your input on this if you would like to collaborate, perhaps offline or on slack.

Looking at the example @mohamed82008 linked to

Note that you need Turing#master to see the new Turing.Model from my example above.

It would be really nice to have a problem-oriented discussion. I am sure Turing and Soss can learn a lot from each others’ approaches and the outcome may be better versions of both

cscherrer · December 29, 2018, 6:01pm

Right, so this seems to be the biggest thing missing from that approach.

It had seemed to me it might eliminate all such code. Is there any it has trouble with? In a tight inner loop, any overhead at all is too much.

I think I’m coming off as more negative about Turing than I mean to be, and for that I’m sorry. It’s not really meant as a value judgment about Turing; the point of the work is that there’s a critical need (model composition and transformation) that’s not being met, and Julia seems to have the right tools for the job.

Outside of this, the closest to a solution I know of is Hakaru. But that takes an ambitious research-focused approach, and could take some time before being ready for real-world use. And “Haskell for data science” is a whole other can of worms.

Oh that helps, I didn’t realize there had been a significant refactoring.

Thank you, that would be great. If Turing is at or could get to “what runs is what you would write by hand” without any additional overhead, I’d be inclined to focus on using and developing that, perhaps shifting Soss to a front-end preprocessor for Turing.

mohamed82008 · December 29, 2018, 6:09pm

In theory, there shouldn’t be. In practice, we will need some carefully designed test cases to try to get them to work as desired. Let’s see how far we can take it.

Elrod · December 29, 2018, 10:58pm

Have you looked at MacroTools?

You can define a series of functions that perform transformations on expressions, to act like compiler passes.

jjulia> using MacroTools: @capture, postwalk, striplines

julia> function model_pass(expr)
           postwalk(expr) do x
               if @capture(x, y_ ~ @model dist__)
                   Y = gensym(y)
                   return quote
                       $Y = @model $(dist...)
                       $y ~ $Y
                   end
               else
                   return x
               end
           end        
       end
model_pass (generic function with 1 method)

julia> q = quote
           m ~ @model z ~ Normal(0, 1)
           x ~ Normal(m.z, 1)
       end;

julia> striplines(q) # striplines is just for pretty printing
quote
    m ~ #= REPL[47]:2 =# @model(z ~ Normal(0, 1))
    x ~ Normal(m.z, 1)
end

julia> model_pass(q) |> striplines
quote
    begin
        ##m#421 = #= REPL[40]:6 =# @model(z ~ Normal(0, 1))
        m ~ ##m#421
    end
    x ~ Normal(m.z, 1)
end

I have been working on something of this sort.

However, I’ve wanted to get very aggressive with my optimizations.
Eg, using the macrotools to replace calls to special functions with calls to SLEEF, using CpuId in a build.jl file to find the host computer’s vector width, and loop transformations using SIMD intrinsics, etc.
Additionally, I was planning on having the model (pre-)allocate one block of working data, and using a PtrMatrix type, to work with memory in this set. Should improve cache locality, and of course avoid any runtime allocations.

(I’d be more than happy to collaborate if anyone is interested in that sort of approach!)

I am also interested in supporting both gradients (for HMC, or quasi-Newton methods) and Hessians (higher order asymptotic approximations, such as as from here.
However, to that end, I was also working on banded memory layouts for Symmetric and triangular matrices.
That is often the most efficient way to access data in these matrices, eg when calculating Hessians.

cscherrer · December 29, 2018, 11:17pm

Yeah, most of Soss is written using this. I had thought about the approach you suggest, but it can quickly hit a wall:

m1 = @model x begin
    μ ~ @model z ~ Normal(0,1)
    x ~ Normal(μ,1)
end

is fine, but how about

m2 = @model x begin
    M = @model z ~ Normal(0,1)
    μ ~ M
    x ~ Normal(μ,1)
end

or

M = @model z ~ Normal(0,1)

m3 = @model x begin
    μ ~ M
    x ~ Normal(μ,1)
end

?

Really? Do you have a link? Have you looked at Soss.jl? It would be great to find common ground

Elrod · December 30, 2018, 1:08am

The pass I wrote transforms:

m1 = @model x begin
    μ ~ @model z ~ Normal(0,1)
    x ~ Normal(μ,1)
end

into

m2 = @model x begin
    M = @model z ~ Normal(0,1)
    μ ~ M
    x ~ Normal(μ,1)
end

(but with an autogenerated name instead of M)
so I was implicitly assuming that the latter is a more canonical form / the next step in a transformation process.

This is harder:

M = @model z ~ Normal(0,1)

m3 = @model x begin
    μ ~ M
    x ~ Normal(μ,1)
end

I really should look into cassette!
Should probably do that before, for example trying to resurrect/reimplement Sugar.jl; README example:

using Sugar

function controlflow_1(a, b)
    if a == 10
        x = if b == 22
            7
        else
            8
        end
        for i=1:100
            x += i
            x -= 77
            if i == 77
                continue
            elseif i == 99
                break
            end
        end
        return x
    else
        return 77
    end
end

Sugar.sugared(controlflow_1, (Int, Int), code_lowered)

yields

quote
    NewvarNode(:(_4))
    if _2 == 10
        if _3 == 22
            _7 = 7
        else
            _7 = 8
        end
        _4 = _7
        SSAValue(0) = (Main.colon)(1,100)
        _5 = (Base.start)(SSAValue(0))
        while !((Base.done)(SSAValue(0),_5))
            SSAValue(1) = (Base.next)(SSAValue(0),_5)
            _6 = (Core.getfield)(SSAValue(1),1)
            _5 = (Core.getfield)(SSAValue(1),2)
            _4 = _4 + _6
            _4 = _4 - 77
            if _6 == 77
                continue
            end
            if _6 == 99
                break
            end
        end
        return _4
    end
    return 77
end

The approach could give us the AST of other functions.
Alternatively, because in your example M was defined using the @model macro, you could make sure this macro provides information necessary.

I’ve been “watching” Soss on Github (but didn’t dive into the code base) since you first announced it here!

Soss is much further along than my efforts, which probably don’t deserve much more credit than a pipe dream at the moment. I put some code up here, but had a lot of time to rethink implementation details since the last update. I’d also use ChainRules instead of DiffRules, and probably drop LightGraphs. There are also no comments or documentation.
When I resume working on this, I’m likely to start fresh with most of, or at least refactor significantly.

Much of my work recently has been on building the lower level tools I plan to use (or on projects that will actually go into my dissertation).

Eg, vectorization efforts in matrix multiplciation and improved loop vectorization, eg

julia> using StaticArrays, BenchmarkTools, Random

julia> function fill_BPP!(BPP::AbstractMatrix{T}) where T
           @views randn!(BPP[:,1:3])
           @inbounds for i ∈ 1:size(BPP,1)
               S = ( @SMatrix randn(T,6,3) ) |> x -> x' * x
               BPP[i,4]  = zero(T)
               BPP[i,5]  = S[1,1]
               BPP[i,6]  = S[1,2]
               BPP[i,7]  = S[2,2]
               BPP[i,8]  = S[1,3]
               BPP[i,9]  = S[2,3]
               BPP[i,10] = S[3,3]
           end
           BPP
       end
fill_BPP! (generic function with 1 method)

julia> BPP = fill_BPP!(Matrix{Float32}(undef, 4096, 10));

julia> @inline function pdbacksolve(x1,x2,x3,S11,S12,S22,S13,S23,S33)
           Ui33 = inv(sqrt(S33))
           U13 = S13 * Ui33
           U23 = S23 * Ui33
           Ui22 = inv(sqrt(S22 - U23*U23))
           U12 = (S12 - U13*U23) * Ui22
       
           Ui33x3 = Ui33*x3
           
           Ui11 = inv(sqrt(S11 - U12*U12 - U13*U13))
           Ui12 = - U12 * Ui11 * Ui22
           Ui13x3 = - (U13 * Ui11 + U23 * Ui12) * Ui33x3
           Ui23x3 = - U23 * Ui22 * Ui33x3
       
           (
               Ui11*x1 + Ui12*x2 + Ui13x3,
               Ui22*x2 + Ui23x3,
               Ui33x3
           )
       end
pdbacksolve (generic function with 1 method)

julia> X = Matrix{Float32}(undef, size(BPP,1), 3);

julia> function process_big_prop_points_v1!(X::AbstractMatrix{T}, Data::AbstractMatrix{T}) where T
           @inbounds @simd ivdep for i ∈ 1:size(Data,1)
               X[i,:] .= pdbacksolve(
                   Data[i,1],Data[i,2],Data[i,3],
                   Data[i,5],Data[i,6],Data[i,7],Data[i,8],Data[i,9],Data[i,10]
               )
           end
       end
process_big_prop_points_v1! (generic function with 1 method)

julia> @benchmark process_big_prop_points_v1!($X, $BPP)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     51.442 μs (0.00% GC)
  median time:      53.201 μs (0.00% GC)
  mean time:        54.350 μs (0.00% GC)
  maximum time:     90.791 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> using SIMDPirates, SLEEFwrap

julia> @generated function process_big_prop_points!(X::AbstractMatrix{T}, Data::AbstractMatrix{T}) where T
           quote
               @vectorize $T for i ∈ 1:size(Data,1)
                   X[i,:] .= pdbacksolve(
                       Data[i,1],Data[i,2],Data[i,3],
                       Data[i,5],Data[i,6],Data[i,7],Data[i,8],Data[i,9],Data[i,10]
                   )
               end
           end
       end
process_big_prop_points! (generic function with 1 method)

julia> @benchmark process_big_prop_points!($X, $BPP)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.051 μs (0.00% GC)
  median time:      6.111 μs (0.00% GC)
  mean time:        6.308 μs (0.00% GC)
  maximum time:     13.125 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5

julia> (@macroexpand @vectorize Float32 for i ∈ 1:size(Data,1)
                          X[i,:] .= pdbacksolve(
                              Data[i,1],Data[i,2],Data[i,3],
                              Data[i,5],Data[i,6],Data[i,7],Data[i,8],Data[i,9],Data[i,10]
                          )
                      end) |> striplines
quote
    ##N#419 = size(Data, 1)
    (Q, r) = (SLEEFwrap.VectorizationBase).size_loads(Data, 1, Val{16}())
    ##pData#426 = SLEEFwrap.vectorizable(Data)
    ##pX#420 = SLEEFwrap.vectorizable(X)
    begin
        for ##i#418 = 1:16:Q * 16
            ##iter#417 = ##i#418
            begin
                ####pData#426_i#427 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 0 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#428 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 1 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#429 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 2 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#430 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 4 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#431 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 5 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#432 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 6 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#433 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 7 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#434 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 8 * SLEEFwrap.stride_row(Data))
                ####pData#426_i#435 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 9 * SLEEFwrap.stride_row(Data))
                begin
                    ##numiter#424 = SLEEFwrap.num_row_strides(X)
                    ##stride#425 = SLEEFwrap.stride_row(X)
                    ##B#421 = SLEEFwrap.extract_data.(pdbacksolve(####pData#426_i#427, ####pData#426_i#428, ####pData#426_i#429, ####pData#426_i#430, ####pData#426_i#431, ####pData#426_i#432, ####pData#426_i#433, ####pData#426_i#434, ####pData#426_i#435))
                    for ##j#423 = 0:SIMDPirates.vsub(##numiter#424, 1)
                        SIMDPirates.vstore(getindex(##B#421, SIMDPirates.vadd(1, ##j#423)), ##pX#420, SIMDPirates.vmuladd(##stride#425, ##j#423, ##iter#417))
                    end
                end
            end
        end
    end
    begin
        if r > 0
            mask = SIMDPirates.vless_or_equal(SIMDPirates.vsub((Core.VecElement{Int32}(1), Core.VecElement{Int32}(2), Core.VecElement{Int32}(3), Core.VecElement{Int32}(4), Core.VecElement{Int32}(5), Core.VecElement{Int32}(6), Core.VecElement{Int32}(7), Core.VecElement{Int32}(8), Core.VecElement{Int32}(9), Core.VecElement{Int32}(10), Core.VecElement{Int32}(11), Core.VecElement{Int32}(12), Core.VecElement{Int32}(13), Core.VecElement{Int32}(14), Core.VecElement{Int32}(15), Core.VecElement{Int32}(16)), unsafe_trunc(Int32, r)), zero(Int32))
            ##i#418 = (##N#419 - r) + 1
            ##iter#417 = ##i#418
            begin
                ####pData#426_i#427 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 0 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#428 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 1 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#429 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 2 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#430 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 4 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#431 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 5 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#432 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 6 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#433 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 7 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#434 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 8 * SLEEFwrap.stride_row(Data), mask)
                ####pData#426_i#435 = SIMDPirates.vload(SVec{16,Float32}, ##pData#426, ##iter#417 + 9 * SLEEFwrap.stride_row(Data), mask)
                begin
                    ##numiter#424 = SLEEFwrap.num_row_strides(X)
                    ##stride#425 = SLEEFwrap.stride_row(X)
                    ##B#421 = SLEEFwrap.extract_data.(pdbacksolve(####pData#426_i#427, ####pData#426_i#428, ####pData#426_i#429, ####pData#426_i#430, ####pData#426_i#431, ####pData#426_i#432, ####pData#426_i#433, ####pData#426_i#434, ####pData#426_i#435))
                    for ##j#423 = 0:SIMDPirates.vsub(##numiter#424, 1)
                        SIMDPirates.vstore(getindex(##B#421, SIMDPirates.vadd(1, ##j#423)), ##pX#420, SIMDPirates.vmuladd(##stride#425, ##j#423, ##iter#417), mask)
                    end
                end
            end
        end
    end
end

My goal would be for the software to be several times faster than the likes of Stan.

I’ll have to spend some time looking at the Soss code base.

cscherrer · December 30, 2018, 4:39am

Oh I see. I like the idea of putting things into a canonical form like this. And thanks for the pointer to Sugar.jl. This could be really helpful!

I think Cassette will do some great things, but as I understand it’s all about nonstandard execution. It would be nice to do some things statically and fall back on Cassette as needed. (Please correct me if I’m misunderstanding that)

For the “This is harder” case (referring to an externally-defined model), I’ve played a bit with the approach of dealing with v ~ dist differently based on the type of dist. But the way I was thinking of it had lots of exceptions, maybe standardizing the form somehow on a first pass would get around that.

Nice goal!

Here’s some Stan/Julia trivia not many people know… In 2009 there was a “Petascale Math” project with Andrew Gelman, @viralbshah, and me. At the time I had done some stats and some programming, but nothing Bayesian and nothing with compilers. We had some calls but didn’t seem to be making much progress. On one of these I suggested the route Stan ended up taking with regard to only evaluating log-densities up to an additive constant. But that was my only contribution to Stan - otherwise I was still stuck on getting Gibbs sampling to work.

Worried about progress, I managed to get a small subcontract for Galois (a Haskell shop in Portland) to implement the compiler ideas I had but didn’t understand well enough to do myself. This led to Passage. Around this time, I heard Viral mention something about Alan Edelman, whom I had never met. Looking back, I’m this may have been the very early stages of Julia (Is that right Viral?). And of course Andrew went on to lead the development of Stan.

Just my very small part in Stan/Julia history

Great! I’d love to hear any ideas you have. For some context, I’m off for another week, then a crazy week ending with me presenting Soss at PyData Miami. And then I’ll only be able to work on it now and again until my next sabbatical in the spring.

Azamat · December 30, 2018, 8:55pm

Thanks for the amazing work you do!
From a user perspective, it would be nice, if probabilistic programming packages would work towards a consistent interface.
For example, in Turing.jl one defines model as:

@model mymodel(x) = begin
	μ ~ Normal(0, 1.0)
	x ~ Normal(μ, 1.0)
end

whereas in Soss.jl:

mymodel = @model x begin
	μ ~ Normal(0, 1.0)
	x ~ Normal(μ, 1.0)
end

It would be nice if the two could agree on the interface so that user can swap backend without having to refactor code.

cscherrer · December 31, 2018, 12:47am

Thank you for the kind words! Especially because it’s at such an early stage, any appreciation like this means a lot to me

I agree this would be convenient. I’m not sure how critical Turing’s syntax is to their approach, but in Soss there’s at least some convenience in considering a model as an expression rather than an assignment. The latter forces a name to be bound, but it’s strange to me to consider the name itself to be part of the model.

Syntactically things are otherwise very similar, but I think this is still a special case. As I understand in Turing, a ~ can occur anywhere in the code. This is fine for latent variables, but it’s strange to even consider what could be meant by an observation may or may not occur, dependent on control flow. I’m not certain, but I’m guessing that in Turing this case might be up to a user to avoid.

In Soss, arbitrary control flow is a special case, but a huge number of models can be expressed declaratively with ~ only occurring at the top level. Some combinators like For already exist to make this format more powerful, and I expect more to come. My current thought is that a type hierarchy for models could be appropriate, with Turing-friendly models perhaps the most general case, but subtypes for special cases, for example Stan-like models or finite cases that are completely enumerable.

In any case, I think it should be relatively simple to connect Turing as a backend for Soss. Maybe this could lead to limitations of some sort, but there are none that I see to this point.

Topic		Replies	Views
Working with ASTs General Usage question	65	3309	January 30, 2019
New Soss interface - request for feedback Probabilistic Programming	8	2384	November 6, 2021
Julia calling Stan C++ directly Tooling	20	3562	April 21, 2019
Probabilistic programming with source transformations Statistics announcement	66	7385	July 2, 2018
JuliaCon 2020 Birds of a Feather Probabilistic Programming	24	3271	August 28, 2020

Compiler-like analysis in Julia

Related topics