[ANN] SossMLJ.jl - Probabilistic Programming for MLJ modeling

@dilumaluthge and I are pleased to announce the release of SossMLJ.jl, which makes it easy to use models from the Soss.jl probabilistic programming language in MLJ.jl.

Here’s a linear regression example, expressed as a special case of a generalized linear model. First,we import the necessary packages:

using Distributions
using MLJBase
using Soss
using SossMLJ
using Statistics

Next we write the model in Soss:

m = @model X, s, t begin
    p = size(X, 2) # number of features
    β ~ Normal(0, s) |> iid(p) # coefficients
    σ ~ HalfNormal(t) # dispersion
    η = X * β # linear predictor
    μ = η # `μ = g⁻¹(η) = η`
    y ~ For(eachindex(μ)) do j
        Normal(μ[j], σ) # `Yᵢ ~ Normal(mean=μᵢ, variance=σ²)`

Now let’s generate some synthetic features. We’ll have two continuous features and two binary categorical features.

num_rows = 1_000
x1 = randn(num_rows)
x2 = randn(num_rows)
x3 = Int.(rand(num_rows) .> 0.5)
x4 = Int.(rand(num_rows) .> 0.5)
X = (x1 = x1, x2 = x2, x3 = x3, x4 = x4)

In Soss, the arguments of a model are the values it needs to determine a joint distribution over parameters. At model definition time, Soss doesn’t consider “the variable that will later be observed” to be special in any way, so the “parameters” include this.

In the model m, the arguments are features X and hyperparameters s and t. Let’s set our hyperparameters…

hyperparams = (s=0.1, t=0.1)

Now we can express this as a SossMLJModel,

model = SossMLJModel(;
    model       = m,
    hyperparams = hyperparams,
    infer       = dynamicHMC,
    response    = :y,

Here infer can be any function that takes a model with arguments (called a JointDistribution in Soss) and returns samples, and response tells which variable we’ll be observing.

Note that X above is a NamedTuple (generally a Table, as expected by MLJ), but m takes a matrix. There’s another transform argument to SossMLJModel that handles this, with the default

@inline function default_transform(tbl)
    return (X = MLJModelInterface.matrix(tbl),)

This allows lots of flexibility in how the Soss model represents the features.

Let’s make some fake data to work with.

args = merge(model.transform(X), hyperparams)
truth = rand(m(args))

On this run, I get

julia> pairs(truth)
pairs(::NamedTuple) with 6 entries:
  :p => 4
  :η => [-0.0999251, 0.226782, -0.243349, 0.0821754, -0.281185, -0.0371795, 0.103831, -0.016029…
  :μ => [-0.0999251, 0.226782, -0.243349, 0.0821754, -0.281185, -0.0371795, 0.103831, -0.016029…
  :σ => 0.149136
  :β => [0.184801, -0.0151003, 0.135162, -0.25159]
  :y => [-0.170022, 0.0709445, -0.197551, -0.0813788, 0.195457, 0.0216646, 0.277798, -0.198569,…

Next we’ll give the model X and truth.y, and let it figure out the rest. In MLJ we fit models by creating a machine:

mach = MLJBase.machine(model, X, truth.y)

The fitresult for this model contains a post field that’s an array of named tuples. We can summarize it like this:

julia> particles(mach.fitresult.post)
(σ = 0.145 ± 0.0036,
 β = Particles{Float64,1000}[0.176 ± 0.005, -0.0169 ± 0.0049, 0.129 ± 0.0084, -0.249 ± 0.0088],)

Now for a given X (here we’ll just use the same one), we can get the posterior predictive distribution:

predictor_joint = predict_joint(mach, X)

predictor_joint is just a distribution we can sample from:

single_sample = rand(predictor_joint; response = :y)

It’s easy to evaluate the logpdf on any sample like this,

logpdf(predictor_joint, single_sample)

Instead of just one sample, it’s often useful to get particles, in the sense of MonteCarloMeasurements.jl. Also, note that here and above, we can change the response variable after fitting. So,

julia> predict_particles(mach, X; response = :β)
4-element Array{Particles{Float64,1000},1}:
  0.176 ± 0.005
 -0.0169 ± 0.0049
  0.129 ± 0.0084
 -0.249 ± 0.0088

If we know ground truth, this allow checks like

julia> truth.β - predict_particles(mach, X; response = :β)
4-element Array{Particles{Float64,1000},1}:
  0.00833 ± 0.005
  0.00182 ± 0.0049
  0.00666 ± 0.0084
 -0.00221 ± 0.0088

Or if we don’t, we can do posterior predictive checks against our observed y values. Here are the Bayesian p-values for this model:

julia> mean.(truth.y .< predict_particles(mach, X; response = :y)) |> particles
 0.498052 ± 0.285

There’s lots more we can do, check out the rest of this example.

Many thanks to @tlienart for helping me get up to speed on the MLJ interface and getting us started with a previous version of this library, and to @ablaom and the rest of the MLJ team for helpful discussions, and of course for the excellent MLJ framework.


this is awesome, well done to you both :raised_hands: