Fitting multiple datasets simultaneously

I want to fit a model to multiple datasets simultaneously, with some fitted parameters being shared across datasets. Is it possible to do this using Turing.jl?

Would it work for you to make the model a sub model and then have a single model that declared whatever parameters are shared across models, defines one sub model for each data subset, and then conditions on all data?

Seth, that sounds reasonable. The question is how to implement it in Turing.jl. I’m not sure where to begin..

Here’s an example I made up for myself recently when I was trying to figure out how to do this. There are n = 5 datasets here that are quadratic regressions. The linear coefficient b and residual nosise σ are fitted independently for each dataset inside the Regression submodel, while the intercept a and quadratic coefficient c are defined at the top level in MultiRegression and passed as data to the submodels.

using Turing
using StatsPlots

n = 5
a = 1.0 + 2randn()
bb = 2.0 .+ 2randn(n)
c = 5randn()

xx = [2rand(100) for _ in 1:n]
yy = [a .+ b.*x .+ c.*x.^2 .+ randn.() for (b, x) in zip(bb, xx)]
scatter(xx, yy)

@model function Regression(x, y, pars)
    b ~ Normal()
    σ ~ Gamma()
    μ = pars.a .+ b .* x .+ pars.c .* x.^2
    y ~ MvNormal(μ, σ)
end

@model function MultiRegression(xx, yy)
    n = length(xx)
    a ~ Normal()
    c ~ Normal()

    stocks = Vector(undef, n)
    for i in 1:n
        pars = (; a, c)
        stocks[i] ~ to_submodel(Regression(xx[1], yy[i], pars))
    end
end

model = MultiRegression(xx, yy)

chain = sample(model, NUTS(), 1000)
plot(chain)

This seems to work pretty well, though there may be better ways to do it I don’t know about.

If the model is linear/polynomial then just set it up like so. From ElOceanografo’s example:

using LinearAlgebra

Xb = cat(xx..., dims=(1,2))
Xc = vcat(xx...)
Xa = ones(size(Xc, 1))
X = hcat(Xa, Xc.^2, Xb)
Y = vcat(yy...)

and solving with least squares:

# phat contains the estimate of [a;c;bb] 
phat = (X'X) \ (X'Y)

or I imagine as Bayesian regression in Turing.jl.