Fitting multiple datasets simultaneously

Here’s an example I made up for myself recently when I was trying to figure out how to do this. There are n = 5 datasets here that are quadratic regressions. The linear coefficient b and residual nosise σ are fitted independently for each dataset inside the Regression submodel, while the intercept a and quadratic coefficient c are defined at the top level in MultiRegression and passed as data to the submodels.

using Turing
using StatsPlots

n = 5
a = 1.0 + 2randn()
bb = 2.0 .+ 2randn(n)
c = 5randn()

xx = [2rand(100) for _ in 1:n]
yy = [a .+ b.*x .+ c.*x.^2 .+ randn.() for (b, x) in zip(bb, xx)]
scatter(xx, yy)

@model function Regression(x, y, pars)
    b ~ Normal()
    σ ~ Gamma()
    μ = pars.a .+ b .* x .+ pars.c .* x.^2
    y ~ MvNormal(μ, σ)
end

@model function MultiRegression(xx, yy)
    n = length(xx)
    a ~ Normal()
    c ~ Normal()

    stocks = Vector(undef, n)
    for i in 1:n
        pars = (; a, c)
        stocks[i] ~ to_submodel(Regression(xx[1], yy[i], pars))
    end
end

model = MultiRegression(xx, yy)

chain = sample(model, NUTS(), 1000)
plot(chain)

This seems to work pretty well, though there may be better ways to do it I don’t know about.

1 Like