Ridge Regression using Turing.jl

Hi everyone,

I’m trying to build a Ridge Regression model in Turing.jl so that I can use HMC. However, I need to sample two parameters, α and θ, which are dependent on each other via

θ ~ MvNormal(zeros(3), α^(-1) * I)
α ~ Gamma(γ + d/2, δ + norm(θ)^2)

(γ, δ, d are hyperparameters of the model)

I can’t find a way to write a Turing.jl @model that supports this kind of sampling. I thought I could use some initial value for α or θ, but I don’t know how to do that for the first step.

I want to use Hamiltonian Monte Carlo and compare it to a Gibbs sampler, which I implement as

using Distributions, LinearAlgebra

iter = 1000
α = zeros(iter+1)
θ = zeros(iter+1, 3)
α[1] = α0
θ[1,:] = θ0

for i in 1:iter
    # θ | α, y is Gaussian with covariance D = (X'X + αI)⁻¹ and mean D*X'*y
    # (assuming unit noise variance)
    D = Symmetric(inv(X'*X + α[i]*I))
    E = D * (X' * y)
    θ[i+1,:] = rand(MvNormal(E, D))

    # α | θ is Gamma with shape γ + d/2 and rate δ + norm(θ)^2, matching the model above;
    # Distributions.jl's Gamma takes shape and *scale*, hence the 1/b
    a = γ + d/2
    b = δ + norm(θ[i+1,:])^2
    α[i+1] = rand(Gamma(a, 1/b))
end

where X is the design matrix whose columns are a vector of ones and the predictors x1 and x2, and y is the vector of target values. For concreteness, the data and hyperparameter values could look like the made-up setup below. Any help would be appreciated. Thanks a lot.
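(All the values here are placeholders, just to show the shapes everything needs to have:)

using Distributions, LinearAlgebra

n = 100
x1, x2 = randn(n), randn(n)
X = hcat(ones(n), x1, x2)              # design matrix: intercept column plus the two predictors
y = X * [1.0, 2.0, -0.5] + randn(n)    # synthetic targets
d = size(X, 2)                         # number of coefficients (3 here)
γ, δ = 2.0, 1.0                        # made-up hyperparameter values
α0, θ0 = 1.0, zeros(3)                 # initial values for the Gibbs chain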

2 Likes

This is a really interesting question. PPLs almost always express joint distributions in such a way that ~ statements correspond to a factorization of the joint density. So while a PPL might output a Gibbs sampler like the one you describe, PPLs are really not set up to take one as input.
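For example, in this toy Turing model (made up just to illustrate the correspondence), each ~ contributes exactly one factor to the log joint:

using Turing

@model function demo()
    a ~ Normal(0, 1)    # contributes the factor f(a)
    b ~ Normal(a, 1)    # contributes the factor f(b | a)
end

The log joint density is log f(a) + log f(b | a), so there’s no way for two ~ statements to each condition on the other, as in your model.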

So the more common way to do this would be to either sample θ and then α|θ, or first α and then θ|α. Unfortunately, to get there I think you’d need to crank through some things by hand, like factoring

f(\alpha) f(\theta | \alpha) = f(\theta) f(\alpha | \theta)

Both sides of this are equal to f(\alpha, \theta), and the result of this factorization would give you some options for feeding it to Turing.
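For instance, if you take the f(\alpha) f(\theta | \alpha) side, a Turing model could look something like the sketch below. I’m assuming unit noise variance and that γ is a shape and δ a rate (Distributions.jl’s Gamma takes shape and scale, hence the 1/δ); the hyperparameter values in the sample call are placeholders:

using Turing, LinearAlgebra

@model function ridge(X, y, γ, δ)
    d = size(X, 2)
    α ~ Gamma(γ, 1/δ)                  # precision prior, f(α)
    θ ~ MvNormal(zeros(d), (1/α) * I)  # f(θ | α): covariance α⁻¹ I
    y ~ MvNormal(X * θ, 1.0 * I)       # likelihood, unit noise variance
end

chain = sample(ridge(X, y, 2.0, 1.0), NUTS(), 1000)

NUTS is Turing’s adaptive HMC; a plain HMC(0.05, 10) would work here too.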

BTW there’s also BayesianLinearRegression.jl that might be helpful. It’s very fast, but it’s focused on a marginal likelihood approach, so no gamma priors.

2 Likes

Do you have a reference for this phrasing of ridge regression? I’ve never seen it posed this way. Google yields some results, but I’d like to follow along.

It’s based on the approach described in Chris Bishop’s book, Pattern Recognition and Machine Learning.

EDIT: Oh wait, I thought you were asking about the BayesianLinearRegression implementation. But maybe you meant @Michaela_Maskova’s Gibbs sampler representation?

1 Like

Yeah, I was more curious about the Gibbs sampler. I’ve seen simpler PPL models that people refer to as Ridge Regression, and I want to understand what this formulation offers beyond those.
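For reference, the simpler models I mean look something like this (a sketch; the precision λ is fixed by hand rather than given a Gamma prior as above):

using Turing, LinearAlgebra

@model function simple_ridge(X, y, λ)
    d = size(X, 2)
    θ ~ MvNormal(zeros(d), (1/λ) * I)  # fixed-precision Gaussian prior, i.e. an L2 penalty of strength λ
    y ~ MvNormal(X * θ, 1.0 * I)       # unit noise variance
end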

1 Like