Shrinkage prior in MixedModels

question

#1

I would like to fit a hierarchical model

Y \sim X\alpha + Z\beta + \epsilon\\ \epsilon \sim N(0,\sigma^2_\epsilon) \text{ iid}\\ \alpha_i \sim N(\mu_\alpha, \sigma^2_\alpha)\\ \beta_i \sim N(\mu_\beta, \sigma^2_\beta)

using MixedModels (or of that does not work, a reasonable alternative). I just want a point estimate as a starting point for MCMC.

I just don’t know how to code the shrinkage using the @formula macro.

Some toy data (which of course will give very weak identification):

X = [1 0 0;
     1 1 0;
     1 1 0;
     1 1 1;
     1 1 1;
     1 1 1]
Z = [0 0 0;
     0.2 0 0;
     0.2 1 0;
     0.2 1.2 0;
     0.2 1.3 1;
     0.2 2 2]
α = [0, 1, 2]
β = [1, 2, 3]
Y = X*α .+ Z*β .+ randn()*0.01

Also, I wonder if I can avoid the DataFrame and just use matrices, as the real problem has 10-20 columns. There is always the same number of \alpha's and \beta's.


#2

The assumption in the MixedModels package is that the random effects model matrices are generated from the indicators of a categorical covariate. In the simplest case (a simple, scalar random effect) the columns of X and Z would be a set of indicators for two categorical covariates. That is not the case here.

You could create the internal representation used to evaluate the log-likelihood and piggy back on that mechanism or just emulate it. I use BlockArrays and some general representations of random-effects terms but you can skip that.

For a model like this you can “profile” the log-likelihood so it is a function of two parameters, the ratios of the standard deviations, λ₁ = σ₁/σ₃ and λ₂ = σ₂/σ₃ where, to save myself LaTeXing I translated α -> 1, β -> 2 and ε -> 3.

Start with the X’X type of matrix from hcat(X, Z, Y)

using LinearAlgebra, Random
X = [1 0 0;
     1 1 0;
     1 1 0;
     1 1 1;
     1 1 1;
     1 1 1]
Z = [0 0 0;
     0.2 0 0;
     0.2 1 0;
     0.2 1.2 0;
     0.2 1.3 1;
     0.2 2 2]
const n = size(Z, 1)
const α = [0, 1, 2]
const β = [1, 2, 3]
Y = X*α .+ Z*β .+ randn(MersenneTwister(1234321), n)*0.01
XZY = hcat(X, Z, Y);
const A = Symmetric(XZY'XZY, :L)
const L = copy(A.data);

Given λ₁ and λ₂ you form the Cholesky factor of a matrix that I write in shorthand form as Λ'AΛ + I, where
Λ is block diagonal in 3 diagonal blocks: λ₁I, λ₂I, and I of sizes length(α), length(β) and 1, respectively and the I is added only to the random-effects part. (In your case both α and β behave like random effects).

I would do it like this

function obj(λ)
    k = LinearAlgebra.checksquare(A)
    copyto!(L, A)
    m = λ[1]
    for j in eachindex(α)
        for i in 1:k
            L[i, j] *= m
            L[j, i] *= m
        end
        L[j, j] += 1
    end
    offset = length(α)
    m = λ[2]
    for j in eachindex(β)
        jj = offset + j
        for i in 1:k
            L[i, jj] *= m
            L[jj, i] *= m
        end
        L[jj, jj] += 1
    end
    cholesky!(Symmetric(L, :L))
    objective = 0.0
    for j in 1:(offset + length(β))  # evaluate the logarithm of the determinant
        objective += 2log(L[j, j])
    end
    objective += n * (1 + log(2π) + log(abs2(L[end,end])/n))
    objective
end

Then minimize this objective with respect to the two-dimensional λ, both components of which are constrained to be non-negative.