This runs, but I haven't looked at the results closely. Note that I used KernelFunctions.jl, which implements the squared exponential covariance function previously written as sqexp_cov_fn (there's a quick spot check after the kernel definition below). I also cleaned up a few things. You'll probably want to think about the priors for your application; a cheap prior-predictive check is sketched at the end.
using CSV
using DataFrames
using Turing
using Distributions
using LinearAlgebra
using KernelFunctions
# Simulated data just to make the example self-contained.
df = DataFrame(
subject = repeat(1:5, inner=3),
obese = repeat(rand(Bool, 5), inner=3),
timepoint = [1,2,3,1,3,4,1,2,5,1,4,5,1,3,5],
bug = rand(Beta(0.9, 5), 15),
nutrient = rand(Beta(0.9,5), 15)
)
# Squared-exponential kernel with amplitude alpha and length-scale rho.
# (`KernelFunctions.transform(k, s)` was the older API; composing with
# ScaleTransform is the current equivalent.)
sekernel(alpha, rho) = alpha^2 * (SEKernel() ∘ ScaleTransform(sqrt(0.5) / rho))
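If you want to convince yourself this matches the old sqexp_cov_fn, here's a quick spot check. The closed form below is my arithmetic, assuming the KernelFunctions convention SEKernel()(x, y) = exp(-norm(x - y)^2 / 2); the inputs are made up:

alpha, rho = 2.0, 0.5
k = sekernel(alpha, rho)
x, y = [1.0], [1.5]
# With ScaleTransform(sqrt(0.5)/rho), the kernel works out to
# alpha^2 * exp(-norm(x - y)^2 / (4 * rho^2)).
k(x, y) ≈ alpha^2 * exp(-norm(x - y)^2 / (4 * rho^2))  # should be true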
@model function myGP(y, Z, X, jitter=1e-6)
    # Dimensions of GP predictors. Note that if X is a single column,
    # it needs to have dimension N x 1 (a matrix), not a vector of
    # length N, so that the distance computation works properly.
    N, P = size(X)

    # Dimensions of linear model predictors (Z should be N x J).
    J = size(Z, 2)

    # Priors.
    mu ~ Normal(0, 1)
    sig2 ~ LogNormal(0, 1)
    alpha ~ LogNormal(0, 0.1)
    rho ~ LogNormal(0, 1)

    # Prior for linear model coefficients.
    beta ~ filldist(Normal(0, 1), J)

    # GP covariance matrix: kernel over the rows of X, plus observation
    # noise (sig2) and jitter on the diagonal.
    kernel = sekernel(alpha, rho)
    K = kernelmatrix(kernel, X, obsdim=1)
    K += LinearAlgebra.I * (sig2 + jitter)

    # Sampling distribution: linear mean function plus GP covariance.
    y ~ MvNormal(mu .+ Z * beta, K)
end
gp = myGP(df.bug,
          Matrix(df[!, [:subject, :obese, :nutrient]]),
          Matrix{Float64}(df[!, [:timepoint]]))
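Since the priors deserve a second look, it's cheap to see what they imply before fitting; this is a sketch using Turing's Prior sampler (the 200 draws is an arbitrary choice):

# Sample from the prior only (no likelihood) to sanity-check the priors.
prior_chain = sample(gp, Prior(), 200)

Then the fit itself: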
# 200 draws from HMC with step size 0.01 and 100 leapfrog steps per draw.
@time chain = sample(gp, HMC(0.01, 100), 200)
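For a first look at the posterior, describe works on the MCMCChains object that Turing returns; and if the hand-tuned HMC misbehaves, NUTS is the usual alternative (step-size adaptation with a 0.65 target acceptance rate):

# Posterior summary: mean, std, ESS, and Rhat for each parameter.
describe(chain)

# Possible alternative sampler (untested here):
# @time chain = sample(gp, NUTS(0.65), 200)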