Gaussian Process Model with Turing

This runs, but I haven’t looked at the results closely.

Note that I used KernelFunctions.jl, which implements the squared exponential covariance function that was previously written by hand as sqexp_cov_fn.
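
As a quick standalone sanity check (not part of the model; the alpha and rho values here are arbitrary), the sqrt(0.5)/rho input scaling used below corresponds to the closed form alpha^2 * exp(-(x - x')^2 / (4 * rho^2)):

using KernelFunctions

alpha, rho = 1.5, 0.8
k = alpha^2 * (SEKernel() ∘ ScaleTransform(sqrt(0.5) / rho))

# SEKernel is exp(-||x - x'||^2 / 2), so scaling inputs by sqrt(0.5)/rho
# gives alpha^2 * exp(-(x - x')^2 / (4 * rho^2)) for scalar inputs.
k(1.0, 2.5) ≈ alpha^2 * exp(-(1.0 - 2.5)^2 / (4 * rho^2))  # true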

I also cleaned up a few things. You'll probably want to think about whether the priors make sense for your application.

using CSV
using DataFrames
using Turing
using Distributions
using LinearAlgebra
using KernelFunctions

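# Simulated stand-in data (5 subjects at irregular timepoints) so the
# example is self-contained; swap in your real data here.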
df = DataFrame(
    subject = repeat(1:5, inner=3),
    obese = repeat(rand(Bool, 5), inner=3),
    timepoint = [1,2,3,1,3,4,1,2,5,1,4,5,1,3,5],
    bug = rand(Beta(0.9, 5), 15),
    nutrient = rand(Beta(0.9,5), 15)
)

# Squared exponential kernel. `KernelFunctions.transform` is deprecated;
# composing with a ScaleTransform is the current way to scale the inputs.
sekernel(alpha, rho) = alpha^2 * (SEKernel() ∘ ScaleTransform(sqrt(0.5) / rho))

@model function myGP(y, Z, X, jitter=1e-6)
    # Dimensions of GP predictors. Note that if X is a single column,
    # it needs to be an N×1 matrix rather than a length-N vector so
    # the distance computation inside kernelmatrix works.
    N, P = size(X)
    
    # Dimensions of linear model predictors
    J = size(Z, 2)  # Z should be N x J
    
    # Priors.
    mu ~ Normal(0, 1)
    sig2 ~ LogNormal(0, 1)
    alpha ~ LogNormal(0, 0.1)
    rho ~ LogNormal(0, 1)

    # Prior for linear model coefficients
    beta ~ filldist(Normal(0, 1), J)
    
    # GP Covariance matrix
    kernel = sekernel(alpha, rho)  # covariance function
    K = kernelmatrix(kernel, X, obsdim=1)  # cov matrix
    K += LinearAlgebra.I * (sig2 + jitter)  # add noise variance + jitter to the diagonal

    # Sampling Distribution.
    y ~ MvNormal(mu .+ Z * beta, K)
end

gp = myGP(df.bug,
          Matrix(df[!,[:subject, :obese, :nutrient]]),
          Matrix{Float64}(df[!,[:timepoint]]))

@time chain = sample(gp, HMC(0.01, 100), 200)
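
Once sampling finishes, something like this gives a quick look at the posterior (describe works on the Chains object Turing returns; the plots need StatsPlots):

describe(chain)  # posterior means, std devs, ESS, and quantiles for mu, sig2, alpha, rho, beta

using StatsPlots
plot(chain)  # trace and density plots for each parameter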