Hi @kevbonham, is this what you are looking for?
```julia
# Import libraries.
using Turing
using Turing: Variational
using Distributions
using Distances
using PyPlot
using StatsFuns
import Random
using Flux
import LinearAlgebra

# Squared-exponential covariance function.
# Note the broadcasted square (.^2): D is a distance matrix, so D^2 would be a
# matrix product, not elementwise squaring.
sqexp_cov_fn(D, phi, eps=1e-3) = exp.(-D.^2 / phi) + LinearAlgebra.I * eps

# Exponential covariance function.
exp_cov_fn(D, phi) = exp.(-D / phi)

@model function GP(y, X, m=0, s=1, cov_fn=exp_cov_fn)
    # Dimensions of predictors.
    N, P = size(X)

    # Distance matrix.
    D = pairwise(Distances.Euclidean(), X, dims=1)

    # Priors.
    mu ~ Normal(m, s)
    sig2 ~ LogNormal(0, 1)
    phi ~ LogNormal(0, 1)

    # Realized covariance matrix.
    K = cov_fn(D, phi)

    # Sampling distribution.
    # The latent function values have been marginalized out,
    # so there's only one level.
    y ~ MvNormal(mu * ones(N), K + sig2 * LinearAlgebra.I(N))
end
```
Note that when N is large, this model will be quite slow. There are a few ways to speed it up:
- Use a low-rank approximation (e.g. a predictive process), which can potentially scale to N = 50,000. I haven't tried it in Turing myself, but I believe it's doable, and it's not that bad to implement from scratch either.
- Use a GP method designed for large data: Gaussian Processes for Big Data (arXiv:1309.6835).
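To make the first bullet concrete, here is a minimal sketch of the low-rank idea, using a simple Nyström-style approximation with a subset of the data as inducing points. This is only the kernel-approximation step, not a full predictive process model, and all names (`Z`, `Knm`, `Kmm`, etc.) are illustrative:

```julia
# Sketch: Nyström-style low-rank approximation to an N×N covariance matrix,
# using M ≪ N inducing points (here simply the first M data points).
using Distances, LinearAlgebra

exp_cov(D, phi) = exp.(-D / phi)  # exponential covariance, as in the model above

N, M, phi = 200, 20, 1.0
X = randn(N, 1)
Z = X[1:M, :]  # inducing points

Knm = exp_cov(pairwise(Euclidean(), X, Z, dims=1), phi)            # N×M cross-covariance
Kmm = exp_cov(pairwise(Euclidean(), Z, Z, dims=1), phi) + 1e-6 * I # M×M, with jitter
Kapprox = Knm * (Kmm \ Knm')  # rank-M approximation; avoids inverting the full N×N matrix
```

The payoff is that downstream linear algebra can work with the N×M and M×M factors (O(N M²)) instead of the full N×N matrix (O(N³)).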
```julia
# Generate some non-linear data.
N = 150
X = randn(N, 1) * 3
y = sin.(vec(X)) + randn(N) * 0.1
PyPlot.scatter(vec(X), y)

# Fit via ADVI. You can also use HMC.
Random.seed!(0)
m = GP(y, X)
q0 = Variational.meanfield(m)  # initialize variational distribution (optional)
advi = ADVI(1, 2000)  # (num_elbo_samples, max_iters)
@time q = vi(m, advi, q0, optimizer=Flux.ADAM(1e-1));

# Alternatively, fit via HMC.
# Random.seed!(0)
# chain = sample(GP(y, X), HMC(0.01, 100), 200)
```
You'll need to write a bit of code to extract the posterior samples, make predictions, and generate the plots, which could be difficult if you haven't used Turing or implemented GPs before. Here's an image of the data I generated (dots), the posterior predictive mean at 100 new locations (red line), and the 95% credible region (red shaded region).
I have a complete example here which shows how to obtain predictions and generate the plot.
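As a rough sketch of the prediction step: for each posterior draw of `(mu, sig2, phi)`, the predictive mean at new inputs is the standard Gaussian conditional mean. The function below is illustrative (not part of Turing), assuming the exponential kernel from the model above:

```julia
# Sketch: GP posterior predictive mean at new inputs Xnew for ONE posterior
# draw (mu, sig2, phi). Loop over draws and summarize (mean, quantiles) to get
# the predictive band. Names are illustrative.
using Distances, LinearAlgebra

exp_cov(D, phi) = exp.(-D / phi)

function gp_predict(y, X, Xnew, mu, sig2, phi)
    # Training covariance, including observation noise on the diagonal.
    Kff = exp_cov(pairwise(Euclidean(), X, X, dims=1), phi) + sig2 * I
    # Cross-covariance between new and training inputs.
    Ksf = exp_cov(pairwise(Euclidean(), Xnew, X, dims=1), phi)
    # Conditional (predictive) mean.
    mu .+ Ksf * (Kff \ (y .- mu))
end
```

Calling this once per posterior sample, stacking the results, and taking the pointwise mean and 2.5%/97.5% quantiles gives the red line and shaded region in the figure.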
EDIT: After reading your post more carefully, I think what you really want, instead of a GP, is a carefully thought-out mixed effects model: random effects for each "subject", and fixed effects for the obesity indicator, "timepoint", and "nutrient" variables, with "bug" as the response. You may need to transform "bug" if its range of values is not amenable to a Normal distribution. You'll also have to prep the predictors (covariates): "subject" is categorical, so "1, 2, 3, 4" should not be used directly in your design matrix X; instead, use indicator variables to refer to each subject (ML people call this one-hot encoding). Bottom line: you probably don't need a GP in this case. You may want to check out lme4 in R for fitting mixed effects models.
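For the one-hot encoding step, here's a minimal sketch in Julia (the `onehot` helper and the `subject` data are illustrative, not from any package; Flux also ships its own one-hot utilities):

```julia
# Sketch: build indicator (one-hot) columns for a categorical predictor,
# e.g. "subject", before placing it in the design matrix X.
function onehot(v)
    levels = sort(unique(v))
    # length(v) × length(levels) matrix: row i has a 1 in the column of v[i]'s level.
    Float64[x == l for x in v, l in levels]
end

subject = [1, 2, 3, 1, 2]
X_subject = onehot(subject)  # 5×3 indicator matrix
```

If your model includes an intercept, drop one of the indicator columns (a reference level) to avoid collinearity.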