Gaussian Process Model with Turing

luiarthur · July 16, 2020, 1:34am

Hi @kevbonham, is this what you are looking for?

# Import Libraries
using Turing
using Turing: Variational
using Distributions
using Distances
using PyPlot
using StatsFuns
import Random
using Flux
import LinearAlgebra

# Squared-exponential covariance function
sqexp_cov_fn(D, phi, eps=1e-3) = exp.(-D^2 / phi) + LinearAlgebra.I * eps

# Exponential covariance function
exp_cov_fn(D, phi) = exp.(-D / phi)

@model function GP(y, X, m=0, s=1, cov_fn=exp_cov_fn)
    # Dimensions of predictors .
    N, P = size(X)
    
    # Distance matrix.
    D = pairwise(Distances.Euclidean(), X, dims=1)
    
    # Priors.
    mu ~ Normal(m, s)
    sig2 ~ LogNormal(0, 1)
    phi ~ LogNormal(0, 1)
    
    # Realized covariance function
    K = cov_fn(D, phi)
    
    # Sampling Distribution.
    # The latent variables have been marginalized out here,
    # so there's only one level.
    y ~ MvNormal(mu * ones(N), K + sig2 * LinearAlgebra.I(N))
end

Note that when N is large, this model will be really slow. There are several ways to speed up:

Use a low-rank approximation (e.g. predictive process). This can scale to potentially N=50000. I haven’t tried it myself in Turing, but I believe this should be doable in Turing. It’s not that bad to implement from scratch either.
GP for Big Data: [1309.6835] Gaussian Processes for Big Data

# Generate some non-linear data
N = 150
X = randn(N, 1) * 3
y = sin.(vec(X)) + randn(N) * 0.1
plt.scatter(vec(X), y)

# Fit via ADVI. You can also use HMC.
Random.seed!(0)
m = GP(y, X)
q0 = Variational.meanfield(m)  # initialize variational distribution (optional)
advi = ADVI(1, 2000)  # num_elbo_samples, max_iters
@time q = vi(m, advi, q0, optimizer=Flux.ADAM(1e-1));

# Alternatively, fit via HMC.
# Random.seed!(0)
# chain = sample(GP(y, X), HMC(0.01, 100), 200)

You need to write a bit of code to extract the posterior samples, make predictions, and make the plots; which could be difficult if you haven’t used Turing / implemented GP’s before. Here’s an image of the data I generated (dots), with posterior predictive mean at 100 new locations (red line), and the 95% credible region (red shaded region).

I have a complete example here which shows how to obtain predictions and generate the plot.

EDIT: After reading your post more carefully, I think what you really want, instead of a GP, is a carefully thought out mixed effects model, with random effects for each “subject”, and fixed effects for the obesity indicator, “timepoint”, and “nutrient” variables; the response variable being “bug”. You may need to transform “bug” if the range of values is not amenable to a Normal distribution. You’ll also have to prep the predictors (covariates). e.g. “subject” is categorical, so “1,2,3,4” should not be used directly in your design matrix X; instead you should use indicator variables to refer to each subject. ML people call this one-hot… Anyway, bottom-line, you probably don’t need a GP in this case. You may want to check out lmer4 in R for fitting mixed effects models.

Topic		Replies	Views
Modeling gaussian process with Turing.jl Probabilistic Programming gaussian-process	6	1222	February 8, 2022
Running TemporalGPs/Stheno sampling Probabilistic Programming gaussian-process , stheno	25	1333	May 29, 2021
Multi-level varying slopes with two clusters, cross classification Probabilistic Programming statistics , turing , distributions , bayesian-inference	16	2132	August 14, 2021
TemporalGPs precise zero estimate General Usage	7	272	October 14, 2022
Conceptual Gaussian Process Questions Probabilistic Programming	15	1764	August 17, 2020

Gaussian Process Model with Turing

Related topics