Gaussian Process Model with Turing

Full code here

(GPTool) pkg> st
Project GPTool v0.1.0
Status `~/repos/gptool/Project.toml`
  [99985d1d] AbstractGPs v0.2.6
  [c7e460c6] ArgParse v1.1.0
  [336ed68f] CSV v0.7.7
  [324d7699] CategoricalArrays v0.8.1
  [a93c6f00] DataFrames v0.21.7
  [b4f34e82] Distances v0.9.0
  [31c24e10] Distributions v0.23.10
  [91a5bcdd] Plots v1.6.1
  [fce5fe82] Turing v0.14.1
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [de0858da] Printf
  [10745b16] Statistics
using CSV
using DataFrames
using Turing
using Distributions
using Plots
using Distances
using LinearAlgebra
using AbstractGPs

df = DataFrame(
    subject = repeat(1:5, inner=3),
    obese = repeat(rand(Bool, 5), inner=3),
    timepoint = [1,2,3,1,3,4,1,2,5,1,4,5,1,3,5],
    bug = rand(Beta(0.9, 5), 15),
    nutrient = rand(Beta(0.9,5), 15)
)

## Straight Turing

# Squared-exponential covariance from a distance matrix D (eps adds diagonal jitter).
sqexp_cov_fn(D, phi, eps=1e-3) = exp.(-D.^2 / phi) + LinearAlgebra.I * eps

@model function myGP(y, Z, X, m=0, s=1, s_beta=3, cov_fn=sqexp_cov_fn)
    # Dimensions of GP predictors. Note that if X is a single column 
    # it needs to have dimension Nx1 (a matrix), not an array of 
    # length N, so that it can be processed properly by the distance function.
    N, P = size(X)
    
    # Dimensions of linear model predictors
    J = size(Z, 2)  # Z should be N x J
    
    # Distance matrix.
    D = pairwise(Distances.Euclidean(), X, dims=1)
    
    # Priors.
    mu ~ Normal(m, s)
    sig2 ~ LogNormal(0, 1)
    phi ~ LogNormal(0, 1)
    
    # Realized covariance function
    K = cov_fn(D, phi)
    
    # Prior for linear model coefficients
    beta ~ filldist(Normal(0, s_beta), J)

    # Sampling Distribution.
    y ~ MvNormal(mu * ones(N) + Z * beta, K + sig2 * LinearAlgebra.I(N))
end

gp = myGP(df.bug, Matrix(df[!,[:subject, :obese, :nutrient]]), Matrix(df[!,[:timepoint]]))
chain = sample(gp, HMC(0.01, 100), 200)

Tried both, on their own and together, same deal :-/

Separately, I’m trying to follow this example and running into other issues. Is there a better place to raise those?

This runs, but I haven’t looked at the results closely.

Note that I used KernelFunctions.jl, which implements the squared-exponential covariance function that was previously hand-written as sqexp_cov_fn.

I also cleaned up a few things. You'll probably want to think more carefully about the priors for your application.

using CSV
using DataFrames
using Turing
using Distributions
using LinearAlgebra
using KernelFunctions

df = DataFrame(
    subject = repeat(1:5, inner=3),
    obese = repeat(rand(Bool, 5), inner=3),
    timepoint = [1,2,3,1,3,4,1,2,5,1,4,5,1,3,5],
    bug = rand(Beta(0.9, 5), 15),
    nutrient = rand(Beta(0.9,5), 15)
)

sekernel(alpha, rho) = 
  alpha^2 * KernelFunctions.transform(SEKernel(), sqrt(0.5)/rho)

@model function myGP(y, Z, X, jitter=1e-6)
    # Dimensions of GP predictors. Note that if X is a single column,
    # it needs to be an N×1 matrix (not a length-N vector) so that
    # size(X) and kernelmatrix can handle it properly.
    N, P = size(X)
    
    # Dimensions of linear model predictors
    J = size(Z, 2)  # Z should be N x J
    
    # Priors.
    mu ~ Normal(0, 1)
    sig2 ~ LogNormal(0, 1)
    alpha ~ LogNormal(0, 0.1)
    rho ~ LogNormal(0, 1)

    # Prior for linear model coefficients
    beta ~ filldist(Normal(0, 1), J)
    
    # GP Covariance matrix
    kernel = sekernel(alpha, rho)  # covariance function
    K = kernelmatrix(kernel, X, obsdim=1)  # cov matrix
    K += LinearAlgebra.I * (sig2 + jitter)

    # Sampling Distribution.
    y ~ MvNormal(mu .+ Z * beta, K)
end

gp = myGP(df.bug,
          Matrix(df[!,[:subject, :obese, :nutrient]]),
          Matrix{Float64}(df[!,[:timepoint]]))

@time chain = sample(gp, HMC(0.01, 100), 200)

Yeah, perhaps if you have issues with the example in the blog, try asking in the comments section (of the blog) for now.


Confirmed, this now works for toy data, but is failing (PosDefException) on my real data :(

If I’m reading your model correctly here, you are trying to include a GP to account for correlations among observations over time. Right now you have multiple observations at the same time point (check rank(D)), which is why you are not getting a positive definite covariance matrix. You could make it positive definite if you added a large enough variance to the diagonal, but that’s probably not what you want.

It might make more sense to model correlation over time within each subject. This will (implicitly) result in a block-diagonal covariance matrix that should be positive definite.
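For example, a quick check on the toy data above (assuming the df and distance setup from the first model):

using Distances, LinearAlgebra

X = Matrix{Float64}(df[!, [:timepoint]])
D = pairwise(Euclidean(), X, dims=1)

size(D, 1)   # 15 rows ...
rank(D)      # ... but only 5 unique time points, so D has duplicate rows and is rank-deficient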


Ahh, yes, this is true. But isn't that also true of the toy data?

Yes, definitely - time-point 1 from subject 1 has basically no relationship to time-point 1 from subject 2. Is this what the one-hot encoding mentioned earlier is meant to address?

Yeah sorry. It’s fine to have multiple observations at the same time point if:

  1. they are unrelated (different subjects)
  2. there is observation error (adding variance to the diagonal; see the small sketch after this list)
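A tiny numerical sketch of point 2 (the kernel and numbers here are just made up for illustration):

using LinearAlgebra

t = [1.0, 1.0, 2.0]                        # two observations at the same time point
K = [exp(-(a - b)^2) for a in t, b in t]   # squared-exponential kernel matrix
isposdef(K)          # false: rows 1 and 2 are identical, so K is singular
isposdef(K + 0.1I)   # true once observation variance is added to the diagonal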

The reference to one-hot encoding is separate from the GP. Right now you are passing a Z (design matrix) that looks like:

julia> Matrix(df[!, [:subject, :obese, :nutrient]])
 15×3 Array{Float64,2}:
 1.0  0.0  0.201721
 1.0  0.0  0.358846
 1.0  0.0  0.0457792
 2.0  0.0  0.217184
 2.0  0.0  0.0164159
 2.0  0.0  0.253808
 3.0  1.0  0.145758
 3.0  1.0  0.0136373
 3.0  1.0  0.328215
 4.0  0.0  0.002383
 4.0  0.0  0.334386
 4.0  0.0  0.0121662
 5.0  1.0  0.234525
 5.0  1.0  0.145446
 5.0  1.0  0.251321

Note that your first column is treating subject as continuous. So only one effect is estimated, and subject 5 experiences five times the effect of subject 1. One-hot encoding means that you have one column per subject, and rows have a single 1 value in the column corresponding to their subject id, and are otherwise zero.

The StatsModels package makes dealing with this easier by providing a formula interface. StatsModels calls one-hot encoding StatsModels.FullDummyCoding; it isn't exported because other encodings (statisticians call them contrasts) are often more appropriate. If you want to use one-hot encoding, you need to explicitly exclude the intercept term, or your design matrix (Z) will not be full rank. Using your data frame above:

using StatsModels

mf = ModelFrame(@formula(bug ~ -1 + subject + obese + nutrient),
                df,
                contrasts = Dict(:subject => StatsModels.FullDummyCoding()))
Z = modelmatrix(mf)

Then Z looks like

  15×7 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0  0.0  0.201721
 1.0  0.0  0.0  0.0  0.0  0.0  0.358846
 1.0  0.0  0.0  0.0  0.0  0.0  0.0457792
 0.0  1.0  0.0  0.0  0.0  0.0  0.217184
 0.0  1.0  0.0  0.0  0.0  0.0  0.0164159
 0.0  1.0  0.0  0.0  0.0  0.0  0.253808
 0.0  0.0  1.0  0.0  0.0  1.0  0.145758
 0.0  0.0  1.0  0.0  0.0  1.0  0.0136373
 0.0  0.0  1.0  0.0  0.0  1.0  0.328215
 0.0  0.0  0.0  1.0  0.0  0.0  0.002383
 0.0  0.0  0.0  1.0  0.0  0.0  0.334386
 0.0  0.0  0.0  1.0  0.0  0.0  0.0121662
 0.0  0.0  0.0  0.0  1.0  1.0  0.234525
 0.0  0.0  0.0  0.0  1.0  1.0  0.145446
 0.0  0.0  0.0  0.0  1.0  1.0  0.251321

So the first five columns encode the subject (and you’ll estimate one parameter for each). However, now there is an identifiability issue because the obesity column is the sum of the subject 3 and subject 5 columns (so the matrix is not full rank). Appropriate priors (maybe estimate one global mean and then put fairly strong priors on each subject’s difference from that mean) can help here.
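As a rough sketch of that last suggestion (prior scales and names here are illustrative assumptions, and the GP-over-time part is omitted for brevity):

using Turing, Distributions

# Zsubj: the five one-hot subject columns; Zother: the obese and nutrient columns.
@model function subject_effects(y, Zsubj, Zother)
    mu ~ Normal(0, 1)                                 # global mean
    subj ~ filldist(Normal(0, 0.5), size(Zsubj, 2))   # fairly tight priors on each subject's deviation
    beta ~ filldist(Normal(0, 1), size(Zother, 2))    # remaining covariate effects
    sigma ~ LogNormal(0, 1)
    y ~ MvNormal(mu .+ Zsubj * subj + Zother * beta, sigma)
end

subject_effects(df.bug, Z[:, 1:5], Z[:, 6:7])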


This is super helpful, thanks (this is really not my field, so a lot of the jargon is pretty opaque). The original data I fed in did not obey these constraints, but I tried filtering the data down to enforce them, and I'm still getting the same error. One point of clarification: are those two conditions OR or AND?


Those are OR. If you model a GP for each subject separately then you’re saying that each subject follows an independent multivariate normal distribution (potentially sharing parameters like correlation length and marginal variance). If there is no observation error (also known as a ‘nugget’ in geostatistics), then the GP will interpolate the observations exactly, which isn’t possible if there are two different observations at the same location.
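A minimal sketch of that per-subject idea (reusing the sekernel helper and data layout from the Turing + KernelFunctions model above; the linear-model part and carefully chosen priors are left out here):

using Turing, Distributions, KernelFunctions, LinearAlgebra

@model function perSubjectGP(y, X, subject, jitter=1e-6)
    alpha ~ LogNormal(0, 0.1)
    rho ~ LogNormal(0, 1)
    sig2 ~ LogNormal(0, 1)           # observation error ("nugget")
    kernel = sekernel(alpha, rho)    # kernel parameters shared across subjects

    # One independent MvNormal per subject, which is equivalent to a single
    # MvNormal with a block-diagonal covariance matrix.
    for s in unique(subject)
        idx = findall(==(s), subject)
        K = kernelmatrix(kernel, X[idx, :], obsdim=1)
        K += LinearAlgebra.I * (sig2 + jitter)
        y[idx] ~ MvNormal(zeros(length(idx)), K)
    end
end

gp = perSubjectGP(df.bug, Matrix{Float64}(df[!, [:timepoint]]), df.subject)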


Hi @luiarthur,
I looked through your blogs and found them extremely helpful! Do you know of a method to add some “jitter” to the diagonal of the covariance matrix while still using the AbstractGP interface? I very much like the manual nature of just using Turing + KernelFunctions, but I also like the succinctness of the AbstractGP if I can get my hands on it. Also thanks for the blog posts! They certainly helped me piece together how GP’s fit together in a larger Bayesian modelling sense.

Thanks, @nucklass.

If I understand you correctly…

using KernelFunctions

k = SEKernel()
jitter = 1e-6
# adds jitter to diagonal of kernel matrix 
k +=  jitter * WhiteKernel()  
# `EyeKernel` is an alias for `WhiteKernel`
# So, `k += jitter * EyeKernel()` also works.

Note that jitter needs to be constant. If you instead want to learn a noise term, you’ll have to go about this a different way.

Also, slightly modified from my GP regression post,

using AbstractGPs, KernelFunctions, Distributions

# Define a parameterized kernel.
sqexpkernel(alpha::Real, rho::Real) = 
    alpha^2 * transform(SEKernel(), 1/(rho*sqrt(2)))

# Define model.
@model GPRegression(y, X, jitter=1e-6) = begin
    # Priors.
    alpha ~ LogNormal(0.0, 0.1)
    rho ~ LogNormal(0.0, 1.0)
    sigma ~ LogNormal(0.0, 1.0)
    
    # Covariance function.
    kernel = sqexpkernel(alpha, rho)

    # GP (implicit zero-mean).
    gp = GP(kernel)
    
    # Sampling Distribution
    y ~ gp(X, sigma^2 + jitter)  # add jitter for numerical stability.
end;

This version ends up being somewhat faster than if you were to add jitter * WhiteKernel() to your preferred kernel.


Ah, that’s rather straightforward. I was running into some PosDefExceptions when I wasn’t expecting them, and had read above that adding a small constant to the diagonal could help with numerical stability. I ended up rewriting the model from scratch, though, and circumventing the issue altogether. I figured I should learn how to do it anyway. Thanks for the quick reply! You answered the question exactly! Also, go Banana Slugs!


Nice post! FWIW, this works out of the box in Soss.jl:

# Define model.
m = @model X begin
    # Priors.
    alpha ~ LogNormal(0.0, 0.1)
    rho ~ LogNormal(0.0, 1.0)
    sigma ~ LogNormal(0.0, 1.0)
    
    # Covariance function.
    kernel = sqexpkernel(alpha, rho)

    # GP (implicit zero-mean).
    gp = GP(kernel)
    
    # Sampling Distribution (MvNormal likelihood).
    y ~ gp(X, sigma^2 + 1e-6)  # add 1e-6 for numerical stability.
end;

# Get some fake data
X = randn(20)
truth = rand(m(X=X))

# Sample from the posterior
y = truth.y
post = dynamicHMC(m(X=X), (;y=y));

Then

julia> ppost = particles(post)
(sigma = 0.141 ± 0.065, rho = 1.65 ± 0.49, alpha = 1.01 ± 0.096)

> this works out of the box in Soss.jl:

Just tried this out! It’s really slick and runs very quickly, to my (extremely thankful) surprise. One question, though. I notice that each feature set/observation in the data needs to be a column in the input array, as opposed to the one-row-per-observation layout that Soss usually uses, or else I get a mismatched-dimensions error. I’m assuming this has to do with the input format that AbstractGPs/KernelFunctions expects inside the model.
Any idea if this is correct, or am I just generating junk samples?
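For concreteness, this is roughly what I mean; I’ve been poking at the ColVecs/RowVecs wrappers in KernelFunctions to say which dimension holds the observations (this snippet is my own experiment, not something from the post above):

using KernelFunctions

k = SEKernel()
X = rand(10, 2)   # 10 observations with 2 features, one row per observation

kernelmatrix(k, RowVecs(X))   # 10×10: rows treated as observations
kernelmatrix(k, ColVecs(X))   # 2×2:  columns treated as observations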