Gaussian Process Model with Turing

Firstly, many apologies for taking so long to reply. So it looks like the following works just fine:

Turing.jl + Stheno.jl implementation

using DataFrames
using Distributions
using Random
using Stheno
using Turing
using Zygote

# Use Zygote for AD. Be sure to use 0.4.20 rather than 0.5.
Turing.setadbackend(:zygote)

# Generate fake data.
df = DataFrame(
    subject = repeat(1:5, inner=3),
    obese = repeat(rand(Bool, 5), inner=3),
    timepoint = [1,2,3,1,3,4,1,2,5,1,4,5,1,3,5],
    bug = rand(Beta(0.9, 5), 15),
    nutrient = rand(Beta(0.9,5), 15)
)

@model my_model(y, x) = begin

    # Length scales for each input dimension. Longer means dimension is less important.
    ls ~ Product(fill(Gamma(7, 1 / 2), 2))

    # Process variance parameter. Roughly speaking, how much of the variability in the data
    # is explained by the model.
    σ² ~ Gamma(7, 1 / 2)

    # Construct a GP object with the kernel of choice. Don't worry about the GPC thing.
    f = GP(σ² * stretch(Matern52(), 1 ./ ls), GPC())

    # Observation noise variance. Roughly speaking, how much of the data is unexplained by
    # the model / the model believes to be explained by "noise". 1 - σ²_n / (σ²_n + σ²) gives you
    # something a bit like an R^2 score.
    σ²_n ~ Gamma(7, 1 / 2)

    # Observations are `f` at `x` observed under noise `x`. `f(x, σ²_n)` is essentially a
    # multivariate Normal. Add 1e-3 for numerical stability.
    y ~ f(x, σ²_n + 1e-3)
end

# Pack data into a matrix and tell Stheno to treat it as a vector-of-vectors where
# each column is an element. Ideally this would be simpler, but I've to implement `RowVecs`
# in Stheno. I've only used two of the columns of the data here, but you could just as well have used more.
x = ColVecs(collect(hcat(df.obese, df.nutrient)'))

# Construct the model and run HMC for a bit.
chain = sample(my_model(df.bug, x), NUTS(), 1_000)

Make sure that you’re using Zygote 0.4.20 rather than 0.5 – it’s got a weird bug that I need to fix (that I think I may have caused in ChainRules :joy:).

I’ve placed reasonably informative priors over each of the parameters, which may- or may-not be appropriate for your data – they seem reasonable enough to me as defaults if you’ve not got anything crazy going on.

Modeling assumptions

Looks like there’s already been some good discussion about what you want to do in this thread, but I’ll give my 2 cents. In the above example implementation I’ve given example code for pretty much the simplest thing you would think to do with a GP, but it sounds like you somehow want to treat time differently to some of the other inputs.

Given that additive models appear to be on the table, do you believe that a model of the form

f(x, t) = f_t(t) + f_x(x)

is plausible for your data? (where x is all variables other than time). If so we could modify the above example to express that structure.

3 Likes