Implementing sparse Gaussian process model with Turing and Stheno

I’ve built a Turing model for some non-Gaussian data which contains a latent Gaussian process for spatial autocorrelation. I’ve run it on a small simulated dataset to confirm that it works. Now I want to run it on my real data, which includes about 11,000 data points, so a dense GP is impractical.

A simple MWE is below (the real model is more complicated, but the essence is the same, i.e. GP => transformation => parameter of the data distribution). Is there a simple(ish) way to modify this to use the sparse inducing-point approximations available in Stheno instead of the full GP?

using Turing, Stheno, Distributions, Random

Random.seed!(1)
x = 1.0:10.0
y = rand(GP(Matern32(), GPC())(x))
λ = exp.(y)
z = rand.(Poisson.(λ))

@model function example_model(x, z)
    y ~ GP(Matern32(), GPC())(x)
    λ = exp.(y)
    z .~ Poisson.(λ)
end

mod = example_model(x, z)
chn = sample(mod, NUTS(), 100)

@willtebbutt might know.

As an example of something I’ve tried that doesn’t work:

@model function example_model_sparse(x, z, xu)
    f = GP(Matern32(), GPC())
    y ~ MvNormal(zeros(length(x)), 2.0)
    Turing.@addlogprob! elbo(f(x, 1e-9), y, f(xu, 1e-9))
    λ = exp.(y)
    z .~ Poisson.(λ)
end

xu = 0:2:10 # Inducing points
mod_sparse = example_model_sparse(x, z, xu)
chn_sparse = sample(mod_sparse, NUTS(), 100)
ypost_sparse = Array(group(chn_sparse, :y))'

using Plots
plot(x, ypost_sparse, color=:black, alpha=0.5, label="")
plot!(x,  y, color=:red, linewidth=5, label="True y")

Which produces this:
[plot: posterior samples of y from example_model_sparse (black) with the true y (red)]

In contrast, making the same plot with the chain generated using example_model above looks like this:

[plot: posterior samples of y from example_model (black) with the true y (red)]

so the dense model appears to be generating a sensible posterior that tracks the true value of the latent field y.

I may be (probably am) misunderstanding what elbo is calculating, but the latent y variable doesn’t seem to be connected to the observed data here…

Hmmm this is a bit tricky. You’ve used the elbo calculation roughly correctly, but I don’t think that it’s doing what you want overall.

The basic issue is that you need to somehow avoid computing the exact logpdf and compute the elbo instead. One approach / hack would be to introduce a wrapper type that takes a Stheno.AbstractGP together with a FiniteGP representing the pseudo-points, and is itself another Stheno.AbstractGP. When that wrapper is contained in a FiniteGP, its logpdf would be implemented by calling elbo on the wrapped GP at the wrapped pseudo-points.

Does that suggestion make sense?

Yes, that generally makes sense. So you’re thinking of something like this:

struct SparseFiniteGP{T1<:Stheno.FiniteGP, T2<:Stheno.FiniteGP} <: Stheno.AbstractGP
    fobs::T1
    finducing::T2
end

function Distributions.logpdf(f::T, y) where {T <: SparseFiniteGP}
    return elbo(f.fobs, y, f.finducing)
end

I tried using this new type in my Turing model as y ~ SparseFiniteGP(f(x), f(xu)). However, it complains that “ArgumentError: Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions”.

Also, doesn’t Turing require that custom Distributions have a rand method?

What you’ve written is almost the thing that I had in mind. I had specifically imagined something along the lines of the following (modulo bugs, I’ve not tested this):


struct HackyGP{Tf<:AbstractGP, Tu<:FiniteGP} <: Stheno.AbstractGP
    f::Tf # process to be wrapped
    u::Tu # pseudo-points
end

function Stheno.logpdf(hacky_f::FiniteGP{<:HackyGP}, y::AbstractVector{<:Real})
    f = hacky_f.f.f
    u = hacky_f.f.u
    x = hacky_f.x
    Σy = hacky_f.Σy
    return Stheno.elbo(f(x, Σy), y, u)
end
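
For concreteness, usage would then look something like this (also untested; it assumes the usual f(x, σ²) syntax for building a FiniteGP applies to any Stheno.AbstractGP subtype, and the 1e-9 noise terms are arbitrary placeholders):

f = GP(Matern32(), GPC())
hacky_f = HackyGP(f, f(xu, 1e-9))  # wrap the process together with its pseudo-points
logpdf(hacky_f(x, 1e-9), y)        # dispatches to the method above, i.e. computes the elbo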

Okay, I came back to this and, after some hacking, got the following to run and produce basically sensible results.

struct SparseFiniteGP{T1<:Stheno.FiniteGP, T2<:Stheno.FiniteGP} <: Distributions.AbstractMvNormal
    fobs::T1       # finite GP at the observation inputs
    finducing::T2  # finite GP at the inducing inputs
end

# logpdf returns the elbo, so Turing targets the variational lower bound
# instead of the exact (dense) log marginal likelihood.
function Stheno.logpdf(f::SparseFiniteGP, y::AbstractVector{<:Real})
    return elbo(f.fobs, y, f.finducing)
end

Base.length(f::SparseFiniteGP) = length(f.fobs)

# Draw from the dense prior at the observation inputs, filling x in place
# (Turing uses this to generate initial values).
function Distributions._rand!(rng::Random.AbstractRNG, f::SparseFiniteGP, x::AbstractVector{<:Real})
    return x .= rand(rng, f.fobs)
end

@model function example_model_sparse(x, z, xu)
    f = GP(Matern32(), GPC())
    fsparse = SparseFiniteGP(f(x, 1e-6), f(xu, 1e-6))
    y ~ fsparse
    λ = exp.(y)
    z .~ Poisson.(λ)
end

chn_sparse = sample(example_model_sparse(x, z, xu), NUTS(), 200)
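
For reference, the posteriors for λ compared below can be pulled out of the chains the same way as the y samples above, e.g. (chn being the dense chain from earlier):

λpost = exp.(Array(group(chn, :y))')                # dense model
λpost_sparse = exp.(Array(group(chn_sparse, :y))')  # sparse model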

On the left is the dense model’s posterior for λ, on the right is the sparse model’s posterior (with the locations of the inducing points shown by the vertical lines).

[two plots: dense model’s posterior for λ (left) and sparse model’s posterior for λ (right), with inducing-point locations marked by vertical lines]

The posteriors do have some differences, but the speed difference is dramatic: with 900 observations, drawing 200 NUTS samples from the dense model takes something like 4 hours, whereas the sparse model with 19 inducing points completes the same sampling run in about 60 seconds, i.e. a roughly 240-fold speedup.

@willtebbutt, do you think a type like this is worth having in Stheno? If so, I can start work on a PR…


This looks great, a PR would be very much welcome!


Great! I’ve opened an issue to continue this discussion at the repo: https://github.com/willtebbutt/Stheno.jl/issues/134


Hope this isn’t too tangential, since it’s a usage question, but I can’t find examples anywhere but here or the docs.

If I understand Titsias (2009) and Bauer et al. (2016) correctly (big if), the VFE approximation implemented in Stheno is generally sensitive to the number of inducing points, but not to their locations.

Is there any reason to choose regularly indexed points from the training set rather than a random sample?
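
For concreteness, the two options I have in mind look something like this (just a sketch; the number of inducing points and the use of StatsBase.sample are arbitrary choices on my part):

using StatsBase: sample

n_inducing = 20
xu_regular = x[round.(Int, range(1, length(x); length=n_inducing))]  # evenly spaced through the training inputs
xu_random  = sort(sample(x, n_inducing; replace=false))              # random subsample of the same size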