How to understand GP extrapolation shape

using CairoMakie
using KernelFunctions
using AbstractGPs

n = 30
x_train  = rand(n) .+ 1
y_train = log.(x_train)

f = GP(Matern52Kernel())

fx = f(x_train, 0.1)

p_fx = posterior(fx, y_train)

fig = Figure()

xplot = collect(minimum(x_train):0.001:maximum(x_train))
ms = marginals(p_fx(xplot))
means = mean.(ms)

lines!(xplot, means, linewidth=10)
scatter!(x_train, y_train;color=:red)

xplot2 = collect(1:.001:5)
ms2 = marginals(p_fx(xplot2))
means2 = mean.(ms2)

lines!(xplot2, means2)

I am extrapolating this Matern kernel GP outside the range of the data. I want to understand why the extrapolation looks the way it does.

  • Why does it go back down after x=2?
  • Why does it asymptote at y=0?

By not passing an explicit mean to GP like you did in:

f = GP(Matern52Kernel())

you’re declaring “zero-mean” Gaussian process prior. See this line. Since your GP prior assumes zero-mean, the GP will converge towards 0 where there are not enough data points.


For any mean zero process it will eventually asymptote at zero, but if you wanted the extrapolation to “last” a bit longer before smoothly heading to zero, you might consider using a covariance function corresponding to a long memory process, which I think is most cleanly defined as a covariance function whose corresponding spectral density has a (integrable) singularity at the origin. Fractional ARIMA processes are an example, and I bet there is at least one time series package that does ARFIMA models.


Extrapolation in GPs is a sophisticated matter that need some special treatments. One usually need to use covariance kernels that are able to extrapolate as pointed out by @cgeoga . See for example this paper and this paper by Andrew Wilson which discusses a combination of kernel that works well for extrapolation in a lot of cases. Additional resources can be found in this webpage.

1 Like