Fit Gaussian process to training points

For self-study I am trying to fit a GP according to the formula in the GPML book.

I am able to draw some random functions from the prior distribution, but when I try to condition on the data it seems to make no difference: the curves in the right plot should all pass through the blue scatter points, but they don't. Any suggestions as to what might be wrong?
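
For reference, the noise-free conditioning formula from the book (eq. 2.19, if I am reading it correctly) that I am trying to implement is

\bar{f}_* = K(X_*, X)\,K(X, X)^{-1}\,\mathbf{f}, \qquad \operatorname{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\,K(X, X)^{-1}\,K(X, X_*),

where X are the training inputs, X_* the test inputs, and \mathbf{f} the training targets.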

using Distributions, LinearAlgebra, CairoMakie
CairoMakie.activate!()

"Exponentiated quadratic kernel"
expquad(x,y) = exp(-(1/2)*sum((x-y).^2))

"Add a small diagonal correction for numerical error to keep the covariance matrix positive definite."
ensure_posdef(a) = a + 3eps(opnorm(a))I

# Draw from the prior.
Xprior = reshape(collect(-5:.1:5), 1, :)
Nall = length(Xprior)
original_covar = map(Base.splat(expquad), Iterators.product(only(eachrow(Xprior)), only(eachrow(Xprior))))
covar = ensure_posdef(original_covar)
mvn = MvNormal(zeros(Nall), covar)
fig = Figure()
ax1 = Axis(fig[1,1])
foreach(1:3) do _
    r = rand(mvn)
    lines!(ax1, collect(keys(r)), r)
end
display(fig)

#=
> Invent some training data points
=#
Dtrain = [
    10.  40.  80.
     1.  -2.  -.5
]
Xtrain = Dtrain[1:1,:]
f = ytrain = Dtrain[2:2,:]' |> vec |> Vector
Xtest = reshape(collect(-5:.1:5), 1, :)

# Make test points.
make_ftest(Xtest, Xtrain, f) = let
    KXteXtr = [expquad(xte, xtr) for xte in eachcol(Xtest), xtr in eachcol(Xtrain)]
    KXtrXtr = [expquad(xtr, xtr′) for xtr in eachcol(Xtrain), xtr′ in eachcol(Xtrain)]
    KXteXte = [expquad(xte, xte′) for xte in eachcol(Xtest), xte′ in eachcol(Xtest)]
    KXtrXte = [expquad(xtr, xte) for xtr in eachcol(Xtrain), xte in eachcol(Xtest)]
    m = KXteXtr * inv(KXtrXtr) * f
    v = KXteXte - KXteXtr * inv(KXtrXtr) * KXtrXte
    MvNormal(m, ensure_posdef(v))
end
ftest = make_ftest(Xtest, Xtrain, f)
ax2 = Axis(fig[1,2])
foreach(1:5) do _
    r = rand(ftest)
    lines!(ax2, collect(keys(r)), r)
end
scatter!(ax2, Vector(vec(Xtrain')), f)
display(fig)


The problem is that the training positions in Xtrain are 10, 40, and 80, while Xtest evaluates the GP on [-5, 5]. Try again using

Dtrain = [
    1.  4.  8.
     1.  -2.  -.5
]

and plotting at the actual Xtest values instead of the indices:

foreach(1:5) do _
    r = rand(ftest)
    lines!(ax2, vec(Xtest), r)
end
scatter!(ax2, Vector(vec(Xtrain')), f)

Everything should work as expected then.


Here is an approach.

using LinearAlgebra, Plots

# kernel
g(x,y) = exp(-0.5*(x-y)*(x-y)/4)
K(x,y) = [g(i,j) for i in x, j in y]

# mv normal variates: symmetrize Σ, take its matrix square root via an
# eigendecomposition (clipping tiny negative eigenvalues), and transform
# standard normal draws.
function mvnorm(μ, Σ, n)
    Σ .= 0.5*(Σ .+ Σ')                   # enforce symmetry
    val, vec = eigen(Σ)
    b = vec*diagm(sqrt.(max.(val,0.0)))  # Σ ≈ b*b'
    z = randn(length(μ), n)
    return μ .+ b*z
end

# posterior mean, var
function pos(xr,yr,xt) 
    Ktr, Krr, Ktt = K(xt,xr), K(xr,xr), K(xt,xt)
    Krr_inv = pinv(Krr)
    μ = Ktr*Krr_inv*yr
    Σ = Ktt .- Ktr*Krr_inv*Ktr'
    return μ, Σ
end

# generate fake data
xt = collect(-2π:0.2:π)
xr = collect(-2*π:1.5:0)
yr = exp.(sin.(xr)) + 0.1*randn(length(xr))

# posterior
μ, Σ = pos(xr,yr,xt)
y = mvnorm(μ,Σ,3)
plot(xt, y, leg=false)
scatter!(xr, yr)



I got an error. I tried using some other kernels from KernelFunctions.jl but that didn’t help.

ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
Stacktrace:
 [1] checkpositivedefinite(info::Int64)
   @ LinearAlgebra /nix/store/ca4hhym3f57vpmhgvylvqp86cmz9gbis-julia-bin-1.8.1/share/julia/stdlib/v1.8/LinearAlgebra/src/factorization.jl:18
 [2] cholesky!(A::Matrix{Float64}, ::NoPivot; check::Bool)
   @ LinearAlgebra /nix/store/ca4hhym3f57vpmhgvylvqp86cmz9gbis-julia-bin-1.8.1/share/julia/stdlib/v1.8/LinearAlgebra/src/cholesky.jl:299
 [3] #cholesky#164
   @ /nix/store/ca4hhym3f57vpmhgvylvqp86cmz9gbis-julia-bin-1.8.1/share/julia/stdlib/v1.8/LinearAlgebra/src/cholesky.jl:402 [inlined]
 [4] cholesky (repeats 2 times)
   @ /nix/store/ca4hhym3f57vpmhgvylvqp86cmz9gbis-julia-bin-1.8.1/share/julia/stdlib/v1.8/LinearAlgebra/src/cholesky.jl:402 [inlined]
 [5] PDMat
   @ ~/.julia/packages/PDMats/ZW0lN/src/pdmat.jl:19 [inlined]
 [6] MvNormal
   @ ~/.julia/packages/Distributions/7iOJp/src/multivariate/mvnormal.jl:201 [inlined]
 [7] make_ftest(Xtest::Matrix{Float64}, Xtrain::Matrix{Float64}, ftrain::Vector{Float64})
   @ Main ~/src/gaussian-processes/ch02/ex01b.jl:62
 [8] top-level scope
   @ ~/src/gaussian-processes/ch02/ex01b.jl:64

using CairoMakie, Distributions

"Exponentiated quadratic kernel"
expquad(x,y) = exp(-(1/2)*sum((x-y).^2))

kernel(x,y) = expquad(x,y)


Dtrain = [
     1.   4.   8.
     1.  -2. -.5
]
Xtrain = Dtrain[1:1,:]
ftrain = ytrain = Dtrain[2:2,:]' |> vec |> Vector
Xtest = reshape(collect(-5:.1:5), 1, :)


make_ftest(Xtest, Xtrain, ftrain) = let
    KXteXtr = [kernel(xte, xtr) for xte in eachcol(Xtest), xtr in eachcol(Xtrain)]
    KXtrXtr = [kernel(xtr, xtr′) for xtr in eachcol(Xtrain), xtr′ in eachcol(Xtrain)]
    KXteXte = [kernel(xte, xte′) for xte in eachcol(Xtest), xte′ in eachcol(Xtest)]
    KXtrXte = [kernel(xtr, xte) for xtr in eachcol(Xtrain), xte in eachcol(Xtest)]
    m = KXteXtr * inv(KXtrXtr) * ftrain
    v = KXteXte - KXteXtr * inv(KXtrXtr) * KXtrXte
    MvNormal(m, v)
end
ftest = make_ftest(Xtest, Xtrain, ftrain)
ax2 = Axis(fig[1,2])
foreach(1:5) do _
    r = rand(ftest)
    lines!(ax2, vec(Xtest), r)
end
scatter!(ax2, Vector(vec(Xtrain')), ftrain)
display(fig)

It works in my script, though:

# generate fake data
xt = collect(-5:0.1:5)
xr = [1,4,8]
yr = [1,-2,-0.5]

# posterior
μ, Σ = pos(xr,yr,xt)
y = mvnorm(μ,Σ,3)
plot(xt, y, leg=false)
scatter!(xr, yr)



Sorry, I had also changed ensure_posdef to

ensure_posdef(a) = 0.5*(a + a') + 3eps(opnorm(a))I

and used it when passing v to MvNormal (somehow it got removed in your code).

In any case, there are two ways a matrix can fail to be positive definite:

  1. Some eigenvalue is (numerically) below zero. This is fixed by adding a small positive perturbation to the diagonal. Most GP code I have seen, e.g., in Python or Stan, uses something like Σ + 1e-8*I.
  2. The matrix is not Hermitian, as in your error. For a real matrix A this simply means that the matrix is not symmetric, which can be fixed by using \frac{1}{2}(A + A^T) instead, i.e., as in my version of ensure_posdef above. Note that the code by @tchebycheff also uses that trick in the function mvnorm.

While the first problem always needs to be fixed numerically, the deeper reason for the second one is that matrix inversion is numerically rather unstable and should be avoided. When you continue reading in the book, algorithm 2.1 fixes this issue by using the Cholesky decomposition instead. Here is how your function would read using this algorithm:

make_ftest(Xtest, Xtrain, ftrain) = let
    # KXteXtr: not needed as same as KXtrXte'
    KXtrXtr = [kernel(xtr, xtr′) for xtr in eachcol(Xtrain), xtr′ in eachcol(Xtrain)]
    KXteXte = [kernel(xte, xte′) for xte in eachcol(Xtest), xte′ in eachcol(Xtest)]
    KXtrXte = [kernel(xtr, xte) for xtr in eachcol(Xtrain), xte in eachcol(Xtest)]
    L = cholesky(KXtrXtr).L
    alpha = L' \ (L \ ftrain)
    m = KXtrXte' * alpha
    v = L \ KXtrXte
    MvNormal(m, KXteXte - v' * v + 1e-8*I)  # still need to fix issue 1.
end
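
In equation form (with the noise term of algorithm 2.1 set to zero, as everywhere in this thread so far) this computes

L = \operatorname{chol}(K(X, X)), \quad \alpha = L^\top \backslash (L \backslash \mathbf{f}), \quad \bar{f}_* = K(X, X_*)^\top \alpha, \quad v = L \backslash K(X, X_*), \quad \operatorname{cov}(f_*) = K(X_*, X_*) - v^\top v,

so no explicit matrix inverse is ever formed.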

I tried implementing it with a more complex kernel and data. When I ask the model to extrapolate slightly outside the range of the observed X values, it gives Y values that are very far outside the observed range. I plot both the data alone and the GP fit below to show this. I'm guessing this is some kind of numerical issue, but I don't know how to diagnose it.

using CairoMakie, StatsBase, Statistics, SplitApplyCombine, LinearAlgebra, KernelFunctions, Distributions, Tau

"Ensure the matrix is positive definite."
ensure_posdef(a) = let
    a = .5(a + a')
    # Add a small diagonal.
    d = 3eps(opnorm(a))I
    res = a + d
    isposdef(res) || error("failed to make positive definite")
    res
end

# Composite kernel: constant + periodic (365.25-day scale) + Matérn 5/2 (30-day scale).
# Build the kernel once; KernelFunctions kernels are callable, so kernel(x, y) still works.
kernel = sum([
    ConstantKernel()
    with_lengthscale(PeriodicKernel(), 365.25)
    with_lengthscale(Matern52Kernel(), 30)
])

#=
> Invent some training data points
=#

Xtrain = 736329.0:737790.0
ftrain = (20.0( sintau.(Xtrain / 365.25) + randn(length(Xtrain))) ) .+ 176.0
Xtest = 735964.0:738155.0

make_ftest(Xtest, Xtrain, ftrain) = let
    KXteXtr = product(kernel, Xtest, Xtrain)
    KXtrXtr = product(kernel, Xtrain, Xtrain)
    KXteXte = product(kernel, Xtest, Xtest)
    KXtrXte = product(kernel, Xtrain, Xtest)
    # Algorithm 2.1
    L = cholesky(KXtrXtr).L
    α = transpose(L) \ (L \ ftrain)
    m = KXteXtr * α
    v = (L \ KXtrXte)
    V = KXteXte - v'v
    MvNormal(m, ensure_posdef(V))
end

ftest = make_ftest(Xtest, Xtrain, ftrain)
fig = Figure()
ax = Axis(fig[1,1]; xminorgridvisible=true,yminorgridvisible=true)

band!(ax,
    vec(Xtest),
    mean(ftest) .- 2 .* sqrt.(var(ftest)),
    mean(ftest) .+ 2 .* sqrt.(var(ftest));
    color=(:gray, .25),
)
foreach(1:3) do _
    r = rand(ftest)
    lines!(ax, vec(Xtest), r)
end
scatter!(ax, Vector(vec(Xtrain')), ftrain; color=(:steelblue, .8), markersize=3)

display(fig)


That seems correct. Note that you fit without observation noise, i.e., the GP posterior has to pass through all of your (rather random-looking) data exactly. If you allow for observation noise – change the line KXtrXtr = product(kernel, Xtrain, Xtrain) in make_ftest to KXtrXtr = product(kernel, Xtrain, Xtrain) .+ 1.0 * I(length(Xtrain)) – the result looks much better.
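
For concreteness, here is a minimal sketch of your make_ftest with that change applied (it reuses kernel, product and ensure_posdef from your script; the noise variance 1.0 is just the value used above and should really be tuned to the noise level of your data):

make_ftest(Xtest, Xtrain, ftrain) = let
    KXteXtr = product(kernel, Xtest, Xtrain)
    # Observation noise: add a constant to the diagonal of the training covariance.
    KXtrXtr = product(kernel, Xtrain, Xtrain) .+ 1.0 * I(length(Xtrain))
    KXteXte = product(kernel, Xtest, Xtest)
    KXtrXte = product(kernel, Xtrain, Xtest)
    # Algorithm 2.1; the added diagonal also keeps the Cholesky factorization well conditioned.
    L = cholesky(KXtrXtr).L
    α = transpose(L) \ (L \ ftrain)
    m = KXteXtr * α
    v = L \ KXtrXte
    V = KXteXte - v'v
    MvNormal(m, ensure_posdef(V))
end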
