Running TemporalGPs/Stheno sampling

I want to try sampling with TemporalGPs because, as mentioned in another thread, it may be faster on larger datasets. The docs say to use Stheno’s examples for reference, so I tried that. I ran into some issues and have a few questions:

  1. IIRC there used to be a Soss->Stheno->DynamicHMC example somewhere but I can’t find it anymore. Is it still possible to define these models via Soss?

  2. I tried writing with TemporalGPs->DynamicHMC using the Stheno example but I couldn’t figure out how to write/evaluate the prior density.

ERROR: LoadError: MethodError: no method matching logpdf(::GP{AbstractGPs.ZeroMean{Float64},ScaledKernel{TransformedKernel{Matern52Kernel,ScaleTransform{Float64}},Float64}}, ::Array{Float64,1})

using Random

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs
using Distributions


# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten

using LogDensityProblems, DynamicHMC

# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
)

f_naive(params) = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))

function build_gp(params)
    return to_sde(f_naive(params), SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
n = 100
const x = RegularSpacing(0.0, 0.1, n)
const y = rand(build_gp(value(params))(x, value(params.var_noise)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)


function loglik_train(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return logpdf(f(x, params.var_noise), y)
end


function gp_posterior(x, y, p)
    kernel = softplus(p[1]) * (Matern52Kernel() ∘ ScaleTransform(softplus(p[2])))
    f = GP(kernel)
    return posterior(f(x, 0.1), y)
end


logprior(params) = logpdf(build_gp(), params)
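# ^ This is the part I can't figure out: how to write a prior density over the kernel
#   parameters themselves. (`build_gp` also needs the parameters as an argument here.)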

# Log joint density
function LogDensityProblems.logdensity(ℓ::typeof(loglik_train), params)
    return ℓ(params) + logprior(params)
end

# The parameter space is three-dimensional (var_kernel, λ, var_noise)
LogDensityProblems.dimension(::typeof(loglik_train)) = 3

# `loglik_train` does not allow evaluating derivatives of
# the log-likelihood function
function LogDensityProblems.capabilities(::Type{<:typeof(loglik_train)})
    return LogDensityProblems.LogDensityOrder{0}()
end

pgp = build_gp(value(params))
logpdf(f_naive(value(params)), flat_initial_params)
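# ^ This call is what produces the MethodError shown at the top of this post:
#   `f_naive(value(params))` is a plain `GP`, and `logpdf` expects the GP evaluated at
#   some inputs with noise (i.e. `f(x, noise)`) together with observations, not a GP and
#   a flat parameter vector.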


n_samples = 2
@time mcmc_result = mcmc_with_warmup(
        Random.GLOBAL_RNG,
        ADgradient(:ForwardDiff, loglik_train),
        n_samples,
)

mcmc_result

  3. Is it possible to use Soss to combine a TemporalGPs model with a linear regression and sample them as a single object?
  4. I tried copying the AdvancedHMC example out of the docs but got an error I don’t understand.

ERROR: LoadError: MethodError: no method matching extract_components(::Stheno.GaussianProcessProbabilisticProgramme{NamedTuple{(:f1, :f2, :f3),Tuple{Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(+),Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}}}}}}}, ::RegularSpacing{Float64})

using AbstractGPs, Stheno
using AdvancedHMC, Zygote, ParameterHandling
using Zygote: gradient
using ParameterHandling: value, flatten

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))


θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)


function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end


θ_flat_init, unflatten = flatten(θ);


# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

I’m interested in any thoughts or suggestions.

@willtebbutt could I ask your opinion on these if you have a moment?

Sorry, this slipped by me!

I tried writing with TemporalGPs->DynamicHMC using the Stheno example but I couldn’t figure out how to write/evaluate the prior density.

This is presumably the prior over the kernel parameters. Currently, it looks like you’ve got logprior set up to operate on the GP itself.

Perhaps something like:

sum(abs2, value(unflatten(flat_params)))

to give a log-Normal prior?

IIRC there used to be a Soss->Stheno->DynamicHMC example somewhere but I can’t find it anymore. Is it still possible to define these models via Soss?

This should indeed be possible – I moved the examples over to using ParameterHandling, but I’d like to have examples with both Soss and Turing.

I tried copying the AdvancedHMC example out of the docs but got an error I don’t understand.

What have you set x to be here?

I’d like to use DynamicHMC because it seems simpler to me but I’m not sure how to interpret your suggestion sum(abs2, value(unflatten(flat_params))): just defining

logprior(params) = sum(abs2, value(unflatten(params)))

doesn’t seem to do it.

In the AdvancedHMC example, I am assigning

x = RegularSpacing(0.0, 0.1, 10_000)

The full script is:

using TemporalGPs
using AbstractGPs, Stheno
using AdvancedHMC, Zygote, ParameterHandling
using Zygote: gradient
using ParameterHandling: value, flatten

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)



function build_gp(params)
    f_naive = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

x,y = let
    params = (
        var_kernel = positive(0.6),
        λ = positive(2.5),
        var_noise = positive(0.1),
    )


# Generate some synthetic data from the prior.
    x = RegularSpacing(0.0, 0.1, 10_000)
    y = rand(build_gp(value(params))(x, value(params.var_noise)))
    (x,y)
end


function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end


θ_flat_init, unflatten = flatten(θ);


# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
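    # NOTE: `f` is a Stheno GPPP here, but `x` (defined above) is the RegularSpacing used
    # for the TemporalGPs data generation; a GPPP needs a `GPPPInput`, which is where the
    # `extract_components` MethodError above comes from.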
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

Perhaps I’d better wait till those are available; I’m burning too many cycles without their aid. Cheers.

Sorry, I forgot to suggest negating it:

logprior(params) = -sum(abs2, value(unflatten(params))) / 2

should be equivalent to placing a Normal prior with unit variance over each of the model parameters.
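
Concretely, with the `loglik_train`, `unflatten`, and `flat_initial_params` from your first script, I would expect the DynamicHMC pieces to fit together roughly like this (an untested sketch):

# Prior over the parameter values, as a function of the flat vector that DynamicHMC
# passes around; up to an additive constant this is a unit-variance Normal log-density
# on each parameter.
logprior(flat_params) = -sum(abs2, value(unflatten(flat_params))) / 2

# Unnormalised log joint density = log marginal likelihood + log prior.
function LogDensityProblems.logdensity(ℓ::typeof(loglik_train), flat_params)
    return ℓ(flat_params) + logprior(flat_params)
end

# One dimension per flattened parameter (var_kernel, λ, var_noise).
LogDensityProblems.dimension(::typeof(loglik_train)) = length(flat_initial_params)

mcmc_result = mcmc_with_warmup(
    Random.GLOBAL_RNG,
    ADgradient(:ForwardDiff, loglik_train),
    100,
)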

I’m slightly confused by the script you’ve provided – it looks like you’ve got a couple of different models floating around. What is it that you’re trying to achieve?

I was able to run the documentation example of Stheno with AdvancedHMC:

using Plots
plotlyjs()

using AdvancedHMC, Zygote


using ParameterHandling
using ParameterHandling: value, flatten

# Import the packages we'll need for this bit of the demo.
using AbstractGPs
using Stheno

# Short length-scale and small variance.
l1 = 0.4
s1 = 0.2

# Long length-scale and larger variance.
l2 = 5.0
s2 = 1.0

# Specify a GaussianProcessProbabilisticProgramme object, which is itself a GP
# built from other GPs.
f = @gppp let
    f1 = s1 * stretch(GP(Matern52Kernel()), 1 / l1)
    f2 = s2 * stretch(GP(SEKernel()), 1 / l2)
    f3 = f1 + f2
end;

# Generate a sample from f3, one of the processes in f, at some random input locations.
# Add some iid observation noise, with zero-mean and variance 0.05.
N = 50
const x = GPPPInput(:f3, collect(range(-5.0, 5.0; length=N)));
σ²_n = 0.05;
fx = f(x, σ²_n);
const y = rand(fx);

# Compute the log marginal likelihood of this observation, just because we can.
logpdf(fx, y)

function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end

θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)

θ_flat_init, unflatten = flatten(θ);

# Concrete types used for clarity only.
unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

and also the documentation example of TemporalGPs with Optim:

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten

# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
)

function build_gp(params)
    f_naive = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
const x = RegularSpacing(0.0, 0.1, 10_000)
const y = rand(build_gp(value(params))(x, value(params.var_noise)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)

# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
objective(flat_initial_params)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params + randn(3), # Add some noise to make learning non-trivial
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = value(unflatten(training_results.minimizer))

I would like to have an example of TemporalGPs with either AdvancedHMC or DynamicHMC so I can get the sampling-based uncertainty on a larger dataset than Stheno permits.
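
Roughly, what I have in mind is the following (an untested sketch that just bolts the TemporalGPs objective from the Optim example above onto the AdvancedHMC setup from the Stheno example, with the same unit-Normal prior on the flattened parameters):

# Negative log marginal likelihood of the TemporalGPs model, reusing build_gp, x, y,
# unflatten and flat_initial_params from the Optim example above.
function nlml(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Unnormalised log joint: marginal likelihood plus a unit-Normal prior on the flattened
# (unconstrained) parameters, as in the Stheno/AdvancedHMC example.
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    return lml, first(back(1.0))
end

# Same AdvancedHMC setup as in the Stheno example, but with 3 parameters.
n_samples, n_adapts = 500, 20
θ0 = flat_initial_params + randn(3)
metric = DiagEuclideanMetric(3)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Back to constrained parameter values.
hypers = map(s -> value(unflatten(s)), samples)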

But as you said that's already on the roadmap, I don't think I need anything else. Thanks! 🙂

Update: I was able to run those examples I mentioned, but when I tried to use them on my real data I ran into some problems. The data is roughly 500-1000 observations per day for ~5000 days, and the temporal granularity only goes down to the date level, so many observations share the same value of the temporal variable.

  • DomainError: logpdf tries to take the square root of a negative number, inside the Zygote pullback

This code reproduces the error.

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten


# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(5.0),
    λ = positive(2.5),
    var_noise = positive(2.0),
)



function build_gp(params)
    f_naive = GP(params.var_kernel * KernelFunctions.transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

x = vcat([1,2,3,4,5], [1,2,3,4,5] )
y = randn(length(x))

flat_initial_params, unflatten = flatten(params)

# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
# logpdf(build_gp(value(unflatten(flat_initial_params)))(x, params.var_noise), y)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params, # Start from the declared initial parameters
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = value(unflatten(training_results.minimizer))

Is handling multiple observations per period the issue that I’m running into here, or is it something else? How can I circumvent it?

Edit:

I tried

x = vcat([1,2,3,4,5], [1,2,3,4,5] ) 
x = x .+ randn(length(x))

which still produces the error, so I guess that’s not the problem.
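
The only other thing I can think of trying is sorting the inputs (and permuting y to match), since, if I understand correctly, the state-space formulation TemporalGPs uses assumes the inputs are in increasing order, which the vcat above violates. An untested guess:

# Untested guess: put the inputs in increasing order and permute the observations to match.
perm = sortperm(x)
x_sorted = x[perm]
y_sorted = y[perm]

function objective_sorted(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x_sorted, params.var_noise), y_sorted)
end

I don’t know whether repeated time stamps are still a problem once the inputs are sorted.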