Running TemporalGPs/Stheno sampling

I want to try sampling with TemporalGPs because it may be faster on larger data, as mentioned in another thread. The docs say to use Stheno's examples for reference, so I tried that. I ran into some issues and have a few questions:

  1. IIRC there used to be a Soss->Stheno->DynamicHMC example somewhere but I can’t find it anymore. Is it still possible to define these models via Soss?

  2. I tried writing with TemporalGPs->DynamicHMC using the Stheno example but I couldn’t figure out how to write/evaluate the prior density.

ERROR: LoadError: MethodError: no method matching logpdf(::GP{AbstractGPs.ZeroMean{Float64},ScaledKernel{TransformedKernel{Matern52Kernel,ScaleTransform{Float64}},Float64}}, ::Array{Float64,1})

using Random

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs
using Distributions


# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten

using LogDensityProblems, DynamicHMC

# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
)

f_naive(params) = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))

function build_gp(params)
    return to_sde(f_naive(params), SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
n = 100
const x = RegularSpacing(0.0, 0.1, n)
const y = rand(build_gp(value(params))(x, value(params.var_noise)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)


function loglik_train(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return logpdf(f(x, params.var_noise), y)
end


function gp_posterior(x, y, p)
    kernel = softplus(p[1]) * (Matern52Kernel() ∘ ScaleTransform(softplus(p[2])))
    f = GP(kernel)
    return posterior(f(x, 0.1), y)
end


logprior(params) = logpdf(build_gp(), params)

# Log joint density
function LogDensityProblems.logdensity(ℓ::typeof(loglik_train), params)
    return ℓ(params) + logprior(params)
end

# The parameter space is three-dimensional
LogDensityProblems.dimension(::typeof(loglik_train)) = 3

# `loglik_train` does not provide derivatives of
# the log-likelihood function
function LogDensityProblems.capabilities(::Type{<:typeof(loglik_train)})
    return LogDensityProblems.LogDensityOrder{0}()
end

pgp = build_gp(value(params))
logpdf(f_naive(value(params)), flat_initial_params)


n_samples = 2
@time mcmc_result = mcmc_with_warmup(
        Random.GLOBAL_RNG,
        ADgradient(:ForwardDiff, loglik_train),
        n_samples,
)

mcmc_result

  1. Is it possible to use Soss to combine a TemporalGPs model with a linear regression and sample them as a single object?
  2. I tried copying the AdvancedHMC example out of the docs but got an error I don’t understand.

ERROR: LoadError: MethodError: no method matching extract_components(::Stheno.GaussianProcessProbabilisticProgramme{NamedTuple{(:f1, :f2, :f3),Tuple{Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(+),Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}}}}}}}, ::RegularSpacing{Float64})

using AbstractGPs, Stheno
using AdvancedHMC, Zygote, ParameterHandling
using Zygote: gradient
using ParameterHandling
using ParameterHandling: value, flatten

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))


θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)


function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end


θ_flat_init, unflatten = flatten(θ);


# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

I’m interested in any thoughts or suggestions.

@willtebbutt could I ask your opinion on these if you have a moment?

Sorry, this slipped by me!

I tried writing with TemporalGPs->DynamicHMC using the Stheno example but I couldn’t figure out how to write/evaluate the prior density.

This is presumably the prior over the kernel parameters. Currently, it looks like you’ve got logprior set up to operate on the GP itself.

Perhaps something like:

sum(abs2, value(unflatten(flat_params)))

to give a log-Normal prior?

IIRC there used to be a Soss->Stheno->DynamicHMC example somewhere but I can’t find it anymore. Is it still possible to define these models via Soss?

This should indeed be possible – I moved the examples over to using ParameterHandling, but I’d like to have examples with both Soss and Turing.

I tried copying the AdvancedHMC example out of the docs but got an error I don’t understand.

What have you set x to be here?

I’d like to use DynamicHMC because it seems simpler to me, but I’m not sure how to interpret your suggestion sum(abs2, value(unflatten(flat_params))): just defining

logprior(params) = sum(abs2, value(unflatten(params)))

doesn’t seem to do it.

In the AdvancedHMC example, I am assigning

x = RegularSpacing(0.0, 0.1, 10_000)

using TemporalGPs
using AbstractGPs, Stheno
using AdvancedHMC, Zygote, ParameterHandling
using Zygote: gradient
using ParameterHandling
using ParameterHandling: value, flatten

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))




θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)



function build_gp(params)
    f_naive = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

x,y = let
    params = (
        var_kernel = positive(0.6),
        λ = positive(2.5),
        var_noise = positive(0.1),
    )


# Generate some synthetic data from the prior.
    x = RegularSpacing(0.0, 0.1, 10_000)
    y = rand(build_gp(value(params))(x, value(params.var_noise)))
    (x,y)
end


function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end


θ_flat_init, unflatten = flatten(θ);


# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

Perhaps I’d better wait till those are available; I’m burning too many cycles without their aid. Cheers.

Sorry, I forgot to suggest negating it:

logprior(params) = -sum(abs2, value(unflatten(params))) / 2

should be equivalent to placing a Normal prior with unit variance over each of the model parameters.
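For concreteness, something like this (untested sketch) should slot into the DynamicHMC setup from your first script, reusing your loglik_train and unflatten:

logprior(params) = -sum(abs2, value(unflatten(params))) / 2

# Log joint density over the flat (unconstrained) parameter vector.
function LogDensityProblems.logdensity(ℓ::typeof(loglik_train), params)
    return ℓ(params) + logprior(params)
end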

I’m slightly confused by the script you’ve provided – it looks like you’ve got a couple of different models floating around. What is it that you’re trying to achieve?

I was able to run the documentation example of Stheno with AdvancedHMC:

using Plots
plotlyjs()

using AdvancedHMC, Zygote


using ParameterHandling
using ParameterHandling: value, flatten




# Import the packages we'll need for this bit of the demo.
using AbstractGPs
using Stheno

# Short length-scale and small variance.
l1 = 0.4
s1 = 0.2

# Long length-scale and larger variance.
l2 = 5.0
s2 = 1.0

# Specify a GaussianProcessProbabilisticProgramme object, which is itself a GP
# built from other GPs.
f = @gppp let
    f1 = s1 * stretch(GP(Matern52Kernel()), 1 / l1)
    f2 = s2 * stretch(GP(SEKernel()), 1 / l2)
    f3 = f1 + f2
end;

# Generate a sample from f3, one of the processes in f, at some random input locations.
# Add some iid observation noise, with zero-mean and variance 0.05.
N = 50
const x = GPPPInput(:f3, collect(range(-5.0, 5.0; length=N)));
σ²_n = 0.05;
fx = f(x, σ²_n);
const y = rand(fx);

# Compute the log marginal likelihood of this observation, just because we can.
logpdf(fx, y)

function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end

θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)

θ_flat_init, unflatten = flatten(θ);

# Concrete types used for clarity only.
unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

and also the documentation example of TemporalGPs with Optim:

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten

# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
)

function build_gp(params)
    f_naive = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
const x = RegularSpacing(0.0, 0.1, 10_000)
const y = rand(build_gp(value(params))(x, value(params.var_noise)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)

# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
objective(flat_initial_params)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params + randn(3), # Add some noise to make learning non-trivial
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = value(unflatten(training_results.minimizer))

I would like to have an example of TemporalGPs with either AdvancedHMC or DynamicHMC so I can get the sampling-based uncertainty on a larger dataset than Stheno permits.

But as you said that’s already on the roadmap, I don’t think I need anything else. Thanks :)

Update: I was able to run the examples I mentioned, but when I tried to use them on my real data I ran into some problems. The data has about 500-1000 observations per day for ~5000 days. The temporal granularity only goes down to the date level, so there are many observations with the same value for the temporal variable.

  • DomainError - logpdf tries to sqrt a negative, in Zygote pullback

This code reproduces the error.

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten


# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(5.0),
    λ = positive(2.5),
    var_noise = positive(2.0),
)



function build_gp(params)
    f_naive = GP(params.var_kernel * KernelFunctions.transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end




x = vcat([1,2,3,4,5], [1,2,3,4,5] )
y = randn(length(x))



flat_initial_params, unflatten = flatten(params)




# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
# logpdf(build_gp(value(unflatten(flat_initial_params)))(x, params.var_noise), y)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params,
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = value(unflatten(training_results.minimizer))

Is handling multiple observations per period the issue that I’m running into here, or is it something else? How can I circumvent it?

Edit:

I tried

x = vcat([1,2,3,4,5], [1,2,3,4,5] ) 
x = x .+ randn(length(x))

which still produces the error, so I guess that’s not the problem.

Sorry – I missed this update.

Could you try sorting the inputs? One of the TemporalGPs gotchas is that unsorted inputs can lead to silly results / errors.
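For example, something like this (a quick sketch using the x and y names from your snippet) keeps the observations aligned while sorting:

# Sort the inputs and reorder the observations to match.
idx = sortperm(x)
x = x[idx]
y = y[idx]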

@willtebbutt Sorting the inputs, as you suggested, got rid of the error when using the BFGS optimizer.

Now I have some basic questions about the code and results:

  • what quantity does the plotted error band represent?

  • What is the difference between Matern52 and Matern52Kernel?

  • In the example above, the kernel is constructed with

params.var_kernel * KernelFunctions.transform(Matern52Kernel(), params.λ)

and the BFGS example shows how to construct another kernel for plotting, with

k = σ²_bfgs * stretch(Matern52(), 1 / l_bfgs);

Why do I need to create two kernels, instead of using the original kernel params.var_kernel * KernelFunctions.transform(Matern52Kernel(), params.λ) with the estimated parameters? I tried reusing the original kernel which is a ScaledKernel{TransformedKernel{Matern52Kernel,ScaleTransform{Float64}},Float64} but I guess the problem is that Plots.jl doesn’t know how to plot Stheno objects, only AbstractGPs objects? Do I need to duplicate my kernel in AbstractGPs objects in order to plot it?

Sorting the inputs, as you suggested, got rid of the error when using the BFGS optimizer.

Excellent – glad to hear it.

what quantity does the plotted error band represent?

It’s whatever the default is in AbstractGPs, which I believe is 1 posterior standard deviation. You can change this via the ribbon_scale kwarg IIRC.
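For example, with the Plots.jl recipe something like the following should give a band of two posterior standard deviations (untested; f_post and xpred are placeholders for your posterior and prediction inputs):

using Plots

# Plot the posterior at the prediction inputs with a 2-standard-deviation ribbon.
plot(f_post(xpred); ribbon_scale=2)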

  • What is the difference between Matern52 and Matern52Kernel?

As of the most recent version of Stheno.jl, Matern52 should no longer exist, because Stheno.jl’s kernels have been dropped in favour of using those in KernelFunctions.jl. It and Matern52Kernel should implement the same thing.

I tried reusing the original kernel which is a ScaledKernel{TransformedKernel{Matern52Kernel,ScaleTransform{Float64}},Float64} but I guess the problem is that Plots.jl doesn’t know how to plot Stheno objects, only AbstractGPs objects?

Hmm that’s odd. Which version of Stheno.jl are you using? If you’re using 0.7 or above, this should work fine.

Do I need to duplicate my kernel in AbstractGPs objects in order to plot it?

If you’re trying to decompose your posterior into different components, you’ll need to implement it in Stheno.jl still. If you just want to plot the posterior predictions though, you should be fine to stay with the original kernel.
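For the plotting-only route, a rough untested sketch of what I mean (θ_fit, x, y and xpred are placeholders for your fitted parameters, training data and prediction inputs):

using Statistics: mean, std
using Plots

# Posterior under the original KernelFunctions kernel.
k = θ_fit.var_kernel * (Matern52Kernel() ∘ ScaleTransform(θ_fit.λ))
f = to_sde(GP(k), SArrayStorage(Float64))
f_post = posterior(f(x, θ_fit.var_noise), y)

# Plot the posterior marginals by hand: mean ± one standard deviation.
ms = marginals(f_post(xpred))
plot(xpred, mean.(ms); ribbon=std.(ms))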

@willtebbutt I’ve upgraded to Stheno v0.7.5. This code

using Zygote: gradient

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

function build_model(params)
    f_naive = GP(kern(params) )
    return to_sde(f_naive, SArrayStorage(Float64))
end

function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end


θ = (
    s_kernel = positive(.1),
    λ = positive(0.001),
    s_noise = positive(4.0),
)
θ_flat_init, unflatten = flatten(θ);

kern(params) = params.s_kernel^2 * (Matern52Kernel() ∘ ScaleTransform(params.λ))


results = Optim.optimize(
    nlml,
    θ->gradient(nlml, θ)[1],
    θ_flat_init + randn(length(θ_flat_init)),
    BFGS(),
    Optim.Options(
        show_trace=true,
    );
    inplace=false,
)
θ_bfgs = unpack(results.minimizer);




f_bfgs = build_model(θ_bfgs);
f_posterior_bfgs = posterior(f_bfgs(x, θ_bfgs.s_noise), y);
xpred = -MAX_DAYS:0.01:MAX_DAYS
f_posterior_bfgs_pred = f_posterior_bfgs(xpred)

using AbstractGPsMakie, CairoMakie
fig = Figure()
ax = Axis(fig[1,1])

scene = plot!(xpred, f_posterior_bfgs_pred;)

fig

produces this plot

I’m pretty pleased with the performance as well.

Moving on, if you’ve got a moment…

I have a hypothesis that at t=0 there is a discrete jump down, not a smooth curve like it looks. I tried to modify the kernel to allow for a sudden jump at 0

struct SameSignKernel <: KernelFunctions.SimpleKernel end
(::SameSignKernel)(x, y) = sign(x) == sign(y)

kern(params) = params.s_kernel^2 * (Matern32Kernel() ∘ ScaleTransform(params.λ)) + SameSignKernel()

but I’ve done something wrong:

ERROR: LoadError: MethodError: no method matching stationary_distribution(::SameSignKernel, ::SArrayStorage{Float64})
  • Does this general idea make sense? Overall I’d like to estimate the size of the change at t=0 and the trajectory following that event.

  • How can I capture that sudden jump in a kernel?

Glad to see it’s working.

I have a hypothesis that at t=0 there is a discrete jump down, not a smooth curve like it looks. I tried to modify the kernel to allow for a sudden jump at 0

The first thing I would recommend trying, if you know there’s a jump at 0 and you think that the value after 0 is essentially independent of the value beforehand, is just to fit two GPs but tie their hyperparameters. Does that make sense?

but I’ve done something wrong:

Yeah, what you’ve written won’t work because you need to tell TemporalGPs how to work with your kernel. This isn’t as straightforward as just specifying the covariance structure unfortunately – so I would recommend hacking a solution together for now. If the thing I suggested above doesn’t work out, we can get into thinking about how it might be possible to do what you’re after in a single kernel.

@willtebbutt

I think so. I’m creating two GPs and adding their nlml values together in lp_before + lp_after:

using Zygote: gradient

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

build_model(params) = to_sde(GP(kern(params)), SArrayStorage(Float64))



function nlml(θ_flat)
    θ = unpack(θ_flat)
    f_before = build_model(θ)
    f_after = build_model(θ)
    lp_before = -logpdf(f_before(df_before.t, θ.s_noise), df_before.resid)
    lp_after = -logpdf(f_after(df_after.t, θ.s_noise), df_after.resid)
    lp_before + lp_after
end


θ = (
    s_kernel = positive(.001),
    λ = positive(0.1),
    s_noise = positive(3.0),
)
θ_flat_init, unflatten = flatten(θ);



kern(params) = params.s_kernel^2 * (Matern32Kernel() ∘ ScaleTransform(params.λ))


results = Optim.optimize(
    nlml,
    θ->gradient(nlml, θ)[1],
    θ_flat_init + randn(length(θ_flat_init)),
    BFGS(),
    Optim.Options(
        show_trace=true,
    );
    inplace=false,
)
θ_bfgs = unpack(results.minimizer);

##


f_bfgs_before = build_model(θ_bfgs);
f_bfgs_after = build_model(θ_bfgs);
f_posterior_bfgs_before = posterior(f_bfgs_before(df_before.t, θ_bfgs.s_noise), df_before.t);
f_posterior_bfgs_after = posterior(f_bfgs_after(df_after.t, θ_bfgs.s_noise), df_after.t);
INCREMENT = 0.01
xpred_before = -MAX_DAYS:INCREMENT:0.
xpred_after = 1.:INCREMENT:MAX_DAYS
f_posterior_bfgs_pred_before = f_posterior_bfgs_before(xpred_before)
f_posterior_bfgs_pred_after = f_posterior_bfgs_after(xpred_after)
ms_bfgs_pred_before = marginals(f_posterior_bfgs_pred_before);

That raises

ERROR: LoadError: InexactError: Int64(1.8378770664093453)
Stacktrace:
 [1] Int64 at ./float.jl:710 [inlined]
 [2] convert at ./number.jl:7 [inlined]
 [3] posterior_and_lml(::TemporalGPs.Gaussian{StaticArrays.SArray{Tuple{2},Float64,1,2},StaticArrays.SArray{Tuple{2,2},Float64,2,4}}, ::TemporalGPs.ScalarOutputLGC{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},Float64,Float64}, ::Int64) at .julia/packages/TemporalGPs/lq0Zp/src/models/linear_gaussian_conditionals.jl:290
 [4] step_posterior(::TemporalGPs.Forward, ::TemporalGPs.Gaussian{StaticArrays.SArray{Tuple{2},Float64,1,2},StaticArrays.SArray{Tuple{2,2},Float64,2,4}}, ::Tuple{TemporalGPs.ElementOfLGSSM{TemporalGPs.Forward,TemporalGPs.SmallOutputLGC{StaticArrays.SArray{Tuple{2,2},Float64,2,4},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}},StaticArrays.SArray{Tuple{2,2},Float64,2,4}},TemporalGPs.ScalarOutputLGC{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},Float64,Float64}},Int64}) at .julia/packages/TemporalGPs/lq0Zp/src/models/lgssm.jl:225
 [5] step_posterior at .julia/packages/TemporalGPs/lq0Zp/src/models/lgssm.jl:219 [inlined]
 [6] scan_emit at .julia/packages/TemporalGPs/lq0Zp/src/util/scan.jl:18 [inlined]
 [7] _a_bit_of_posterior at .julia/packages/TemporalGPs/lq0Zp/src/models/lgssm.jl:217 [inlined]
 [8] posterior(::TemporalGPs.LGSSM{TemporalGPs.GaussMarkovModel{TemporalGPs.Forward,Array{StaticArrays.SArray{Tuple{2,2},Float64,2,4},1},FillArrays.Fill{FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}},1,Tuple{Base.OneTo{Int64}}},Array{StaticArrays.SArray{Tuple{2,2},Float64,2,4},1},TemporalGPs.Gaussian{StaticArrays.SArray{Tuple{2},Float64,1,2},StaticArrays.SArray{Tuple{2,2},Float64,2,4}}},StructArrays.StructArray{TemporalGPs.ScalarOutputLGC{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},Float64,Float64},1,NamedTuple{(:A, :a, :Q),Tuple{FillArrays.Fill{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},1,Tuple{Base.OneTo{Int64}}},FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}},Array{Float64,1}}},Int64}}, ::Array{Int64,1}) at .julia/packages/TemporalGPs/lq0Zp/src/models/lgssm.jl:208
 [9] posterior(::TemporalGPs.LGSSM{TemporalGPs.GaussMarkovModel{TemporalGPs.Forward,Array{StaticArrays.SArray{Tuple{2,2},Float64,2,4},1},FillArrays.Fill{FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}},1,Tuple{Base.OneTo{Int64}}},Array{StaticArrays.SArray{Tuple{2,2},Float64,2,4},1},TemporalGPs.Gaussian{StaticArrays.SArray{Tuple{2},Float64,1,2},StaticArrays.SArray{Tuple{2,2},Float64,2,4}}},StructArrays.StructArray{TemporalGPs.ScalarOutputLGC{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},Float64,Float64},1,NamedTuple{(:A, :a, :Q),Tuple{FillArrays.Fill{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},1,Tuple{Base.OneTo{Int64}}},FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}},Array{Float64,1}}},Int64}}, ::Array{Union{Missing, Int64},1}) at .julia/packages/TemporalGPs/lq0Zp/src/models/missings.jl:24
 [10] marginals(::AbstractGPs.FiniteGP{TemporalGPs.PosteriorLTISDE{TemporalGPs.LTISDE{GP{AbstractGPs.ZeroMean{Float64},ScaledKernel{TransformedKernel{Matern32Kernel,ScaleTransform{Float64}},Float64}},SArrayStorage{Float64}},NamedTuple{(:y, :x, :Σy),Tuple{Array{Int64,1},Array{Int64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}}},StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at .julia/packages/TemporalGPs/lq0Zp/src/gp/posterior_lti_sde.jl:24
 [11] top-level scope at

Ah, I think your observations might be Ints? Could you convert them to Float64s and see if that fixes the problem?
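For example, something like this (a sketch with your df_before column names) should convert them:

# Convert the columns passed to logpdf/posterior from Int to Float64.
df_before.t = Float64.(df_before.t)
df_before.resid = Float64.(df_before.resid)
# ...and similarly for df_after.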

@willtebbutt Changing the temporal variable from int to float fixed that error.

using Zygote: gradient

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

build_model(params) = to_sde(GP(kern(params)), SArrayStorage(Float64))



function nlml(θ_flat)
    θ = unpack(θ_flat)
    f_before = build_model(θ)
    f_after = build_model(θ)
    lp_before = -logpdf(f_before(df_before.t, θ.s_noise), df_before.resid)
    lp_after = -logpdf(f_after(df_after.t, θ.s_noise), df_after.resid)
    lp_before + lp_after
end


θ = (
    s_kernel = positive(.1),
    λ = positive(20.),
    s_noise = positive(3.0),
)
θ_flat_init, unflatten = flatten(θ);



kern(params) = params.s_kernel^2 * (Matern32Kernel() ∘ ScaleTransform(params.λ))


results = Optim.optimize(
    nlml,
    θ->gradient(nlml, θ)[1],
    θ_flat_init + randn(length(θ_flat_init)),
    BFGS(),
    Optim.Options(
        show_trace=true,
    );
    inplace=false,
)
θ_bfgs = unpack(results.minimizer);




f_bfgs_before = build_model(θ_bfgs);
f_bfgs_after = build_model(θ_bfgs);
f_posterior_bfgs_before = posterior(f_bfgs_before(df_before.t, θ_bfgs.s_noise), df_before.t);
f_posterior_bfgs_after = posterior(f_bfgs_after(df_after.t, θ_bfgs.s_noise), df_after.t);
INCREMENT = 0.01
xpred_before = -MAX_DAYS:INCREMENT:0.0 |> collect
xpred_after = 1.:INCREMENT:MAX_DAYS |> collect
f_posterior_bfgs_pred_before = f_posterior_bfgs_before(xpred_before)
f_posterior_bfgs_pred_after = f_posterior_bfgs_after(xpred_after)
ms_bfgs_pred_before = marginals(f_posterior_bfgs_pred_before);



using AbstractGPsMakie, CairoMakie
fig = Figure()
ax = Axis(fig[1,1])

scene = plot!(xpred_before, f_posterior_bfgs_pred_before;)
scene = plot!(xpred_after, f_posterior_bfgs_pred_after;)

fig

produces this plot

but it seems to be on the wrong scale (it should be varying by a few tenths, not tens), and the “after” block should be below, not above, the “before” block, as in my earlier post. I tried just running it multiple times and it does produce different results sometimes, but they seem even more off:

I tried changing the initial parameter values but they always produce one of these three plots.

Hmmm that is odd – it’s hard to say too much more without having your data etc. Would it be possible to provide a complete minimal working example that I could run locally to debug?


@willtebbutt Here’s a self-contained example.

using Stheno
using Dates
using AbstractGPs
using KernelFunctions
using TemporalGPs
using Optim
using ParameterHandling
using Zygote
using Zygote: gradient
using ParameterHandling: value, flatten
using StatsModels
using MixedModels
using DataFrames

MAX_PERIODS = 50


df_periods = let n = 10000; sort(DataFrame(t=Float64.(rand((-MAX_PERIODS:MAX_PERIODS),n)), e=randn(n)), :t) end
df_periods = DataFrames.transform(df_periods, :t => ByRow(t-> t > 0 ? t/50 - 1.0 : 0) => :y)
df_periods.y .+= df_periods.e

import CairoMakie
fig = CairoMakie.Figure()
ax = CairoMakie.Axis(fig[1,1])
CairoMakie.scatter!(df_periods.t, mean.(df_periods.y), markersize=2, strokewidth=0)
fig



df_before = filter(x->x.t <= 0, df_periods)
df_after = filter(x->x.t > 0, df_periods)




unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

build_model(params) = to_sde(GP(kern(params)), SArrayStorage(Float64))



function nlml(θ_flat)
    θ = unpack(θ_flat)
    f_before = build_model(θ)
    f_after = build_model(θ)
    lp_before = -logpdf(f_before(df_before.t, θ.s_noise), df_before.y)
    lp_after = -logpdf(f_after(df_after.t, θ.s_noise), df_after.y)
    lp_before + lp_after
end


θ = (
    s_kernel = positive(.1),
    λ = positive(20.),
    s_noise = positive(3.0),
)
θ_flat_init, unflatten = flatten(θ);



kern(params) = params.s_kernel^2 * (Matern32Kernel() ∘ ScaleTransform(params.λ))


results = Optim.optimize(
    nlml,
    θ->gradient(nlml, θ)[1],
    θ_flat_init + randn(length(θ_flat_init)),
    BFGS(),
    Optim.Options(
        show_trace=true,
    );
    inplace=false,
)
θ_bfgs = unpack(results.minimizer);




f_bfgs_before = build_model(θ_bfgs);
f_bfgs_after = build_model(θ_bfgs);
f_posterior_bfgs_before = posterior(f_bfgs_before(df_before.t, θ_bfgs.s_noise), df_before.t);
f_posterior_bfgs_after = posterior(f_bfgs_after(df_after.t, θ_bfgs.s_noise), df_after.t);
INCREMENT = 0.01
xpred_before = -MAX_PERIODS:INCREMENT:0.0 |> collect
xpred_after = 1.:INCREMENT:MAX_PERIODS |> collect
f_posterior_bfgs_pred_before = f_posterior_bfgs_before(xpred_before)
f_posterior_bfgs_pred_after = f_posterior_bfgs_after(xpred_after)
ms_bfgs_pred_before = marginals(f_posterior_bfgs_pred_before);



using Plots
plotly()
p = plot()
plot!(p, f_posterior_bfgs_pred_before)
plot!(p, f_posterior_bfgs_pred_after)
p

The raw observations:

The GP:

Sometimes it raises an error

ERROR: LoadError: DomainError with -0.02933073033844247:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
 [1] throw_complex_domainerror(::Symbol, ::Float64) at /nix/store/lb43a78rgdbpfa198cymvxir92ns8hx4-julia-1.5.3/lib/julia/sys.so:?
 [2] sqrt at ./math.jl:573 [inlined]
 [3] posterior_and_lml(::TemporalGPs.Gaussian
...

or plots something weird instead:

Hmmm is it possible that the wrong data is being conditioned upon? Specifically, it looks like time is being conditioned on in these lines:

f_posterior_bfgs_before = posterior(f_bfgs_before(df_before.t, θ_bfgs.s_noise), df_before.t);
f_posterior_bfgs_after = posterior(f_bfgs_after(df_after.t, θ_bfgs.s_noise), df_after.t);

Replacing it with y:

f_posterior_bfgs_before = posterior(f_bfgs_before(df_before.t, θ_bfgs.s_noise), df_before.y);
f_posterior_bfgs_after = posterior(f_bfgs_after(df_after.t, θ_bfgs.s_noise), df_after.y);

I get something like this:


Ouch, you’re right, thanks for catching that.

@willtebbutt On my end I have to run it several times to get a reasonable-looking fit.

Often it gives some kind of weird fit

or error

ERROR: LoadError: DomainError with -3.944864583570347e147:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
 [1] throw_complex_domainerror(::Symbol, ::Float64) at /nix/store/lb43a78rgdbpfa198cymvxir92ns8hx4-julia-1.5.3/lib/julia/sys.so:?
 [2] sqrt at ./math.jl:573 [inlined]
 [3] posterior_and_lml(::TemporalGPs.Gaussian{StaticArrays.SArray{Tuple{2},Float64,1,2},StaticArrays.SArray{Tuple{2,2},Float64,2,4}}, ::TemporalGPs.ScalarOutputLGC{LinearAlgebra.Adjoint{Float64,StaticArrays.SArray{Tuple{2},Float64,1,2}},Float64,Float64}, ::Float64)

Do you get the same? How can I avoid those weird results? It looks like your plot is much more jagged than mine.