Running TemporalGPs/Stheno sampling

I want to try sampling with TemporalGPs because, as mentioned in another thread, it may be faster on larger datasets. The docs say to use Stheno’s examples for reference, so I tried that. I ran into some issues and have a few questions:

  1. IIRC there used to be a Soss->Stheno->DynamicHMC example somewhere but I can’t find it anymore. Is it still possible to define these models via Soss?

  2. I tried writing with TemporalGPs->DynamicHMC using the Stheno example but I couldn’t figure out how to write/evaluate the prior density.

ERROR: LoadError: MethodError: no method matching logpdf(::GP{AbstractGPs.ZeroMean{Float64},ScaledKernel{TransformedKernel{Matern52Kernel,ScaleTransform{Float64}},Float64}}, ::Array{Float64,1})

using Random

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs
using Distributions


# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten

using LogDensityProblems, DynamicHMC

# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
)

f_naive(params) = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))

function build_gp(params)
    return to_sde(f_naive(params), SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
n = 100
const x = RegularSpacing(0.0, 0.1, n)
const y = rand(build_gp(value(params))(x, value(params.var_noise)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)


function loglik_train(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return logpdf(f(x, params.var_noise), y)
end


function gp_posterior(x, y, p)
    kernel = softplus(p[1]) * (Matern52Kernel() ∘ ScaleTransform(softplus(p[2])))
    f = GP(kernel)
    return posterior(f(x, 0.1), y)
end


logprior(params) = logpdf(build_gp(), params)
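# ^ This is the part I can't figure out: how to write a prior density over the kernel
#   parameters themselves. (`build_gp` also needs the parameters as an argument here.)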

# Log joint density
function LogDensityProblems.logdensity(ℓ::typeof(loglik_train), params)
    return ℓ(params) + logprior(params)
end

# The parameter space is three-dimensional (var_kernel, λ, var_noise)
LogDensityProblems.dimension(::typeof(loglik_train)) = 3

# `loglik_train` does not allow evaluating derivatives of
# the log-likelihood function
function LogDensityProblems.capabilities(::Type{<:typeof(loglik_train)})
    return LogDensityProblems.LogDensityOrder{0}()
end

pgp = build_gp(value(params))
logpdf(f_naive(value(params)), flat_initial_params)
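# ^ This call is what produces the MethodError shown at the top of this post:
#   `f_naive(value(params))` is a plain `GP`, and `logpdf` expects the GP evaluated at
#   some inputs with noise (i.e. `f(x, noise)`) together with observations, not a GP and
#   a flat parameter vector.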


n_samples = 2
@time mcmc_result = mcmc_with_warmup(
        Random.GLOBAL_RNG,
        ADgradient(:ForwardDiff, loglik_train),
        n_samples,
)

mcmc_result

  3. Is it possible to use Soss to combine a TemporalGPs model with a linear regression and sample them as a single object?
  4. I tried copying the AdvancedHMC example out of the docs but got an error I don’t understand.

ERROR: LoadError: MethodError: no method matching extract_components(::Stheno.GaussianProcessProbabilisticProgramme{NamedTuple{(:f1, :f2, :f3),Tuple{Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(+),Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}},Stheno.CompositeGP{Tuple{typeof(*),Float64,Stheno.CompositeGP{Tuple{typeof(∘),Stheno.WrappedGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}},Stheno.Stretch{Float64}}}}}}}}}}, ::RegularSpacing{Float64})

using AbstractGPs, Stheno
using AdvancedHMC, Zygote, ParameterHandling
using Zygote: gradient
using ParameterHandling: value, flatten

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))


θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)


function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end


θ_flat_init, unflatten = flatten(θ);


# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

I’m interested in any thoughts or suggestions.

@willtebbutt could I ask your opinion on these if you have a moment?

Sorry, this slipped by me!

I tried writing with TemporalGPs->DynamicHMC using the Stheno example but I couldn’t figure out how to write/evaluate the prior density.

This is presumably the prior over the kernel parameters. Currently, it looks like you’ve got logprior set up to operate on the GP itself.

Perhaps something like:

sum(abs2, value(unflatten(flat_params)))

to give a log-Normal prior?

IIRC there used to be a Soss->Stheno->DynamicHMC example somewhere but I can’t find it anymore. Is it still possible to define these models via Soss?

This should indeed be possible – I moved the examples over to using ParameterHandling, but I’d like to have examples with both Soss and Turing.

I tried copying the AdvancedHMC example out of the docs but got an error I don’t understand.

What have you set x to be here?

I’d like to use DynamicHMC because it seems simpler to me but I’m not sure how to interpret your suggestion sum(abs2, value(unflatten(flat_params))): just defining

logprior(params) = sum(abs2, value(unflatten(params)))

doesn’t seem to do it.

In the AdvancedHMC example, I am assigning

x = RegularSpacing(0.0, 0.1, 10_000)

The full script is:

using TemporalGPs
using AbstractGPs, Stheno
using AdvancedHMC, Zygote, ParameterHandling
using Zygote: gradient
using ParameterHandling: value, flatten

unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)



function build_gp(params)
    f_naive = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

x,y = let
    params = (
        var_kernel = positive(0.6),
        λ = positive(2.5),
        var_noise = positive(0.1),
    )


# Generate some synthetic data from the prior.
    x = RegularSpacing(0.0, 0.1, 10_000)
    y = rand(build_gp(value(params))(x, value(params.var_noise)))
    (x,y)
end


function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end


θ_flat_init, unflatten = flatten(θ);


# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
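    # NOTE: `f` is a Stheno GPPP here, but `x` (defined above) is the RegularSpacing used
    # for the TemporalGPs data generation; a GPPP needs a `GPPPInput`, which is where the
    # `extract_components` MethodError above comes from.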
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

Perhaps I’d better wait till those are available; I’m burning too many cycles without their aid. Cheers.

Sorry, I forgot to suggest negating it:

logprior(params) = -sum(abs2, value(unflatten(params))) / 2

should be equivalent to placing a Normal prior with unit variance over each of the model parameters.
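
Concretely, with the `loglik_train`, `unflatten`, and `flat_initial_params` from your first script, I would expect the DynamicHMC pieces to fit together roughly like this (an untested sketch):

# Prior over the parameter values, as a function of the flat vector that DynamicHMC
# passes around; up to an additive constant this is a unit-variance Normal log-density
# on each parameter.
logprior(flat_params) = -sum(abs2, value(unflatten(flat_params))) / 2

# Unnormalised log joint density = log marginal likelihood + log prior.
function LogDensityProblems.logdensity(ℓ::typeof(loglik_train), flat_params)
    return ℓ(flat_params) + logprior(flat_params)
end

# One dimension per flattened parameter (var_kernel, λ, var_noise).
LogDensityProblems.dimension(::typeof(loglik_train)) = length(flat_initial_params)

mcmc_result = mcmc_with_warmup(
    Random.GLOBAL_RNG,
    ADgradient(:ForwardDiff, loglik_train),
    100,
)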

I’m slightly confused by the script you’ve provided – it looks like you’ve got a couple of different models floating around. What is it that you’re trying to achieve?

I was able to run the documentation example of Stheno with AdvancedHMC:

using Plots
plotlyjs()

using AdvancedHMC, Zygote


using ParameterHandling
using ParameterHandling: value, flatten

# Import the packages we'll need for this bit of the demo.
using AbstractGPs
using Stheno

# Short length-scale and small variance.
l1 = 0.4
s1 = 0.2

# Long length-scale and larger variance.
l2 = 5.0
s2 = 1.0

# Specify a GaussianProcessProbabilisticProgramme object, which is itself a GP
# built from other GPs.
f = @gppp let
    f1 = s1 * stretch(GP(Matern52Kernel()), 1 / l1)
    f2 = s2 * stretch(GP(SEKernel()), 1 / l2)
    f3 = f1 + f2
end;

# Generate a sample from f3, one of the processes in f, at some random input locations.
# Add some iid observation noise, with zero-mean and variance 0.05.
N = 50
const x = GPPPInput(:f3, collect(range(-5.0, 5.0; length=N)));
σ²_n = 0.05;
fx = f(x, σ²_n);
const y = rand(fx);

# Compute the log marginal likelihood of this observation, just because we can.
logpdf(fx, y)

function build_model(θ::NamedTuple)
    return @gppp let
        f1 = θ.s1 * stretch(GP(SEKernel()), 1 / θ.l1)
        f2 = θ.s2 * stretch(GP(SEKernel()), 1 / θ.l2)
        f3 = f1 + f2
    end
end

θ = (
    # Short length-scale and small variance.
    l1 = positive(0.4),
    s1 = positive(0.2),

    # Long length-scale and larger variance.
    l2 = positive(5.0),
    s2 = positive(1.0),

    # Observation noise variance -- we'll be learning this as well.
    s_noise = positive(0.1),
)

θ_flat_init, unflatten = flatten(θ);

# Concrete types used for clarity only.
unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))

# nlml = negative log marginal likelihood (of θ)
function nlml(θ_flat)
    θ = unpack(θ_flat)
    f = build_model(θ)
    return -logpdf(f(x, θ.s_noise), y)
end

# Define the log marginal joint density function and its gradient
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    ∂θ = first(back(1.0))
    return lml, ∂θ
end

# Sampling parameter settings
n_samples, n_adapts = 500, 20

# Draw a random starting point
θ0 = randn(5)

# Define metric space, Hamiltonian, sampling method and adaptor
metric = DiagEuclideanMetric(5)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))

# Perform inference.
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Inspect posterior distribution over hyperparameters.
hypers = unpack.(samples);
h_l1 = histogram(getindex.(hypers, :l1); label="l1");
h_l2 = histogram(getindex.(hypers, :l2); label="l2");
h_s1 = histogram(getindex.(hypers, :s1); label="s1");
h_s2 = histogram(getindex.(hypers, :s2); label="s2");
display(plot(h_l1, h_l2, h_s1, h_s2; layout=(2, 2)));

and also the documentation example of TemporalGPs with Optim:

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten

# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
)

function build_gp(params)
    f_naive = GP(params.var_kernel * transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
const x = RegularSpacing(0.0, 0.1, 10_000)
const y = rand(build_gp(value(params))(x, value(params.var_noise)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)

# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
objective(flat_initial_params)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params + randn(3), # Add some noise to make learning non-trivial
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = value(unflatten(training_results.minimizer))

I would like to have an example of TemporalGPs with either AdvancedHMC or DynamicHMC so I can get the sampling-based uncertainty on a larger dataset than Stheno permits.
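
Roughly, what I have in mind is the following (an untested sketch that just bolts the TemporalGPs objective from the Optim example above onto the AdvancedHMC setup from the Stheno example, with the same unit-Normal prior on the flattened parameters):

# Negative log marginal likelihood of the TemporalGPs model, reusing build_gp, x, y,
# unflatten and flat_initial_params from the Optim example above.
function nlml(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Unnormalised log joint: marginal likelihood plus a unit-Normal prior on the flattened
# (unconstrained) parameters, as in the Stheno/AdvancedHMC example.
ℓπ(θ) = -nlml(θ) - 0.5 * sum(abs2, θ)
function ∂ℓπ∂θ(θ)
    lml, back = Zygote.pullback(ℓπ, θ)
    return lml, first(back(1.0))
end

# Same AdvancedHMC setup as in the Stheno example, but with 3 parameters.
n_samples, n_adapts = 500, 20
θ0 = flat_initial_params + randn(3)
metric = DiagEuclideanMetric(3)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
int = Leapfrog(find_good_eps(h, θ0))
prop = NUTS{MultinomialTS, GeneralisedNoUTurn}(int)
adaptor = StanHMCAdaptor(n_adapts, Preconditioner(metric), NesterovDualAveraging(0.8, int.ϵ))
samples, stats = sample(h, prop, θ0, n_samples, adaptor, n_adapts; progress=true)

# Back to constrained parameter values.
hypers = map(s -> value(unflatten(s)), samples)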

But as you said that's already on the roadmap, I don't think I need anything else. Thanks! 🙂

Update: I was able to run those examples I mentioned, but when I tried to use them on my real data I ran into some problems. The data is roughly 500-1000 observations per day for ~5000 days, and the temporal granularity only goes down to the date level, so many observations share the same value of the temporal variable.

  • DomainError: logpdf tries to take the square root of a negative number, inside the Zygote pullback

This code reproduces the error.

# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: value, flatten


# Declare model parameters using `ParameterHandling.jl` types.
params = (
    var_kernel = positive(5.0),
    λ = positive(2.5),
    var_noise = positive(2.0),
)



function build_gp(params)
    f_naive = GP(params.var_kernel * KernelFunctions.transform(Matern52Kernel(), params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

x = vcat([1,2,3,4,5], [1,2,3,4,5] )
y = randn(length(x))

flat_initial_params, unflatten = flatten(params)

# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
# logpdf(build_gp(value(unflatten(flat_initial_params)))(x, params.var_noise), y)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params, # Start from the declared initial parameters
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = value(unflatten(training_results.minimizer))

Is handling multiple observations per period the issue that I’m running into here, or is it something else? How can I circumvent it?

Edit:

I tried

x = vcat([1,2,3,4,5], [1,2,3,4,5] ) 
x = x .+ randn(length(x))

which still produces the error, so I guess that’s not the problem.
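
The only other thing I can think of trying is sorting the inputs (and permuting y to match), since, if I understand correctly, the state-space formulation TemporalGPs uses assumes the inputs are in increasing order, which the vcat above violates. An untested guess:

# Untested guess: put the inputs in increasing order and permute the observations to match.
perm = sortperm(x)
x_sorted = x[perm]
y_sorted = y[perm]

function objective_sorted(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x_sorted, params.var_noise), y_sorted)
end

I don’t know whether repeated time stamps are still a problem once the inputs are sorted.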