DomainError in TemporalGPs

I’m trying this on two different variables.

Below is a time-series scatterplot with loess smoother (in red) and a histogram for each variable. These images capture a 10k-row sample of a 2M-row dataset. In the real dataset that I’m fitting, there are about 400-800 rows for each time t.

Fairly Gaussian variable

function kern(params)
    matern = params.s_kernel^2 * (Matern32Kernel() ∘ ScaleTransform(params.λ))
    matern + ConstantKernel()
end

function build_model(params)
    gp = GP(kern(params))
    to_sde(gp, SArrayStorage(Float64))
end
nlml = (θ_flat) -> let
        θ = unpack(θ_flat)
        f = build_model(θ)
        nlp = -logpdf(f(df.t, θ.s_noise + 1e-6), df.xresid)
        @show(θ, nlp)
        nlp
end
θ₀ = (s_kernel=positive(.05), λ=positive(1/40), s_noise=positive(2.))
θ_flat_init, unflatten = flatten(θ₀);
unpack(θ_flat::Vector{Float64}) = value(unflatten(θ_flat))


@show(nlml(θ_flat_init)) # test init
results = Optim.optimize(nlml, θ->gradient(nlml, θ)[1], θ_flat_init, BFGS(), Optim.Options(show_trace=false,); inplace=false,)
θ_bfgs = unpack(results.minimizer)

fails on the initial value with DomainError with -159340.87622738612.

Quite non-Gaussian variable

This GP fits fine on small periods of time, e.g. 40 values of t, but not on longer periods like 300 or 3000 ts.

As you’ll notice, the data don’t look Gaussian. In settings like linear regression, I don’t usually consider Gaussian errors to be very important to successful estimation, but maybe it’s more important for GPs? I tried a transformation xs->demean(3.0 .^ (xs.+3)) to de-skew it, but it didn’t help:

θ₀ = (s_kernel=positive(.5), λ=positive(1/1000), s_noise=positive(12.))

gives LoadError: DomainError with -16.701296663676633.

Original:

Transformed:

1 Like