Hi everyone,
My data (and data generating process) is probably quite peculiar, but nevertheless me (and my supervisor) are convinced it should be possible to model using probabilistic programming, and we believe Gaussian Processes make the most sense.
So in my data about dogs, we have ‘tests’ at certain time points, with some test scores at those time points.
At time point 1, we additionally have a behavioral score that sort of indicates the occurrence of certain problem behavior for a dog (based on a validated questionnaire).
The idea is to model this behavioral score over the timepoints, where it is only really observed at time point 1. Additionally, the test scores essentially depend on the behavioral score, because the behavioral score indicates to some extent how we expect the dog to behave (conditional on certain circumstances).
So we have some theory about how we expect the behavioral score to vary over time, which is also how we sort of heuristically pick our kernel parameters for the gaussian process (instead of learning them from the data like you normally would).
Now, the issue is, in order to model the behavioral score over time, we need to pass a mix of Float values and missing values to the GP, with the expectation that the missing values will be sampled from the kernel accordingly, but that is where it throws an error. I also tried the approach where I use an MvNormal instead of AbstractGP’s, but it throws a similar error. So I am not sure how I could do my modeling instead in order to achieve my goal, or whether this is a bug somewhere related to missing values.
Error for AbstractGP's implementation (when running the code below)
ERROR: LoadError: MethodError: no method matching loglikelihood(::AbstractGPs.FiniteGP{GP{AbstractGPs.ConstMean{Float64},ScaledKernel{TransformedKernel{SqExponentialKernel,ScaleTransform{Float64}},Float64}},Array{Int64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Array{Union{Missing, Float64},1})
Closest candidates are:
loglikelihood(::AbstractGPs.FiniteGP, ::AbstractArray{var"#s16",2} where var"#s16"<:Real) at /home/michiel/.julia/packages/AbstractGPs/TRhxZ/src/abstract_gp/finite_gp.jl:222
loglikelihood(::Distribution{Univariate,S} where S<:ValueSupport, ::AbstractArray) at /home/michiel/.julia/packages/Distributions/hAr7L/src/univariates.jl:522
loglikelihood(::Distribution{Multivariate,S} where S<:ValueSupport, ::AbstractArray{var"#s123",1} where var"#s123"<:Real) at /home/michiel/.julia/packages/Distributions/hAr7L/src/multivariates.jl:264
...
Stacktrace:
[1] observe(::DynamicPPL.SampleFromUniform, ::AbstractGPs.FiniteGP{GP{AbstractGPs.ConstMean{Float64},ScaledKernel{TransformedKernel{SqExponentialKernel,ScaleTransform{Float64}},Float64}},Array{Int64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Array{Union{Missing, Float64},1}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/context_implementations.jl:152
[2] _tilde(::DynamicPPL.SampleFromUniform, ::AbstractGPs.FiniteGP{GP{AbstractGPs.ConstMean{Float64},ScaledKernel{TransformedKernel{SqExponentialKernel,ScaleTransform{Float64}},Float64}},Array{Int64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Array{Union{Missing, Float64},1}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/context_implementations.jl:109
[3] tilde(::DynamicPPL.DefaultContext, ::DynamicPPL.SampleFromUniform, ::AbstractGPs.FiniteGP{GP{AbstractGPs.ConstMean{Float64},ScaledKernel{TransformedKernel{SqExponentialKernel,ScaleTransform{Float64}},Float64}},Array{Int64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Array{Union{Missing, Float64},1}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/context_implementations.jl:67
[4] tilde_observe(::DynamicPPL.DefaultContext, ::DynamicPPL.SampleFromUniform, ::AbstractGPs.FiniteGP{GP{AbstractGPs.ConstMean{Float64},ScaledKernel{TransformedKernel{SqExponentialKernel,ScaleTransform{Float64}},Float64}},Array{Int64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Array{Union{Missing, Float64},1}, ::DynamicPPL.VarName{:behavioral_score,Tuple{}}, ::Tuple{}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/context_implementations.jl:89
[5] #35 at /mnt/c/Users/michiel.verburg/Documents/repositories/master_project_gitlab/scripts/cbarq/small_reproducible_model_discourse_missing_values.jl:30 [inlined]
[6] (::var"#35#36")(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::DynamicPPL.SampleFromUniform, ::DynamicPPL.DefaultContext, ::Array{Union{Missing, Float64},1}, ::Array{Int64,1}, ::Array{Float64,1}, ::Float64) at ./none:0
[7] macro expansion at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/model.jl:0 [inlined]
[8] _evaluate(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::DynamicPPL.SampleFromUniform, ::DynamicPPL.DefaultContext) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/model.jl:154
[9] evaluate_threadunsafe(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::DynamicPPL.SampleFromUniform, ::DynamicPPL.DefaultContext) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/model.jl:127
[10] (::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}})(::Random._GLOBAL_RNG, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::DynamicPPL.SampleFromUniform, ::DynamicPPL.DefaultContext) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/model.jl:92
[11] DynamicPPL.VarInfo(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.SampleFromUniform, ::DynamicPPL.DefaultContext) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/varinfo.jl:126
[12] DynamicPPL.VarInfo(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.SampleFromUniform) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/varinfo.jl:125
[13] step(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric}}; resume_from::Nothing, kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:nadapts,),Tuple{Int64}}}) at /home/michiel/.julia/packages/DynamicPPL/u14IH/src/sampler.jl:73
[14] macro expansion at /home/michiel/.julia/packages/AbstractMCMC/Nw3Wn/src/sample.jl:78 [inlined]
[15] macro expansion at /home/michiel/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328 [inlined]
[16] (::AbstractMCMC.var"#20#21"{Bool,String,Nothing,Int64,Int64,Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:nadapts,),Tuple{Int64}}},Random._GLOBAL_RNG,DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric}},Int64,Int64})() at /home/michiel/.julia/packages/AbstractMCMC/Nw3Wn/src/logging.jl:11
[17] with_logstate(::Function, ::Any) at ./logging.jl:408
[18] with_logger(::Function, ::LoggingExtras.TeeLogger{Tuple{LoggingExtras.EarlyFilteredLogger{TerminalLoggers.TerminalLogger,AbstractMCMC.var"#1#3"{Module}},LoggingExtras.EarlyFilteredLogger{Logging.ConsoleLogger,AbstractMCMC.var"#2#4"{Module}}}}) at ./logging.jl:514
[19] with_progresslogger(::Function, ::Module, ::Logging.ConsoleLogger) at /home/michiel/.julia/packages/AbstractMCMC/Nw3Wn/src/logging.jl:34
[20] macro expansion at /home/michiel/.julia/packages/AbstractMCMC/Nw3Wn/src/logging.jl:10 [inlined]
[21] mcmcsample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric}}, ::Int64; progress::Bool, progressname::String, callback::Nothing, discard_initial::Int64, thinning::Int64, chain_type::Type{T} where T, kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:nadapts,),Tuple{Int64}}}) at /home/michiel/.julia/packages/AbstractMCMC/Nw3Wn/src/sample.jl:76
[22] sample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric}}, ::Int64; chain_type::Type{T} where T, resume_from::Nothing, progress::Bool, nadapts::Int64, discard_adapt::Bool, discard_initial::Int64, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/michiel/.julia/packages/Turing/gQLRw/src/inference/hmc.jl:140
[23] sample at /home/michiel/.julia/packages/Turing/gQLRw/src/inference/hmc.jl:123 [inlined]
[24] #sample#2 at /home/michiel/.julia/packages/Turing/gQLRw/src/inference/Inference.jl:142 [inlined]
[25] sample at /home/michiel/.julia/packages/Turing/gQLRw/src/inference/Inference.jl:142 [inlined]
[26] #sample#1 at /home/michiel/.julia/packages/Turing/gQLRw/src/inference/Inference.jl:132 [inlined]
[27] sample(::DynamicPPL.Model{var"#35#36",(:behavioral_score, :tp, :test_score, :jitter),(:jitter,),(),Tuple{Array{Union{Missing, Float64},1},Array{Int64,1},Array{Float64,1},Float64},Tuple{Float64}}, ::NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric}, ::Int64) at /home/michiel/.julia/packages/Turing/gQLRw/src/inference/Inference.jl:132
[28] top-level scope at /mnt/c/Users/michiel.verburg/Documents/repositories/master_project_gitlab/scripts/cbarq/small_reproducible_model_discourse_missing_values.jl:38
[29] include(::String) at ./client.jl:457
[30] top-level scope at REPL[9]:1
in expression starting at /mnt/c/Users/michiel.verburg/Documents/repositories/master_project_gitlab/scripts/cbarq/small_reproducible_model_discourse_missing_values.jl:38
Error for MvNormal implementation
ERROR: LoadError: MethodError: no method matching loglikelihood(::MvNormal{Float64,PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}, ::Array{Union{Missing, Float64},1})
using Turing
using KernelFunctions
using AbstractGPs
using StatsFuns
# n_days = 50
tp = [1, 15, 42]
# the true scores, in reality we only know the first one.
behavioral_score = [3.0, 2.5, 1.9]
# let's say the predictor is exactly linearly related with the outcome score.
# Divide by 4 because behavioral scores are in the range of 0 to 4, and test scores are in the range of 0 to 1.
test_score = behavioral_score/4
# set all scores to missing that aren't the first timepoint.
behavioral_score_missing = Array{Union{Missing,Float64}}(behavioral_score)
behavioral_score_missing[2:length(behavioral_score_missing)] .= missing
sekernel(alpha, rho) = alpha^2 * KernelFunctions.transform(SEKernel(), 1/(rho*sqrt(2)))
@model function score_model(behavioral_score, tp, test_score, jitter=1e-6 )
sigma ~ LogNormal(0.0, 1.0)
f = GP(2.0, sekernel(2.0, 20.0))
behavioral_score ~ f(tp, sigma^2 + jitter)
# the mean is the same as the behavioral_score/4, since we expect the behavioral
# score to linearly predict the score on the test.
test_score .~ Normal.(behavioral_score/4, 0.1)
end
model = score_model(behavioral_score_missing, tp, test_score)
chain = sample(model, NUTS(0.65), 100)
Note:
At first I thought maybe it is not a bug, but that gaussian processes are not meant to be able to do this, and hence I also considered that maybe I should just do the predictions of the behavioral score on timepoints where it is missing after obtaining the chain. However, aside from the fact that I run into some troubles to do that practically, I also realized that doing this will make me lose some information. Because in theory it is possible that 2 tests are really close together in time (e.g. timepoint 1 and 2), and thus the behavioral score, despite not being known, can be estimated with a high probability, and thus it would contribute to the process of finding relationships between behavioral scores and test scores.
Thanks in advance!