I wasn’t working on this problem for the last couple of days, but I returned to it today, and I don’t think there is actually an issue in the way the model is defined. Instead, I think the problem is in the gradient calculation.
If I change the lengthscale slightly, the loglikelihood changes as well, as shown below, so I think the gradient should exist. (Correct me if I’m wrong on this.)
Loglikelihood Computation
julia> function f(lengthscale)
           noise = 0.1
           gp = construct_finite_gp(Xe, first(lengthscale), noise)
           return logpdf(gp, ye)
       end
f (generic function with 1 method)
julia> f(0.9)
-158.09618544618976
julia> f(1.0)
-178.65913509460518
julia> f(1.1)
-205.45899882684503
Furthermore, the gradient calculation actually works with Zygote, but not with ForwardDiff or ReverseDiff:
Gradient Calculation
julia> ForwardDiff.gradient(f, [1.])
1-element Vector{Float64}:
NaN
julia> ReverseDiff.gradient(f, [1.])
1-element Vector{Float64}:
NaN
julia> Zygote.gradient(f, [1.])
([-235.96836642578603],)
I don’t know why some AD backends work and others don’t; it may be a numerical issue or a bug in the backend packages. At least the Zygote result looks plausible: a crude central difference from the values above, (f(1.1) - f(0.9)) / 0.2 ≈ -236.8, is close to the Zygote gradient of about -236.
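If someone wants to dig into where the NaN comes from, one way is to compare against a plain finite-difference gradient. This is only a minimal sketch, assuming FiniteDifferences.jl is installed and f is the function defined above:

using FiniteDifferences

# Central finite-difference gradient of the loglikelihood; if this is finite
# and close to Zygote's value, the NaN is most likely a backend problem
# rather than a property of the model itself.
fd_grad = grad(central_fdm(5, 1), f, [1.0])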
Anyway, for completeness:
The solution is to set the AD backend to Zygote by adding the following line:
Turing.setadbackend(:zygote)
I’ve tested the code and it runs fine with Zygote even with duplicate data points.
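For anyone reading this later, here is roughly how the backend switch fits into the sampling call. This is only a sketch: the prior and the model body are hypothetical and just stand in for the actual model, reusing the construct_finite_gp helper from above.

using Turing, Zygote

Turing.setadbackend(:zygote)

# Hypothetical model: prior on the lengthscale, fixed noise, and the GP
# loglikelihood added explicitly via @addlogprob!.
@model function gp_model(X, y)
    lengthscale ~ LogNormal(0.0, 1.0)
    noise = 0.1
    gp = construct_finite_gp(X, lengthscale, noise)
    Turing.@addlogprob! logpdf(gp, y)
end

chain = sample(gp_model(Xe, ye), NUTS(), 200)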