Differences in using MvNormal and Normal in the calculation of the likelihood

Hello all,

I have a question related to the ODE Turing tutorial:

using Turing, DifferentialEquations, StatsPlots, LinearAlgebra, Random

Random.seed!(14);

function lotka_volterra(du, u, p, t)
    # Model parameters.
    α, β, γ, δ = p
    # Current state.
    x, y = u

    # Evaluate differential equations.
    du[1] = (α - β * y) * x # prey
    du[2] = (δ * x - γ) * y # predator

    return nothing
end

# Define initial-value problem.
u0 = [1.0, 1.0]
p = [1.5, 1.0, 3.0, 1.0]
tspan = (0.0, 10.0)
prob = ODEProblem(lotka_volterra, u0, tspan, p)

sol = solve(prob, Tsit5(); saveat=0.1)
odedata = Array(sol) + 0.8 * randn(size(Array(sol)))

@model function fitlv(data, prob)
    # Prior distributions.
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
    β ~ truncated(Normal(1.2, 0.5); lower=0,   upper=2)
    γ ~ truncated(Normal(3.0, 0.5); lower=1,   upper=4)
    δ ~ truncated(Normal(1.0, 0.5); lower=0,   upper=2)

    # Simulate Lotka-Volterra model. 
    p = [α, β, γ, δ]
    predicted = solve(prob, Tsit5(); p=p, saveat=0.1)

    # Observations.
    for i in 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ^2 * I)
    end
    return nothing
end

model = fitlv(odedata, prob)

# Sample 3 independent chains with forward-mode automatic differentiation (the default).
@time chain = sample(model, NUTS(), MCMCSerial(), 500, 3; progress=true)

Now imagine that we do not have the same number of observations for prey and predators in odedata. We would then have missing values, and I have read that this is quite problematic when calculating the likelihood.
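For concreteness, I mean something like this (the missing pattern here is just made up for illustration):

odedata_missing = Array{Union{Missing,Float64}}(odedata)
odedata_missing[2, 1:2:end] .= missing  # e.g. every other predator observation is missing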

Is it possible to just concatenate the predator and prey series and use Normal instead of MvNormal, as follows?

Thanks

@model function fitlv_test(data, prob)
    # Prior distributions.
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
    β ~ truncated(Normal(1.2, 0.5); lower=0,   upper=2)
    γ ~ truncated(Normal(3.0, 0.5); lower=1,   upper=4)
    δ ~ truncated(Normal(1.0, 0.5); lower=0,   upper=2)

    # Simulate Lotka-Volterra model. 
    p = [α, β, γ, δ]
    predicted = solve(prob, Tsit5(); p=p, saveat=0.1)
    pred = vcat(predicted[1,:],predicted[2,:]);

    # Observations.
    for i in 1:length(pred)
        data[i] ~ Normal(pred[i], σ^2)
    end
    return nothing
end

data_test = vcat(odedata[1,:],odedata[2,:])
model_test = fitlv_test(data_test, prob)

@time chain_test = sample(model_test, NUTS(), MCMCSerial(), 500, 3; progress=true)

Yes, this is equivalent in terms of probability:

julia> using Distributions

julia> logpdf(MvNormal([1 0; 0 1]), [1.0, -2.0])
-4.337877066409345

julia> logpdf(Normal(), 1.0) + logpdf(Normal(), -2)
-4.337877066409345

Depending on how many missing values you have for each species and how they’re arranged, it may make more sense to keep the data and ODE solutions as arrays, and do something like this inside your model:

for i in findall(!ismissing, data)
    data[i] ~ Normal(pred[i], σ)
end

There may be a more efficient vectorized way if you’re using reverse-mode AD, but for five parameters it shouldn’t be an issue and this is a clear way to express what the model is doing.

Amazing, thanks a lot @ElOceanografo.

Another related question: imagine you want different σ for your two variables, how Turing is estimating the “global” likelihood of your problem if you have separate distributions like as follows?
Thanks a lot.

for i in findall(!ismissing, data)
    data_prey[i] ~ Normal(pred_prey[i], σ_prey^2)
    data_predators[i] ~ Normal(pred_predators[i], σ_predators^2)
end

The global likelihood is just the product of all the individual conditional likelihoods in the model (though it’s actually calculated as a sum of log-likelihoods). That’s what Turing is doing inside the @model macro: each tilde statement value ~ SomeDistribution basically gets translated to logpdf(SomeDistribution, value), and they’re all added up to get the total log-likelihood (or, if we’re including the priors, the unnormalized log-posterior).
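Written out by hand with Distributions, that sum looks roughly like this (the numbers below are made up purely for illustration):

using Distributions

pred_prey      = [1.0, 1.2, 1.5];  data_prey      = [0.9, 1.3, 1.4]
pred_predators = [0.5, 0.7, 0.9];  data_predators = [0.4, 0.8, 1.1]
σ_prey, σ_predators = 0.3, 0.5

# Each tilde statement contributes one logpdf term; the "global"
# log-likelihood is just the sum of all of them.
total_loglik = sum(logpdf.(Normal.(pred_prey, σ_prey), data_prey)) +
               sum(logpdf.(Normal.(pred_predators, σ_predators), data_predators))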

If you have different observation errors for the predator and prey, it just means the model will weigh the goodness-of-fit to the predator and prey time series differently. If those observation errors are themselves parameters in the model, it will try to fit them as well; the log-likelihood will be highest when they are correct, which is easy to check for yourself:

julia> using Distributions, Random

julia> Random.seed!(1);

julia> x = rand(Normal(0.0, 1.0), 100);

julia> loglikelihood(Normal(0.0, 0.5), x) # too low
-216.49103411053034

julia> loglikelihood(Normal(0.0, 1.0), x) # just right
-140.37182803198166

julia> loglikelihood(Normal(0.0, 1.5), x) # too high
-153.98613066973462

Finally, I just noticed you’ve been writing Normal(pred[i], σ^2), which should be Normal(pred[i], σ), since Julia’s normal distribution is parameterized in terms of the standard deviation, not the variance. The model will still run that way, but you may get confused interpreting the results.
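You can check the parameterization directly:

julia> using Distributions

julia> std(Normal(0.0, 2.0))
2.0

julia> var(Normal(0.0, 2.0))
4.0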


@ElOceanografo, thanks a lot for these useful remarks.

Regarding your last comment about the σ vs σ^2 parameterization:

Yes, it was a mistake. I got confused because MvNormal is set up using the variance (through the covariance matrix), while Normal is set up using the standard deviation σ.
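Just to make sure I have it straight now, here is a small check with arbitrary numbers (both sides should give the same log-density):

using Distributions, LinearAlgebra

σ = 0.8; μ = [1.0, 2.0]; x = [0.9, 2.3]

# MvNormal takes the variance σ^2 inside the covariance matrix,
# while Normal takes the standard deviation σ directly.
logpdf(MvNormal(μ, σ^2 * I), x) ≈ sum(logpdf.(Normal.(μ, σ), x))  # true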