DynamicHMC: multivariate normal loglikelihood?

I have a simple example working with a univariate normal loglikelihood, but I am having trouble extending it to multivariate normal. I am implementing simulated moments, with the asymptotic Gaussian distribution used to form the likelihood function. Below are two code examples, the first of which works (univariate case), and the second of which is a simple modification of the first, adding a second statistic to the problem. The second example does not run, giving the error

ERROR: LoadError: MethodError: no method matching loglikelihood(::MvNormal{Float64,PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}, ::Array{Array{ForwardDiff.Dual{...

which makes me think that the AD is not able to handle the MvNormal loglikelihood, but perhaps I’m wrong about that. Here are the examples.
Any pointers would be very welcome!

This code, which just uses the sample mean to form the univariate statistic, is working;

# simple example of method of simulated moments estimated by
# NUTS MCMC

# load packages
using TransformVariables, LogDensityProblems, DynamicHMC, MCMCDiagnostics,
    Parameters, Statistics, Distributions, ForwardDiff


"""
DGP is``y ∼ μ + ϵ``, where ``ϵ ∼ N(0, 1)`` IID,
and the statistic is the sample mean.

Prior for μ is flat
"""
function dgp(μ)
    y = randn(100) .+ μ
    m = mean(y)
end

# generate data
μ_true = 3.0
m = dgp(μ_true)

# Define a structure for the problem
# Should hold the data and  the parameters of prior distributions.
struct MSM_Problem{Tm <: Real}
    "statistic"
    m::Tm
end

# Make the type callable with the parameters *as a single argument*.
function (problem::MSM_Problem)(θ)
    @unpack m = problem   # extract the data
    @unpack μ = θ         # extract parameters (only one here)
    S = 100
    ms = 0.0
    for s = 1:S
        ms += dgp(μ)/S
    end
    loglikelihood(Normal(0, 1), [m .- ms])
end

# original problem, without transformation of parameters
p = MSM_Problem(m)
# define the transformation of parameters (in this case, an identity)
problem_transformation(p::MSM_Problem) = as((μ=as(Array,1),))
# Wrap the problem with the transformation
t = problem_transformation(p)
P = TransformedLogDensity(t, p)
# use AD (Flux) for the gradient
∇P = ADgradient(:ForwardDiff, P);

# Sample from the posterior. `chain` holds the chain (positions and
# diagnostic information), while the second returned value is the tuned sampler
# which would allow continuation of sampling.
chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, 1000);

# We use the transformation to obtain the posterior from the chain.
posterior = transform.(Ref(t), get_position.(chain));

However, when I try to add a second statistic, I am not able to get the AD to work. The code:

# simple example of method of simulated moments estimated by
# NUTS MCMC

# load packages
using TransformVariables, LogDensityProblems, DynamicHMC, MCMCDiagnostics,
    Parameters, Statistics, Distributions, ForwardDiff


"""
DGP is``y ∼ μ + ϵ``, where ``ϵ ∼ N(0, 1)`` IID,
and the statistic is the sample mean and sample std. dev. (which is useless in this case)

Prior for μ is flat
"""
function dgp(μ)
    y = randn(100) .+ μ
    m = [mean(y);std(y)]
end

# generate data
μ_true = 3.0
m = dgp(μ_true)

# Define a structure for the problem
# Should hold the data and  the parameters of prior distributions.
struct MSM_Problem{Tm <: Vector}
    "statistic"
    m::Tm
end

# Make the type callable with the parameters *as a single argument*.
function (problem::MSM_Problem)(θ)
    @unpack m = problem   # extract the data
    @unpack μ = θ         # extract parameters (only one here)
    S = 100
    ms = zeros(2)
    for s = 1:S
        ms += dgp(μ)/S
    end
    loglikelihood(MvNormal(zeros(2), eye(2)), [m .- ms])
end

# original problem, without transformation of parameters
p = MSM_Problem(m)
# define the transformation of parameters (in this case, an identity)
problem_transformation(p::MSM_Problem) = as((μ=as(Array,1),))
# Wrap the problem with the transformation
t = problem_transformation(p)
P = TransformedLogDensity(t, p)
# use AD (Flux) for the gradient
∇P = ADgradient(:ForwardDiff, P);

# Sample from the posterior. `chain` holds the chain (positions and
# diagnostic information), while the second returned value is the tuned sampler
# which would allow continuation of sampling.
chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, 1000);


Sorry, I just realized that my second example is using a private function

function eye(m::Int64)
    Matrix(1.0I, m, m)
end    

(You also forgot using LinearAlgebra.)

I don’t think you are calling loglikelihood right. You can debug these things with eg

logdensity(LogDensityProblems.Value, P, zeros(dimension(t)))

and it has nothing to do with AD. If you make that work, you can try

logdensity(LogDensityProblems.ValueGradient, P, zeros(dimension(t)))

I should probably write a tutorial about debugging these things at some point.

Thanks, I wrote out the MVN likelihood myself (up to a constant), and it seems to be working fine now. The code is below.

If I run this repeatedly, occasionally (about 20% of the runs) it crashes with an error message about singularity, occurring at line 62 of hamiltonian.jl, which is

GaussianKE(Minv::AbstractMatrix) = GaussianKE(Minv, cholesky(inv(Minv)).L)

I’m wondering if it would be appropriate to add a check before attempting the inversion in that line. I will investigate this more and will file an issue, if appropriate. I am wondering how the chain is initialized when the prior is flat? Perhaps if I supplied a slightly informative prior I could avoid this singularity problem.

# simple example of method of simulated moments estimated by
# NUTS MCMC

# load packages
using TransformVariables, LogDensityProblems, DynamicHMC, MCMCDiagnostics,
    Parameters, Statistics, Distributions, ForwardDiff, LinearAlgebra


"""
DGP is``y ∼ μ + ϵ``, where ``ϵ ∼ N(0, 1)`` IID,
and the statistic is the sample mean and sample std. dev. (which is useless in this case)

Prior for μ is flat
"""
function dgp(μ)
    n = 100
    y = randn(n) .+ μ
    m = sqrt(n)*[mean(y);std(y)]
end

# generate data
μ_true = 3.0
m = dgp(μ_true)


# Define a structure for the problem
# Should hold the data and  the parameters of prior distributions.
struct MSM_Problem{Tm <: Vector}
    "statistic"
    m::Tm
end

# Make the type callable with the parameters *as a single argument*.
function (problem::MSM_Problem)(θ)
    @unpack m = problem   # extract the data
    @unpack μ = θ         # extract parameters (only one here)
    S = 100
    ms = dgp.(μ.*ones(S,1))
    mbar = mean(ms)
    Σ = cov(ms[:])
    x = (m .- mbar)
    logL = try
        logL = -0.5*log(det(Σ)) - 0.5*x'*inv(Σ)*x
    catch
        logL = -Inf
    end    
end

# original problem, without transformation of parameters
p = MSM_Problem(m)
# define the transformation of parameters (in this case, an identity)
problem_transformation(p::MSM_Problem) = as((μ=as(Array,1),))
# Wrap the problem with the transformation
t = problem_transformation(p)
P = TransformedLogDensity(t, p)
# use AD (Flux) for the gradient
∇P = ADgradient(:ForwardDiff, P);

# Sample from the posterior. `chain` holds the chain (positions and
# diagnostic information), while the second returned value is the tuned sampler
# which would allow continuation of sampling.
chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, 1000);

# We use the transformation to obtain the posterior from the chain.
posterior = transform.(Ref(t), get_position.(chain));


I have pushed a fix for the immediate issue:

but it may not help you as your log density is not continuous, eg

using PGFPlotsX
μs = range(0, 6; length = 100)
@pgf Plot({ no_marks }, Table(μs, map(μ -> logdensity(Real, P, [μ]), μs)))

and tuning is not very robust to that. I would try with a manually specified stepsize, eg around 0.2–0.5.

Thanks! I will try it out soon.

I think I can easily make the density continuous by using fixed random draws for the sampling from the dgp. That’s the typical solution to the “chatter” problem in simulation-based estimation.

Yes, if you can use common random variables, it should improve sampling drastically (please report back here, I am curious).

In the meantime, I released the version with the fix.

The next major release of DynamicHMC will use a more robust adaptation method with various heuristics, but that’s heavily WIP for now.

Here’s a version that uses common random numbers, and seems quite stable, much more so than before. It has only run into trouble once, in a number of runs. If you could provide examples of how to specify the initial value of the chain, and also how to fix the stepsize \epsilion, rather than using the tuning, that would be nice.

I will next try this out on a more realistic model, now that the basics seem to be working. Thanks for the help!

# simple example of method of simulated moments estimated by
# NUTS MCMC

# load packages
using TransformVariables, LogDensityProblems, DynamicHMC, MCMCDiagnostics,
    Parameters, Statistics, Distributions, ForwardDiff, LinearAlgebra


"""
DGP is``y ∼ μ + ϵ``, where ``ϵ ∼ N(0, 1)`` IID,
and the statistic is the sample mean and sample std. dev. (which is useless in this case)

Prior for μ is flat
"""
function dgp(μ, shocks)
    n = size(shocks,1)
    y = shocks .+ μ
    m = sqrt(n)*[mean(y);std(y)]
end

# generate data
μ_true = 3.0
n = 100 # sample size
S = 1000 # number of simulations
shocks = randn(n)
m = dgp(μ_true, shocks)
shocks = randn(n, S) # fixed shocks for simulations

# Define a structure for the problem
# Should hold the data and  the parameters of prior distributions.
struct MSM_Problem{Tm <: Vector, Tshocks <: Array}
    "statistic"
    m::Tm
    "shocks"
    shocks::Tshocks
end

# Make the type callable with the parameters *as a single argument*.
function (problem::MSM_Problem)(θ)
    @unpack m, shocks = problem   # extract the data
    @unpack μ = θ         # extract parameters (only one here)
    S = size(shocks,1)
    mbar = zeros(2,1)
    Σ = zeros(2,2)
    for s = 1:S
        mm = dgp(μ,shocks[:,s])
        mbar += mm/S
        Σ += mm*mm'/S
    end
    mbar = mbar[:]
    x = (m .- mbar)
    logL = try
        logL = -0.5*log(det(Σ)) - 0.5*x'*inv(Σ)*x
    catch
        logL = -Inf
    end    
end

# original problem, without transformation of parameters
p = MSM_Problem(m, shocks)
# define the transformation of parameters (in this case, an identity)
problem_transformation(p::MSM_Problem) = as((μ=as(Array,1),))
# Wrap the problem with the transformation
t = problem_transformation(p)
P = TransformedLogDensity(t, p)
# use AD (Flux) for the gradient
∇P = ADgradient(:ForwardDiff, P);

# Sample from the posterior. `chain` holds the chain (positions and
# diagnostic information), while the second returned value is the tuned sampler
# which would allow continuation of sampling.
chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, 1000);

# We use the transformation to obtain the posterior from the chain.
posterior = transform.(Ref(t), get_position.(chain));


1 Like

It could be as simple as

chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, 1000; ϵ = 0.2);

(note the keyword). This is of course adapted further, but may help avoid initial maladaptation.

I am wondering if the code is type stable though, as you initialize with

    mbar = zeros(2,1)
    Σ = zeros(2,2)

but maybe the compiler can shrug this off now :wink: Consider something like

    T = eltype(μ)
    mbar = zeros(T, 2, 1)
    Σ = zeros(T, 2, 2)

instead.

1 Like

A medium-sized example would be appreciated as a PR to

if you have the time.

Sure, I’ll be happy to do that.

2 Likes