Inference using HMC on a Hidden Markov Model

I am taking my first steps in Turing (and Julia), trying to solve a model inversion problem in a Hidden Markov Model. The model I have so far works fine with SMC(), IS() and PG(). With HMC() and NUTS(), however, I run into an issue that I have not been able to debug further.

I created an (admittedly quite contrived) minimal working example that results in a TypeError:

TypeError: in typeassert, expected Float64, got ForwardDiff.Dual{Nothing,Float64,10}

This is the minimal working example:

using Distributions
using LinearAlgebra
using Turing

# Linear Model
Δt = 0.1
τ = 1.0
A = [1.0 Δt-(Δt^2/(2τ)); 0.0 1.0-(Δt/τ)]
Q = [0.25 * Δt^4 0.5 * Δt^3; 0.5 * Δt^3 Δt^2] + 1e-6 * I

B = [0.0 Δt^2/(2τ); 0.0 Δt/τ]
u = [0.0; 0.5]

H = [1.0 0.0; 0.0 1.0]
σₓ, σᵥ = 0.1, 0.1
R = [σₓ 0.0; 0.0 σᵥ]

@model function hmm(states, ::Type{T} = Array{Float64}) where {T}
    # Initialize arrays for the time-series
    if states === missing
        states = T(undef, 2, 100)
    end
    observations = T(undef, 2, 100)

    # Prior for first step
    states[:,1] ~ MvNormal([0.0, 0.5], 0.001 * I)
    # THIS IS WHERE I GET THE ERROR
    observations[:,1] ~ MvNormal(states[:,1], R/Δt)

    for t in 2:size(states, 2)
        states[:,t] ~ MvNormal(A * states[:,t-1] + B * u, Δt*Q)
        observations[:,t] ~ MvNormal(states[:,t], R/Δt)
    end

    return states
end

# Sample a trial from the prior to be used as the ground truth
hmm_prior = hmm(missing)
states = hmm_prior()

# Condition on the sampled trial 
hmm_conditioned = hmm(states)

# Perform inference
chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)

Can somebody tell me what I am missing here?

Thanks!
Simon

This is related to an issue with Distributions.jl.

The following should solve your issue.

using DistributionsAD: TuringMvNormal

@model function hmm(states, ::Type{T} = Array{Float64}) where {T}
    # Initialize arrays for the time-series
    observations = T(undef, 2, 100)

    # Prior for first step
    states[:,1] ~ MvNormal([0.0, 0.5], 0.001 * I)
    # THIS IS WHERE I GET THE ERROR
    observations[:,1] ~ TuringMvNormal(states[:,1], R/Δt)

    for t in 2:size(states, 2)
        states[:,t] ~ TuringMvNormal(A * states[:,t-1] + B * u, Δt*Q)
        observations[:,t] ~ TuringMvNormal(states[:,t], R/Δt)
    end

    return states
end

The reason for this is that MvNormal from Distributions.jl assumes all parameters have the same type, and otherwise tries to promote them to a common type. This does not work if one of the parameters is fixed while the other is inferred with a gradient-based inference algorithm, because the inferred parameter’s element type changes (to ForwardDiff.Dual) during gradient computation.
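As a rough illustration of the mechanism (a minimal sketch, not the actual code path inside Distributions.jl): ForwardDiff replaces Float64 values with ForwardDiff.Dual numbers while computing the gradient, so any code that insists on Float64 fails with exactly this kind of TypeError:

using ForwardDiff

f(x) = sum(abs2, x)
ForwardDiff.gradient(f, [1.0, 2.0])  # works: f is generic in the element type

# A hard Float64 constraint breaks under AD, because x[1] is a Dual
# during the gradient pass:
g(x) = (x[1]::Float64)^2
ForwardDiff.gradient(g, [1.0])       # TypeError: in typeassert, expected Float64, got ForwardDiff.Dual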

I believe you also don’t need that additional check you added. You should be able to simply call:

states = hmm(fill(missing, 2, 100))()

IS, SMC and PG do not require the computation of gradients, which is why you did not observe this issue before. Also note that you might want to use :reversediff as the AD backend instead.
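For completeness, switching the backend looks like this (assuming the Turing.setadbackend API of the Turing versions current at the time of writing; ReverseDiff.jl must be installed):

using ReverseDiff
Turing.setadbackend(:reversediff)   # use ReverseDiff instead of ForwardDiff for gradients
chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)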

Actually, this looks like a bug in handling the type parameter. Please open an issue. Meanwhile, try Array{Float64, 2} instead of Array{Float64}.
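As general Julia background (not specific to the bug itself), the difference between the two is that Array{Float64} leaves the number of dimensions free, so it is not a concrete type:

# Array{Float64} is a UnionAll: the dimension parameter N is unspecified
Array{Float64} isa UnionAll              # true
# Array{Float64, 2} pins N = 2 and is concrete (an alias of Matrix{Float64})
Array{Float64, 2} === Matrix{Float64}    # true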

@mohamed82008 appears to be right: HMC() and NUTS() work when changing the type parameter to:

@model function hmm(states, ::Type{T} = Array{Float64, 2}) where {T}
...

I will open an issue on GitHub in this regard.

This would be nice, but unfortunately it does not work for me. It returns the following error:

LoadError: MethodError: Cannot `convert` an object of type Missing to an object of type Float64

I took this additional check from a code snippet in the official guide.


The logic for how Turing deals with missing values (or rather with ~) is very simple, but it seems we are not doing a good job of explaining it, since these questions and errors come up regularly. The two basic rules are: ~ indicates an assumption (i.e., the variable on the LHS is sampled) if

  • the variable on the LHS of ~ is not an input argument to the model, or
  • the LHS evaluates to missing.

So if you pass fill(missing, 2, 100) as states to the model, then the LHS of expressions such as states[:, i] ~ ... evaluates to a vector of missing values rather than missing itself, and hence the model treats it as an observation. Evaluating the log probability of this “observation” then fails with the error you got.
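To make the two rules concrete, here is a minimal sketch (a hypothetical demo model, not part of the code above):

using Turing

@model function demo(x)
    x ~ Normal()
end

demo(missing)()  # x === missing, so x is sampled (rule 2)
demo(1.0)()      # x is a concrete value passed as an input argument, so x is an observation

# In contrast, indexing into fill(missing, 2, 100) yields a Vector{Missing},
# which is not `missing` itself, so rule 2 does not apply:
fill(missing, 2, 100)[:, 1] === missing  # false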

So the correct/natural way to implement this model would be

@model function hmm(states, ::Type{T} = Float64) where T
    # Initialize arrays for the time-series
    observations = Matrix{T}(undef, 2, length(states))

    # Prior for first step
    states[1] ~ MvNormal([0.0, 0.5], sqrt(0.001))
    observations[:, 1] ~ MvNormal(states[1], R/Δt)

    for t in 2:length(states)
        states[t] ~ MvNormal(A * states[t-1] + B * u, Δt*Q)
        observations[:, t] ~ MvNormal(states[t], R/Δt)
    end

    return states
end

i.e., states is a vector of vectors instead of a matrix. (BTW, I assume this version should in general also be faster, since it avoids allocating a new vector every time you select a column; since the observations are just sampled, it is probably more efficient to keep them as a matrix.) You can then construct states such as

states = [rand(2) for _ in 1:100]

or

states = convert(Vector{Union{Missing,Vector{Float64}}}, fill(missing, 100))

or

states = Union{Missing,Vector{Float64}}[missing for _ in 1:100]

(we have to be able to save the sampled states in states, and hence fill(missing, 100) won’t work because it creates a vector of type Vector{Missing}). I suggest you convert states to a concretely-typed vector after sampling from the prior, though. Precomputing all constant terms might also improve efficiency (for 2x2 matrices it shouldn’t matter much, but I guess it will be useful in higher dimensions).
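For instance, based on the suggestions above (a sketch; Rdt and Qdt are hypothetical names for the hoisted constants):

# Sample from the prior, then convert to a concretely-typed vector
states = hmm(Union{Missing,Vector{Float64}}[missing for _ in 1:100])()
states = convert(Vector{Vector{Float64}}, states)

# Precompute the constant covariance matrices once, instead of recomputing
# R/Δt and Δt*Q in every model evaluation
const Rdt = R / Δt
const Qdt = Δt * Q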


Thank you David for taking the time, that was very helpful!

Sorry for reviving this thread, but I tried your suggestion for simplifying the handling of missing values again, and I realised that PG() and SMC() (but not the other samplers) produce the following error message, which I am unsure how to debug:

LoadError: Libtask.CTaskException(Task (failed) @0x00007f77bb145120)

This is the full code to reproduce it:

using LinearAlgebra
using Turing

# Linear Model
Δt = 0.1
A = [1.0 Δt; 0.0 1.0]
Q = [0.25 * Δt^4 0.5 * Δt^3; 0.5 * Δt^3 Δt^2] + 1e-6 * I

H = [1.0 0.0; 0.0 1.0]
R = 0.1 * I

@model function hmm(states, ::Type{T} = Float64) where T
    # Initialize arrays for the time-series
    observations = Matrix{T}(undef, 2, length(states))

    # Prior for first step
    states[1] ~ MvNormal([0.0, 0.5], sqrt(0.001))
    observations[:, 1] ~ MvNormal(states[1], R/Δt)

    for t in 2:length(states)
        states[t] ~ MvNormal(A * states[t-1], Δt*Q)
        observations[:, t] ~ MvNormal(states[t], R/Δt)
    end

    return states
end

# Sample a trial from the prior to be used as the ground truth
hmm_prior = hmm(Union{Missing,Vector{Float64}}[missing for _ in 1:100])
states = hmm_prior()

# Condition on the sampled trial 
hmm_conditioned = hmm(states)

# Perform inference (working)
# chain = sample(hmm_conditioned, IS(), 1000)
# chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)
# chain = sample(hmm_conditioned, NUTS(0.65), 1000)

# Perform inference (not working)
chain = sample(hmm_conditioned, SMC(), 1000)
# chain = sample(hmm_conditioned, PG(10), 1000)


Could someone look into this? I still receive the following error when using SMC() or PG(), but not with the other samplers:

ERROR: LoadError: CTaskException:
Malformed dims

Unrelated question, is there a reason you like to use PG instead of HMC or NUTS?


Btw, I just had a look at your code, and it seems that SMC and PG don’t like the Array{Union{Missing, Array{Float64,1}},1} you condition on. The solution is to simply use the following instead:

states = Array{Array{Float64}}(hmm_prior())

This is likely a bug in Libtask. I’ll open an issue for you.

Thanks a lot, that did the trick!

Unrelated question, is there a reason you like to use PG instead of HMC or NUTS?

I would prefer to use HMC/NUTS, but my full model (I only posted a simplified version) has components that are, AFAIK, non-differentiable and only work with PG.
