# Inference using HMC on a Hidden Markov Model

I am taking my first steps in Turing (and Julia) trying to solve a model inversion problem in a Hidden Markov Model. The model I have so far works fine with `SMC()`, `IS()` and `PG()`. For `HMC()` and `NUTS()` I still have an issue I wasn’t able to further debug.

I created a (admittedly quite contrived) minimal working example that results in a `TypeError`:

``````TypeError: in typeassert, expected Float64, got ForwardDiff.Dual{Nothing,Float64,10}
``````

This is the minimal working example:

``````using Distributions
using LinearAlgebra
using Turing

# Linear Model
Δt = 0.1
τ = 1.0
A = [1.0 Δt-(Δt^2/(2τ)); 0.0 1.0-(Δt/τ)]
Q = [0.25 * Δt^4 0.5 * Δt^3; 0.5 * Δt^3 Δt^2] + 1e-6 * I

B = [0.0 Δt^2/(2τ); 0.0 Δt/τ]
u = [0.0; 0.5]

H = [1.0 0.0; 0.0 1.0;]
σₓ, σᵥ = 0.1, 0.1
R = [σₓ 0.0; 0.0 σᵥ]

@model function hmm(states, ::Type{T} = Array{Float64}) where {T}
# Initialize arrays for the time-series
if states === missing
states = T(undef, 2, 100)
end
observations = T(undef, 2, 100)

# Prior for first step
states[:,1] ~ MvNormal([0.0, 0.5], 0.001 * I)
# THIS IS WHERE I GET THE ERROR
observations[:,1] ~ MvNormal(states[:,1], R/Δt)

for t in 2:size(states, 2)
states[:,t] ~ MvNormal(A * states[:,t-1] + B * u, Δt*Q)
observations[:,t] ~ MvNormal(states[:,t], R/Δt)
end

return states
end

# Sample a trial from the prior to be used as the ground truth
hmm_prior = hmm(missing)
states = hmm_prior()

# Condition on the sampled trial
hmm_conditioned = hmm(states)

# Perform inference
chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)
``````

Can somebody tell, what I am missing here?

Thanks!
Simon

This is related to an issue with `Distributions.jl`.

The following should solve your issue.

``````using DistributionsAD: TuringMvNormal

@model function hmm(states, ::Type{T} = Array{Float64}) where {T}
# Initialize arrays for the time-series
observations = T(undef, 2, 100)

# Prior for first step
states[:,1] ~ MvNormal([0.0, 0.5], 0.001 * I)
# THIS IS WHERE I GET THE ERROR
observations[:,1] ~ TuringMvNormal(states[:,1], R/Δt)

for t in 2:size(states, 2)
states[:,t] ~ TuringMvNormal(A * states[:,t-1] + B * u, Δt*Q)
observations[:,t] ~ TuringMvNormal(states[:,t], R/Δt)
end

return states
end
``````

The reason for this is the `MvNormal` from `Distributions.jl` assumes all parameters to have the same type, and tries to cast the parameters to a common type otherwise. This will not work if one of the parameters if fixed and the other one is inferred using a gradient-based inference algorithm as the type changes during gradient computation in this case.

I believe you also don’t need that additional check you added. You should be able to simply call:

``````states = hmm(fill(missing, 2, 100))()
``````

IS, SMC and PG do not require the computation of gradients, which is why you did not observe this issue before. Also not that you might want to use `:reversediff` as the AD backend instead.

Actually, this looks like a bug in handling the type parameter. Please open an issue. Meanwhile, try `Array{Float64, 2}` instead of `Array{Float64}`.

@mohamed82008 appears to be right, `HMC()` and `NUTS()` work when changing the type parameter to:

``````@model function hmm(states, ::Type{T} = Array{Float64, 2}) where {T}
...
``````

I will open an issue on GitHub in this regard.

This would be nice, but unfortunately it does not work for me. It returns the following error:

``````LoadError: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
``````

I took this additional check from a code snippet in the official guide.

1 Like

The logic for how Turing deals with missing values (or rather with `~`) is very simple, but it seems we’re not doing a good job with explaining it since these questions and errors come up very regularly. So the two basic rules are: `~` indicates an assumption (i.e., the variable on the LHS is sampled) if

• the variable on the LHS of `~` is not an input argument to the model, or
• the LHS evaluates to `missing`.

So if you pass `fill(missing, 2, 100)` as `states` to the model, then the LHS of expressions such as `states[:, i] ~ ...` evaluates to a vector (of `missing`) but not just `missing`, and hence the model does consider it as an observation. However, evaluating the log probability for the “observation” on the LHS fails with the error you got.

So the correct/natural way to implement this model would be

``````@model function hmm(states, ::Type{T} = Float64) where T
# Initialize arrays for the time-series
observations = Matrix{T}(undef, 2, length(states))

# Prior for first step
states[1] ~ MvNormal([0.0, 0.5], sqrt(0.001))
observations[:, 1] ~ MvNormal(states[1], R/Δt)

for t in 2:length(states)
states[t] ~ MvNormal(A * states[t-1] + B * u, Δt*Q)
observations[:, t] ~ MvNormal(states[t], R/Δt)
end

return states
end
``````

i.e., `states` are vectors of vectors instead of matrices (BTW I also assume that in general this version should be faster since it avoids the allocations of all the vectors that you create every time you select a column - since `observations` are just sampled it is probably more efficient to keep them as matrices). You can then use states such as

``````states = [rand(2) for _ in 1:100]
``````

or

``````states = convert(Vector{Union{Missing,Vector{Float64}}}, fill(missing, 100))
``````

or

``````states = Union{Missing,Vector{Float64}}[missing for _ in 1:100]
``````

(we have to be able to save the sampled state in `states`, and hence `fill(missing, 100)` won’t work because it creates a vector of type `Vector{Missing}`). I suggest that you convert them to a concretely-typed vector after sampling from the prior though. Also precomputing all constant terms might improve efficiency (for 2x2 matrices it shouldn’t matter much but I guess it will be useful in higher dimensions).

2 Likes

Thank you David for taking the time, that was very helpful!

Sorry for warming up this thread again, but I was once again trying your suggestion to simplify handling missing values and I realised that I receive the following error message for `PG()` and `SMC()` (not for the other samplers though) that I am unsure how to debug:

``````LoadError: Libtask.CTaskException(Task (failed) @0x00007f77bb145120)
``````

This is the full code to reproduce it:

``````using LinearAlgebra
using Turing

# Linear Model
Δt = 0.1
A = [1.0 Δt; 0.0 1.0]
Q = [0.25 * Δt^4 0.5 * Δt^3; 0.5 * Δt^3 Δt^2] + 1e-6 * I

H = [1.0 0.0; 0.0 1.0;]
R = 0.1 * I

@model function hmm(states, ::Type{T} = Float64) where T
# Initialize arrays for the time-series
observations = Matrix{T}(undef, 2, length(states))

# Prior for first step
states[1] ~ MvNormal([0.0, 0.5], sqrt(0.001))
observations[:, 1] ~ MvNormal(states[1], R/Δt)

for t in 2:length(states)
states[t] ~ MvNormal(A * states[t-1], Δt*Q)
observations[:, t] ~ MvNormal(states[t], R/Δt)
end

return states
end

# Sample a trial from the prior to be used as the ground truth
hmm_prior = hmm(Union{Missing,Vector{Float64}}[missing for _ in 1:100])
states = hmm_prior()

# Condition on the sampled trial
hmm_conditioned = hmm(states)

# Perform inference (working)
# chain = sample(hmm_conditioned, IS(), 1000)
# chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)
# chain = sample(hmm_conditioned, NUTS(0.65), 1000)

# Perform inference (not working)
chain = sample(hmm_conditioned, SMC(), 1000)
# chain = sample(hmm_conditioned, PG(10), 1000)

``````

Could someone look into this? I still receive the following error when using `SMC()` or `PG()` but not for the other samplers:

``````ERROR: LoadError: CTaskException:
Malformed dims
``````

Unrelated question, is there a reason you like to use PG instead of HMC or NUTS?

1 Like

Btw. I just had a look at your code, and it seems that SMC and PG don’t like the `Array{Union{Missing, Array{Float64,1}},1}` you condition on. The solution to this is to simply use the following instead.

``````states = Array{Array{Float64}}(hmm_prior())
``````

This is likely a bug in LibTask. I’ll open an issue for you.

Thanks a lot, that did the trick!

Unrelated question, is there a reason you like to use PG instead of HMC or NUTS?

I would prefer to use HMC/NUTS but in my full model (I only posted a simplified version), I have components that are AFAIK non-differentiable and only work with PG.

1 Like