Inference using HMC on a Hidden Markov Model

I am taking my first steps in Turing (and Julia), trying to solve a model inversion problem in a Hidden Markov Model. The model I have so far works fine with SMC(), IS() and PG(). With HMC() and NUTS(), however, I run into an issue that I have not been able to debug further.

I created an (admittedly quite contrived) minimal working example that results in a TypeError:

TypeError: in typeassert, expected Float64, got ForwardDiff.Dual{Nothing,Float64,10}

This is the minimal working example:

using Distributions
using LinearAlgebra
using Turing

# Linear Model
Δt = 0.1
τ = 1.0
A = [1.0 Δt-(Δt^2/(2τ)); 0.0 1.0-(Δt/τ)]
Q = [0.25 * Δt^4 0.5 * Δt^3; 0.5 * Δt^3 Δt^2] + 1e-6 * I

B = [0.0 Δt^2/(2τ); 0.0 Δt/τ]
u = [0.0; 0.5]

H = [1.0 0.0; 0.0 1.0]
σₓ, σᵥ = 0.1, 0.1
R = [σₓ 0.0; 0.0 σᵥ]

@model function hmm(states, ::Type{T} = Array{Float64}) where {T}
    # Initialize arrays for the time-series
    if states === missing
        states = T(undef, 2, 100)
    end
    observations = T(undef, 2, 100)

    # Prior for first step
    states[:,1] ~ MvNormal([0.0, 0.5], 0.001 * I)
    # THIS IS WHERE I GET THE ERROR
    observations[:,1] ~ MvNormal(states[:,1], R/Δt)

    for t in 2:size(states, 2)
        states[:,t] ~ MvNormal(A * states[:,t-1] + B * u, Δt*Q)
        observations[:,t] ~ MvNormal(states[:,t], R/Δt)
    end

    return states
end

# Sample a trial from the prior to be used as the ground truth
hmm_prior = hmm(missing)
states = hmm_prior()

# Condition on the sampled trial 
hmm_conditioned = hmm(states)

# Perform inference
chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)

Can somebody tell me what I am missing here?

Thanks!
Simon

This is related to an issue with Distributions.jl.

The following should solve your issue.

using DistributionsAD: TuringMvNormal

@model function hmm(states, ::Type{T} = Array{Float64}) where {T}
    # Initialize arrays for the time-series
    observations = T(undef, 2, 100)

    # Prior for first step
    states[:,1] ~ MvNormal([0.0, 0.5], 0.001 * I)
    # THIS IS WHERE I GET THE ERROR
    observations[:,1] ~ TuringMvNormal(states[:,1], R/Δt)

    for t in 2:size(states, 2)
        states[:,t] ~ TuringMvNormal(A * states[:,t-1] + B * u, Δt*Q)
        observations[:,t] ~ TuringMvNormal(states[:,t], R/Δt)
    end

    return states
end

The reason for this is that MvNormal from Distributions.jl assumes all parameters have the same type, and otherwise tries to promote them to a common type. This does not work if one of the parameters is fixed while the other is inferred with a gradient-based inference algorithm, because the inferred parameter’s element type changes (to ForwardDiff.Dual) during gradient computation.
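As a rough illustration of the mechanism (a minimal sketch, not the actual code path inside Distributions.jl): ForwardDiff replaces Float64 values with ForwardDiff.Dual numbers while computing the gradient, so any code that insists on Float64 fails with exactly this kind of TypeError:

using ForwardDiff

f(x) = sum(abs2, x)
ForwardDiff.gradient(f, [1.0, 2.0])  # works: f is generic in the element type

# A hard Float64 constraint breaks under AD, because x[1] is a Dual
# during the gradient pass:
g(x) = (x[1]::Float64)^2
ForwardDiff.gradient(g, [1.0])       # TypeError: in typeassert, expected Float64, got ForwardDiff.Dual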

I believe you also don’t need that additional check you added. You should be able to simply call:

states = hmm(fill(missing, 2, 100))()

IS, SMC and PG do not require the computation of gradients, which is why you did not observe this issue before. Also note that you might want to use :reversediff as the AD backend instead.
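For completeness, switching the backend looks like this (assuming the Turing.setadbackend API of the Turing versions current at the time of writing; ReverseDiff.jl must be installed):

using ReverseDiff
Turing.setadbackend(:reversediff)   # use ReverseDiff instead of ForwardDiff for gradients
chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)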

Actually, this looks like a bug in handling the type parameter. Please open an issue. Meanwhile, try Array{Float64, 2} instead of Array{Float64}.
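As general Julia background (not specific to the bug itself), the difference between the two is that Array{Float64} leaves the number of dimensions free, so it is not a concrete type:

# Array{Float64} is a UnionAll: the dimension parameter N is unspecified
Array{Float64} isa UnionAll              # true
# Array{Float64, 2} pins N = 2 and is concrete (an alias of Matrix{Float64})
Array{Float64, 2} === Matrix{Float64}    # true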

@mohamed82008 appears to be right: HMC() and NUTS() work when changing the type parameter to:

@model function hmm(states, ::Type{T} = Array{Float64, 2}) where {T}
...

I will open an issue on GitHub in this regard.

This would be nice, but unfortunately it does not work for me. It returns the following error:

LoadError: MethodError: Cannot `convert` an object of type Missing to an object of type Float64

I took this additional check from a code snippet in the official guide.


The logic for how Turing deals with missing values (or rather with ~) is very simple, but it seems we are not doing a good job of explaining it, since these questions and errors come up regularly. The two basic rules are: ~ indicates an assumption (i.e., the variable on the LHS is sampled) if

  • the variable on the LHS of ~ is not an input argument to the model, or
  • the LHS evaluates to missing.

So if you pass fill(missing, 2, 100) as states to the model, then the LHS of expressions such as states[:, i] ~ ... evaluates to a vector of missing values rather than missing itself, and hence the model treats it as an observation. Evaluating the log probability of this “observation” then fails with the error you got.
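To make the two rules concrete, here is a minimal sketch (a hypothetical demo model, not part of the code above):

using Turing

@model function demo(x)
    x ~ Normal()
end

demo(missing)()  # x === missing, so x is sampled (rule 2)
demo(1.0)()      # x is a concrete value passed as an input argument, so x is an observation

# In contrast, indexing into fill(missing, 2, 100) yields a Vector{Missing},
# which is not `missing` itself, so rule 2 does not apply:
fill(missing, 2, 100)[:, 1] === missing  # false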

So the correct/natural way to implement this model would be

@model function hmm(states, ::Type{T} = Float64) where T
    # Initialize arrays for the time-series
    observations = Matrix{T}(undef, 2, length(states))

    # Prior for first step
    states[1] ~ MvNormal([0.0, 0.5], sqrt(0.001))
    observations[:, 1] ~ MvNormal(states[1], R/Δt)

    for t in 2:length(states)
        states[t] ~ MvNormal(A * states[t-1] + B * u, Δt*Q)
        observations[:, t] ~ MvNormal(states[t], R/Δt)
    end

    return states
end

i.e., states is a vector of vectors instead of a matrix. (BTW, I assume this version should in general also be faster, since it avoids allocating a new vector every time you select a column; since the observations are just sampled, it is probably more efficient to keep them as a matrix.) You can then construct states such as

states = [rand(2) for _ in 1:100]

or

states = convert(Vector{Union{Missing,Vector{Float64}}}, fill(missing, 100))

or

states = Union{Missing,Vector{Float64}}[missing for _ in 1:100]

(we have to be able to save the sampled states in states, and hence fill(missing, 100) won’t work because it creates a vector of type Vector{Missing}). I suggest you convert states to a concretely-typed vector after sampling from the prior, though. Precomputing all constant terms might also improve efficiency (for 2x2 matrices it shouldn’t matter much, but I guess it will be useful in higher dimensions).
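For instance, based on the suggestions above (a sketch; Rdt and Qdt are hypothetical names for the hoisted constants):

# Sample from the prior, then convert to a concretely-typed vector
states = hmm(Union{Missing,Vector{Float64}}[missing for _ in 1:100])()
states = convert(Vector{Vector{Float64}}, states)

# Precompute the constant covariance matrices once, instead of recomputing
# R/Δt and Δt*Q in every model evaluation
const Rdt = R / Δt
const Qdt = Δt * Q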


Thank you David for taking the time, that was very helpful!

Sorry for reviving this thread, but I tried your suggestion for simplifying the handling of missing values again, and I realised that PG() and SMC() (but not the other samplers) produce the following error message, which I am unsure how to debug:

LoadError: Libtask.CTaskException(Task (failed) @0x00007f77bb145120)

This is the full code to reproduce it:

using LinearAlgebra
using Turing

# Linear Model
Δt = 0.1
A = [1.0 Δt; 0.0 1.0]
Q = [0.25 * Δt^4 0.5 * Δt^3; 0.5 * Δt^3 Δt^2] + 1e-6 * I

H = [1.0 0.0; 0.0 1.0]
R = 0.1 * I

@model function hmm(states, ::Type{T} = Float64) where T
    # Initialize arrays for the time-series
    observations = Matrix{T}(undef, 2, length(states))

    # Prior for first step
    states[1] ~ MvNormal([0.0, 0.5], sqrt(0.001))
    observations[:, 1] ~ MvNormal(states[1], R/Δt)

    for t in 2:length(states)
        states[t] ~ MvNormal(A * states[t-1], Δt*Q)
        observations[:, t] ~ MvNormal(states[t], R/Δt)
    end

    return states
end

# Sample a trial from the prior to be used as the ground truth
hmm_prior = hmm(Union{Missing,Vector{Float64}}[missing for _ in 1:100])
states = hmm_prior()

# Condition on the sampled trial 
hmm_conditioned = hmm(states)

# Perform inference (working)
# chain = sample(hmm_conditioned, IS(), 1000)
# chain = sample(hmm_conditioned, HMC(0.1, 5), 1000)
# chain = sample(hmm_conditioned, NUTS(0.65), 1000)

# Perform inference (not working)
chain = sample(hmm_conditioned, SMC(), 1000)
# chain = sample(hmm_conditioned, PG(10), 1000)


Could someone look into this? I still receive the following error when using SMC() or PG(), but not with the other samplers:

ERROR: LoadError: CTaskException:
Malformed dims

Unrelated question, is there a reason you like to use PG instead of HMC or NUTS?


Btw, I just had a look at your code, and it seems that SMC and PG don’t like the Array{Union{Missing, Array{Float64,1}},1} you condition on. The solution is to simply use the following instead:

states = Array{Array{Float64}}(hmm_prior())

This is likely a bug in Libtask. I’ll open an issue for you.

Thanks a lot, that did the trick!

Unrelated question, is there a reason you like to use PG instead of HMC or NUTS?

I would prefer to use HMC/NUTS, but my full model (I only posted a simplified version) has components that are, AFAIK, non-differentiable and only work with PG.
