[ANN] HMMBase.jl - A lightweight and efficient Hidden Markov Model abstraction

Hi,

I’d like to introduce HMMBase.jl, a package providing basic building blocks for hidden Markov models in Julia.

Motivation

While there are many implementations of HMMs, in Julia and other languages, I found that:

  • Julia packages such as HiddenMarkovModels.jl were no longer maintained, and thus not compatible with recent Julia versions.
  • Most HMM libraries tend to implement the algorithms directly for a given probability distribution (most commonly discrete or normal distributions), and hence cannot easily support new distributions.

Instead, I chose to rely on the Distribution interface provided by Distributions.jl to handle arbitrary distributions. As long as a distribution implements the pdf and fit! methods, it is supported by HMMBase, whether it is univariate or multivariate, a single distribution or a mixture model.

For example, we can sample a two-state HMM with two different observation distributions as follows:

using Distributions, HMMBase

hmm = HMM([0.9 0.1; 0.1 0.9], [Normal(0,1), Gamma(1,1)])
z, y = rand(hmm, 1000)  # hidden state sequence and observation sequence

The only constraint is that all the observation distributions must have the same dimension (e.g., it is not possible to mix 2D and 3D normal distributions).

Similarly, arbitrary containers that conform to the AbstractArray interface are supported for storing the model parameters and the states/observations. This means that, for example, ArrayFire.jl could be used to perform some computations on the GPU. That said, I have only tested standard Julia arrays and StaticArrays so far.
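
For instance, a static transition matrix should drop in directly. Here is a small sketch, assuming (as above) that the HMM constructor accepts any AbstractMatrix:

using Distributions, HMMBase, StaticArrays

A = @SMatrix [0.9 0.1; 0.1 0.9]             # static transition matrix
hmm = HMM(A, [Normal(0, 1), Normal(5, 2)])  # observation distributions, one per state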

Goals

My goal is to provide well-tested and efficient implementations of the following algorithms:

  • Forward-backward
  • Viterbi
  • Baum-Welch (MLE estimator)

The MLE estimator implements only the E step and relies on the fit_mle method of each distribution for the M step.
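
Concretely, the M step boils down to calls into Distributions.jl. For instance, re-estimating a state’s distribution from the observations assigned to it (possibly weighted by posterior probabilities) looks like:

using Distributions

data = rand(Normal(2.0, 0.5), 1000)
fit_mle(Normal, data)                # ≈ Normal(μ = 2.0, σ = 0.5)
fit_mle(Normal, data, rand(1000))    # weighted variant; in Baum-Welch the weights would be posterior state probabilities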

For now, my (early) benchmark is against the hmmlearn Python library, which implements the same algorithms in Cython (for normal distributions only). This issue shows some early results.

Availability

The package is available for Julia 1.1+ in my own registry:

pkg> registry add https://github.com/maxmouchet/JuliaRegistry.git
pkg> add HMMBase

Feel free to consult the documentation for examples.

Right now the only package that depends on HMMBase is my implementation of a Gibbs sampler for the hierarchical Dirichlet process hidden Markov model, HDPHMM.jl. Both packages will be maintained during my PhD, as some of my work depends on them. Hopefully, HMMBase will be stable enough by the time I graduate that it will not need much work anymore.

I’m open to feedback, ideas, and contributions :slight_smile:


This is cool, thanks for sharing! :slight_smile:

To me, it looks like HMMBase.jl implements only first-order models at the moment (i.e., the transition probabilities are conditioned only on the single previous state). Is this right?

The reason that I’m asking is that in natural language processing, HMMs have been used for sequence labeling tasks like part-of-speech tagging. In this task, for example, a second-order HMM that conditions on the previous two tags generally seems to do better than a first-order model that just conditions on one previous tag.

(If this would need to be a separate subtype of AbstractHMM, perhaps this is an area where I could contribute!)


Sorry for my late reply! Indeed it only implements first-order models.

I’m not very familiar with higher-order models, but right now I see two ways of implementing them:

  • Writing specialized versions of the functions (e.g. messages_forwards_2);
  • Implementing more generic algorithms (such as belief propagation) that handle models of arbitrary order in a single function.

The second option seems cleaner, but I’m worried about the performance implications. Alternatively, macros could be used to generate a specialized function for an arbitrary order.

As for the types, I can add a new parameter like AbstractHMM{F<:VariateForm, N} where N represents the order (1 by default).
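
A rough sketch of what that could look like (the names here are hypothetical, not the current HMMBase types):

using Distributions

abstract type AbstractHMM{F<:VariateForm,N} end

# Hypothetical order-N model: the transition array has N + 1 dimensions,
# A[i_1, …, i_N, j] = P(X_t = j | X_{t-N} = i_1, …, X_{t-1} = i_N).
struct OrderedHMM{F,N,T<:Real} <: AbstractHMM{F,N}
    A::Array{T}
    B::Vector{<:Distribution{F}}
end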

That said, feel free to draft something for second-order models and to open an issue/PR if you want. I’ll do some experiments also on my side.


Hi! Did you end up doing any work on second-order HMMs? I have an interest in that.


Hi! What do you mean by second-order HMMs? I’ve taken over from Max as mister HMM, so my package HiddenMarkovModels.jl is probably your best bet.

Hah! I had just asked, and then deleted, a similar question in response to your HiddenMarkovModels discussion in another thread, because I thought it was an old conversation and thus not the right place, but then I saw you had replied to my old question in this thread. I’m all turned around!

In any case, I meant/mean HMMs where the observed state depends on two hidden states.

I think it is possible to do it with a standard HMM by encoding a tuple of states (s1, s2) as a single state s and using a sparse transition matrix. I suggested something similar as an answer to this issue; do you need help figuring out the details, or is it sufficiently clear?
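
To illustrate the idea, here is a minimal sketch of the encoding, assuming the second-order transition probabilities A2[s1, s2, s2′] = P(s2′ | s1, s2) are given:

using SparseArrays

m = 3                                # number of original states
A2 = rand(m, m, m)                   # A2[s1, s2, s2′] = P(s2′ | s1, s2), assumed given
A2 ./= sum(A2, dims = 3)             # normalize over the next state

encode(s1, s2) = (s1 - 1) * m + s2   # flatten a pair of states into a single index

A = spzeros(m^2, m^2)
for s1 in 1:m, s2 in 1:m, s2′ in 1:m
    # from (s1, s2) the only reachable pairs are (s2, s2′)
    A[encode(s1, s2), encode(s2, s2′)] = A2[s1, s2, s2′]
end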


Interesting, I’ll look at that, thanks. The implementation I’m currently using calculates a separate transition matrix for each observed state and then selects which one to use for each observation in the forward calculation.

Why do you say the transition matrix would be sparse?

Because a transition would go from (the encoding of) (s1, s2) to (the encoding of) (s1', s2'). In particular, the only valid transitions are those for which s2 = s1'.


Thanks, I understand. I may have a more complicated situation then, or at least I’m not sure how it maps - the original code was written by someone else, and I’m still getting my head around HMMs.

The matrices being used are computed from physical parameters, expressing "given we observed state {s1, s2, …}, the probability that the hidden states transitioned from hx to hy is given by {M1, M2, …}". Actually, they might be called "emission matrices"? But I don’t see that word appear in your documentation. Very likely I’m not understanding something fundamental.

Emissions are usually a synonym for observations in HMM parlance, so I picked one, but the two terms are equivalent. What you’re referring to as the "hidden state" is also just called the state, and what you’re calling "state" might be the observations?
In any case, I do think there’s an issue with your mathematical reasoning (because I don’t understand it), and it would be easier to clarify on a Zoom call. Wanna chat in the coming days?

Thanks for the offer, but it’s not critical. Let me just try this to make it clearer:

n Observations: 0, 1

m States: 0, 1, 2

Given some physical parameters of the process, compute n matrices of size m × m, which are normalized by row across all n.

The forward log likelihood function takes a vector of observations, a vector of initial state probabilities, and the matrices. For each observation, the matching matrix is selected and combined with the previous likelihoods.

If still not understandable, no worries. Though if you know of a text or other resource that could help, I’d appreciate that!

The framework used by HiddenMarkovModels.jl is that of controlled HMMs; it is described in Section 4.2 of my PhD thesis: Machine learning and combinatorial optimization algorithms, with applications to railway planning - PASTEL - Thèses en ligne de ParisTech.
In the corresponding graphical model, the state X_t is hidden, while the controls u_t and observations Y_t are known. An arrow represents an influence relation, roughly speaking: for instance, the distribution of X_t can be written \mathbb{P}(X_t \mid X_{t-1}, u_t), and the distribution of Y_t can be written \mathbb{P}(Y_t \mid X_t, u_t).

Let us denote by \theta the parameters of the model, which include the initial state distribution, the state transition matrices, and the distributions relating each state to the observations it can generate. Usually, when we compute (and maximize) the "likelihood" of an HMM, we are talking about the following quantity:

\mathbb{P}_{\theta}(Y_{1:T} \mid u_{1:T}) = \sum_{X_{1:T}} \mathbb{P}_{\theta}(Y_{1:T} \mid u_{1:T}, X_{1:T}) \mathbb{P}_{\theta}(X_{1:T} \mid u_{1:T})

This sum has an exponential number of terms, but the special structure of HMMs means we can compute it efficiently with dynamic programming. That is one thing my package does.
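
For intuition, here is a minimal sketch of that dynamic programming (the forward recursion), not the package’s internal code. With controls, the transition matrix and the observation likelihoods can vary with the time step, so they are passed per step:

# p0[i] = P(X_1 = i), A[k][i, j] = P(X_k = j | X_{k-1} = i, u_k) for k ≥ 2,
# b[k][i] = P(Y_k | X_k = i, u_k).
function forward_logdensity(p0, A, b)
    α = p0 .* b[1]
    logL = log(sum(α))
    α /= sum(α)
    for k in 2:length(b)
        α = (A[k]' * α) .* b[k]   # O(m²) work per step instead of summing over m^T state paths
        c = sum(α)
        logL += log(c)
        α /= c
    end
    return logL
end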

From what you tell me, it seems that those binary “observations” you speak of are actually “controls”, in the sense that they influence the evolution of the state instead of being influenced by it. But if my interpretation is correct, it means that in your case, there is no Y_t? How do you get any information about the states X_t? What are those “physical parameters of the process” you mentioned?

We have videos of single-molecule blinks that are converted into a list of points in each frame, and the points are clustered and then assembled into a temporal trace of ON/OFF observations for each cluster. So each frame is a discretization of a continuous underlying process we want to simulate/learn.

Whether a spot is detected in a discrete frame depends on the on rate and off rate of the underlying continuous physical process, but also on the minimum on time required for an ON observation to be recorded, and there’s also an adjustment for false positive detections.

These parameters are fed into an inverse Laplace transform to populate the matrices I was describing, which accounts for the possibility that one or many hidden state changes could occur during a discrete frame. One matrix describes the probability of an ON observation given that a frame started in one state and ended in another; the other does the same for an OFF observation.

Do clusters evolve independently? What set does the "state" of a frame belong to? Do you observe this state at each frame, with the states only being "hidden" during the continuous-time evolution between two frames?

Yes, each cluster is independent - the input being analyzed or simulated is a vector of 0s and 1s from one cluster, for “spot not observed” and “spot observed”, or OFF/ON.

If I understand your last question properly, yes. It’s a series of state changes of variable duration that are integrated and discretized into the observation sequence. (Though there is also the possibility of a permanent-off state that would be indistinguishable from the normal off state by observation, other than never seeing an ON observation again after that point.)

Okay, let’s see if I got this right.

Our random variables are:

  • a process of hidden cluster states X(t) \in \mathcal{X} = \{1, \dots, m\} evolving in continuous time (t \in \mathbb{R})
  • a sequence of frames of that state X_k = X(t_k) taken at discrete instants (k \in \mathbb{N}) but still hidden
  • a sequence of on-off observations Y_k \in \mathcal{Y} = \{0, 1\} associated with each frame

Our parameters are:

  • a vector of initial state probabilities \pi \in [0, 1]^m such that
\mathbb{P}(X_0 = i) = \pi_i
  • two emission matrices M_0 and M_1 in \mathbb{R}^{m \times m} such that
\begin{align*} \mathbb{P}(Y_k = 0 \mid X_{k-1} = i, X_{k} = j) &= M_0[i,j] \\ \mathbb{P}(Y_k = 1 \mid X_{k-1} = i, X_{k} = j) &= M_1[i,j] \end{align*}

The parameters are known, and for some reason we want to compute the loglikelihood

\log \mathbb{P}(Y_0, \dots, Y_K)

Does that sound right?
If that is correct, we should be able to massage this model into an HMM, but we are still missing one specification: what is the distribution of the state evolution? In other words, I need

\mathbb{P}(X_k = j \mid X_{k-1} = i)
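
Once that transition matrix T[i, j] = \mathbb{P}(X_k = j \mid X_{k-1} = i) is specified, a sketch of the corresponding forward recursion could look like the following (hypothetical code, not taken from either package):

function frame_loglikelihood(y::AbstractVector{Int}, π, T, M0, M1)
    m = length(π)
    α = collect(float.(π))          # α[j] ∝ P(Y_1, …, Y_k, X_k = j) after k steps
    logL = 0.0
    for o in y
        M = o == 0 ? M0 : M1
        # α_k(j) = Σ_i α_{k-1}(i) · T[i, j] · M[i, j]
        α = [sum(α[i] * T[i, j] * M[i, j] for i in 1:m) for j in 1:m]
        c = sum(α)
        logL += log(c)              # rescale at each step to avoid underflow
        α ./= c
    end
    return logL
end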

I will get back to you on this :-). Thanks for your help in the meantime! Translating to mathematical formalization is useful.
