[ANN] HiddenMarkovModels.jl: when did HMMs get so fast?

What the **** is an HMM?

Skip to the good bits below if you’re in a rush

Hidden Markov Models (HMMs for short) are a statistical modeling framework that is ubiquitous in signal processing, bioinformatics and plenty of other fields. They capture the distribution of an observation sequence (Y_t) by assuming the existence of a latent state sequence (X_t) such that:

  • (X_t) follows a (discrete time, discrete space) Markov chain
  • for each t, the distribution of Y_t is entirely determined by the value of X_t

Imagine we are given an observation sequence (Y_t) and a parametric family of HMMs \{p_\theta\}. Following this famous tutorial, we can list three fundamental problems, each of which has a solution that relies on dynamic programming:

Problem Description Solution
Evaluation Compute the likelihood of the observation sequence p_\theta(Y) for a fixed \theta Forward algorithm
Decoding Compute the most likely state sequence \arg\max_X p_\theta(X \mid Y) for a fixed \theta Viterbi algorithm
Learning Find the best parameter \arg\max_\theta p_\theta(Y) Baum-Welch (EM) algorithm

Who are the competitors?

Of course, you saw me coming a mile away: there are already some Python packages that provide HMM functionalities. The main two are:

  • hmmlearn – this one seems dormant
  • pomegranate – this one does plenty of other things, it truly is an amazing toolkit, and its maintainer was very friendly when I asked for help

There is also a Julia package called HMMBase.jl, developed by @maxmouchet a while ago. But for my PhD research, I found myself wanting things that none of these implementations provided. So I rolled up my sleeves and created HiddenMarkovModels.jl.

What follows is not a criticism of past efforts: if anything, it is an hommage. HMMBase.jl in particular was my main source of inspiration, and much of my code comes from it. I have discussed it with @maxmouchet, and we agreed to declare my package the official successor to HMMBase.jl. Since the interface is very similar, users shouldn’t struggle too much with the transition.

As a side note, one can always rephrase an HMM as a probabilistic program and throw some MCMC or variational inference at it. But at least in the basic case I describe, it will be much slower than dedicated routines. That is why I don’t include packages such as Turing.jl in the comparison.

What can the package do?

To get a feel for the package, check out its documentation at https://gdalle.github.io/HiddenMarkovModels.jl/. Here is a feature comparison between HMMBase.jl and HiddenMarkovModels.jl:

Feature HMMBase.jl HiddenMarkovModels.jl
Number types Float64 anything
Observation types Number or Vector anything
Observation distributions from Distributions.jl satisfying DensityInterface.jl
Priors / structures no customizable
Autodiff no forward mode (for now)
Multiple sequences no parallelized

Reaching this level of generality was my main motivation. Indeed, my doctoral research involved differentiating the likelihood of HMMs with multiple sequences of Poisson process observations. This kind of funky combination is now possible with very little additional work from the user.

How reliable is it?

The package follows all the best practices that I’m aware of:

Does it run fast?

In addition to the features, there is a number of tricks that I used to speed up the code: avoiding allocations, calling linear algebra subroutines, multithreading across sequences, etc.
The results are pretty amazing on small- to medium-scale problems. HiddenMarkovModels.jl blows HMMBase.jl out of the water, and compares quite favorably to hmmlearn (which has a NumPy backend) and pomegranate (which has a PyTorch backend), even though I ran it on a single thread.
Complete reproducible benchmarks with their explanations can be found at https://gdalle.github.io/HiddenMarkovModels.jl/dev/benchmarks/, they’re run automatically before each docs build.

Where will it go next?

The package is currently awaiting registration (see this PR to the general registry).

Here are some of the things that I would like to work on soon-ish:

Contributors are welcome!
In the long run, I will probably transfer this package to JuliaStats, but for now I’d like to keep control until things are stabilized.

I’ve read everything, what now?

If you’ve made it this far, congrats. Take a break. Treat yourself to a fruit juice. Chill.

And whenever you’re ready, you can run the command below, then look at the tutorial to start playing around: https://gdalle.github.io/HiddenMarkovModels.jl/dev/tutorial/.

pkg> add https://github.com/gdalle/HiddenMarkovModels.jl

If you have any suggestions or run into trouble, feel free to open an issue or use this thread! Also, a little star on the repo is always appreciated if you find it useful :innocent:


This is so amazing. Thanks for working on this and creating an excellent framework for working with HMMs. I just wanted to point out that there is another package called MarkovModels.jl that seems to have neat implementations of some of the algorithms (e.g. Viterbi) that apparently scales well.

1 Like

Are you gonna add the Aqua badge to the repo?

1 Like

Wow, I missed that one in my SoTA review. Might add it to the benchmarks sometime! And I should specify that my package is mostly geared towards middle-scale uses on the CPU, I made absolutely zero effort to be GPU-friendly

I didn’t even remember there was such a badge! Is there a JET badge?

JET QA Works (from Logo or Readme badge similar to Aqua's · Issue #375 · aviatesk/JET.jl · GitHub)

1 Like

There is now a new version v0.4.1 which supports most of the stuff listed here! Go try it out :slight_smile:


Update: version 0.5 is registered and has been turned into a JOSS paper!


Amazing! :blush::blush::blush:
Do you have plan to support High-Order HMM?

Do you mean HMMs where the dependency is on the previous k states? If so, no, but you can probably trick the package into doing what you want by encoding k states into a single integer and using a sparse transition matrix.
Some discussion of this workaround can be found in this issue, and I can also help you if you’re stuck


This looks like great work, congratulations, I did notice that the tutorial link ending “/dev/tutorial” above doesn’t work and gives a 404​:+1::grin:

Doesn’t that violate the Markov property? I thought Markov models were defined as dependent on previous state alone.

The first message was written with a previous version of the package, where the documentation was structured differently. Now there is more than one tutorial, hence the broken link :wink: Here’s the first one:

It is indeed a generalization of the Markov property to depend on the past k states, but as I have hinted before, you can reformulate such models to depend only on the last state by changing what you put in the state. So it’s not really more generic