Dirichlet Process Mixture Models (DPGMM)


I wanted to try out DPGMM as a way to cluster some data I’m working on, and I came across this package: https://github.com/sbos/DirichletProcessMixtures.jl.
I’m not sure if anyone is still maintaining that package, though. In any case, I created a fork (https://github.com/grero/DirichletProcessMixtures.jl) where I ended up rewriting much of the code to make it more consistent with other statistical packages (only the framework; I didn’t touch the core algorithms, since I’m not very familiar with them). It is still a work in progress, and I’ll probably keep tinkering with it to suit my needs. I just wanted to put this effort out there in case anyone is interested in taking a look, or the original author wants to collaborate. Thanks!


It appears that there are no tests in the original code, so

  1. verifying the correctness of the original implementation is difficult,
  2. if you change things, it is similarly challenging to make sure the algorithm remains correct (assuming it was to begin with).

So you will very likely have to invest some time in understanding the algorithm, at least to the extent that you can write meaningful unit tests.
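
To make "meaningful unit tests" a bit more concrete, here is a rough sketch of one kind of check: generate synthetic data from a few well-separated Gaussians and assert that the clustering recovers them. It is written in plain Python only so that it is self-contained; none of it is the package's API. The names `make_blobs`, `purity`, `check_recovers_separated_clusters` and `nearest_center_stub` are all made up for illustration, and the stub stands in for whatever fit-and-assign call the real code exposes.

```python
import random
from collections import Counter

def make_blobs(centers, n_per, sd, seed):
    """Draw n_per 1-D points around each center; return points and true labels."""
    rng = random.Random(seed)
    points, labels = [], []
    for k, c in enumerate(centers):
        for _ in range(n_per):
            points.append(rng.gauss(c, sd))
            labels.append(k)
    return points, labels

def purity(true_labels, pred_labels):
    """Fraction of points that carry the majority true label of their cluster.

    Invariant to how the clusters happen to be numbered.
    """
    by_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        by_cluster.setdefault(p, Counter())[t] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(true_labels)

def check_recovers_separated_clusters(cluster_fn):
    """Run cluster_fn (hypothetical: points -> one label per point) on easy data."""
    centers = [-10.0, 0.0, 10.0]
    points, truth = make_blobs(centers, n_per=50, sd=0.5, seed=1)
    pred = cluster_fn(points)
    assert len(pred) == len(points)
    # Return the cluster count too: purity alone is trivially 1.0
    # if every point is put in its own cluster.
    return purity(truth, pred), len(set(pred))

def nearest_center_stub(points, centers=(-10.0, 0.0, 10.0)):
    """Stand-in for the real model: assign each point to its nearest known center."""
    return [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in points]

score, n_clusters = check_recovers_separated_clusters(nearest_center_stub)
```

With the real model swapped in for the stub, the test would assert that `score` is close to 1 and that `n_clusters` matches the number of generating components. Checks like this don't prove the inference is correct, but they catch gross regressions when the surrounding framework is refactored.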