I wanted to try out a Dirichlet process Gaussian mixture model (DPGMM) as a way to cluster some data I’m working on, and I came across this package: https://github.com/sbos/DirichletProcessMixtures.jl.
I’m not sure if anyone is still maintaining that package, though. In any case, I created a fork (https://github.com/grero/DirichletProcessMixtures.jl) where I ended up rewriting much of the code to make it more consistent with other statistical packages (just the framework; I didn’t touch the core algorithms since I’m not very familiar with them). It is still a work in progress, and I’ll probably keep tinkering with it to suit my needs. I just wanted to put this effort out there in case anyone is interested in taking a look, or the original author wants to collaborate. Thanks!
It appears that there are no tests in the original code, so
- verifying the correctness of the original implementation is difficult,
- if you change things, it is similarly challenging to make sure the algorithm remains correct (assuming it was to begin with).
So you will very likely have to invest some time in understanding the algorithm, at least enough to write meaningful unit tests.
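One kind of test that needs little knowledge of the algorithm's internals is a recovery test: generate data from clusters that are obviously separated, fit the model, and check that the inferred grouping matches the known one up to a relabeling. Below is a minimal sketch of that idea in Python, purely for illustration. Everything in it is an assumption rather than part of either package: `placeholder_fit` is a hypothetical stand-in (a simple 1-D two-means) for whatever call fits the mixture and returns hard assignments, and `make_data` and `same_partition` are helper names made up here.

```python
import random


def same_partition(a, b):
    """Return True if labelings a and b group the points identically,
    ignoring which integer is used to name each cluster."""
    if len(a) != len(b):
        return False
    fwd, bwd = {}, {}
    for x, y in zip(a, b):
        # Each label in a must map to exactly one label in b, and vice versa.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


def make_data(n_per_cluster=50, centers=(-10.0, 10.0), seed=0):
    """Draw 1-D points from unit-variance Gaussians at well-separated centers."""
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    points, truth = [], []
    for k, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append(rng.gauss(center, 1.0))
            truth.append(k)
    return points, truth


def placeholder_fit(points, n_iter=20):
    """Hypothetical stand-in for the model under test (1-D two-means).
    Replace this with the real fitting call that returns one label per point."""
    lo, hi = min(points), max(points)
    labels = []
    for _ in range(n_iter):
        labels = [0 if abs(p - lo) <= abs(p - hi) else 1 for p in points]
        g0 = [p for p, l in zip(points, labels) if l == 0]
        g1 = [p for p, l in zip(points, labels) if l == 1]
        lo, hi = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def test_recovers_separated_clusters():
    points, truth = make_data()
    labels = placeholder_fit(points)
    assert same_partition(labels, truth)


test_recovers_separated_clusters()
```

Comparing partitions rather than raw labels matters because mixture models are free to number the components in any order. The same pattern should carry over to Julia's `Test` standard library, with the placeholder swapped for the package's own fitting routine.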