I’ve written a hierarchical clustering algorithm and I would like to speed it up if possible. It seems that my limiting step is
`D` is my dissimilarity matrix. This is a `LowerTriangular` matrix (since the order of the pairs doesn’t matter, i.e. `D[i,j] == D[j,i]`).
I was thinking it might help to convert this into a `Vector`, sort it and then do `findmax`. However, two issues:
- How do I get back to my matrix from the vector?
- I update `D` after each cluster merge. How can I insert the new dissimilarity values into the sorted vector representation of `D` in an efficient way?
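
To make the two issues concrete, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes only the strict lower triangle (`i > j`) of an `n × n` matrix is packed, column by column, into the vector. The names `pack_lower`, `tri_to_vec`, and `vec_to_tri` are hypothetical helpers made up for illustration; `searchsortedfirst` and `insert!` are from Julia's Base.

```julia
# Hypothetical helper: pack the strict lower triangle of D column by column.
function pack_lower(D::AbstractMatrix)
    n = size(D, 1)
    return [D[i, j] for j in 1:n-1 for i in j+1:n]
end

# Hypothetical helper: linear index of entry (i, j), with i > j, in the packed vector.
tri_to_vec(i, j, n) = (j - 1) * n - (j - 1) * j ÷ 2 + (i - j)

# Hypothetical helper: inverse mapping from linear index k back to (i, j).
# Walks over the columns, so it costs O(n); a closed form is also possible.
function vec_to_tri(k::Integer, n::Integer)
    j = 1
    while k > n - j      # column j holds n - j entries
        k -= n - j
        j += 1
    end
    return (j + k, j)
end

# Small made-up example matrix (values are illustrative only).
D = [0 0 0 0;
     5 0 0 0;
     2 7 0 0;
     9 1 4 0]
n = size(D, 1)

v = pack_lower(D)              # [5, 2, 9, 7, 1, 4]
val, k = findmax(v)            # (9, 3)
i, j = vec_to_tri(k, n)        # (4, 1), and D[4, 1] == 9

# For the second issue: keep (value, i, j) tuples sorted, so the matrix
# position travels with the value, and insert new values by binary search.
entries = sort!([(D[i, j], i, j) for j in 1:n-1 for i in j+1:n])
new_entry = (3, 3, 1)          # made-up updated dissimilarity for pair (3, 1)
insert!(entries, searchsortedfirst(entries, new_entry), new_entry)
```

The search is `O(log n)` per new value, but `insert!` still has to shift elements, so each insertion is linear in the vector length in the worst case. The sketch also does not remove the stale entries of merged clusters, which would need to be handled separately.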
Any thoughts would be very much welcome!
PS I know there is a clustering package for Julia, but I opted to write my own because I also need to implement a particular type of hierarchical clustering where you can only merge adjacent clusters. It seemed a bit too involved to do this by extending the clustering package.