Fast hamming distance function in Julia that returns a distance matrix

I would first convert your dataset into an array of BitVectors. Then you can compute the Hamming distance between two BitVectors very cheaply using bitwise operations — for example, see the code posted here. Since conversion is O(n) and computing the distance matrix is O(n^2) where n is the size of your dataset, it is worth it to pay the conversion cost in order to speed up the distance computation.

(Note also that you can trivially double the speed of your code since A is symmetric — you only need to compute the Hamming distance once for each pair.)

(How large is your dataset? It can’t have that many rows or you wouldn’t be able to store the distance matrix. If you have ≤ 64 columns you can convert your data to an array of UInt64 bitstrings, which will be even faster than BitVectors.)

3 Likes