A bit late to the party, but I just had a similar use case today.
The fastest method I could find operates on Vector{UInt8} instead of BitArrays, combining count_ones with LoopVectorization:
using LoopVectorization

function hamming_distance(h1, h2)
    s = 0
    # @avx vectorizes the loop (the macro was renamed @turbo in newer
    # LoopVectorization releases)
    @avx for i = 1:length(h1)
        s += count_ones(xor(h1[i], h2[i]))  # popcount of the differing bits
    end
    s
end
using Random  # for randstring

h1 = Vector{UInt8}(randstring(hash_length))  # hash_length: byte length of the hashes
h2 = Vector{UInt8}(randstring(hash_length))
julia> @btime hamming_distance($h1, $h2)
11.813 ns (0 allocations: 0 bytes)
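As a sanity check, the result matches a naive broadcast one-liner (which allocates a temporary array, so it only serves as a reference):

    sum(count_ones.(xor.(h1, h2)))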
The method suggested above (operating on BitVectors) is slower for me:
function make_bitvector(v::Vector{UInt8})
    siz = sizeof(v)
    bv = falses(siz << 3)  # 8 bits per input byte
    # Copy the raw bytes straight into the BitVector's UInt64 chunk storage;
    # GC.@preserve keeps both arrays rooted while raw pointers are in use.
    GC.@preserve v bv begin
        unsafe_copyto!(reinterpret(Ptr{UInt8}, pointer(bv.chunks)), pointer(v), siz)
    end
    bv
end
h1b = make_bitvector(h1)
h2b = make_bitvector(h2)
buf = similar(h1b)  # scratch buffer for the xor result
julia> @btime hamming($h1b, $h2b, $buf)
16.933 ns (0 allocations: 0 bytes)
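For readers of this post in isolation: the hamming function benchmarked here is defined earlier in the thread. A minimal sketch of a chunk-based equivalent (my reconstruction, not necessarily the original):

    function hamming(h1::BitVector, h2::BitVector, buf::BitVector)
        map!(xor, buf, h1, h2)   # Base has a fast chunk-wise map! for BitArrays
        s = 0
        for chunk in buf.chunks  # each chunk packs 64 bits into a UInt64
            s += count_ones(chunk)
        end
        s
    end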
Is bitwise Hamming distance on the GPU an option to get even more speed? With my very limited GPU knowledge, I could not find a way to make it faster than (or even as fast as) the CPU version.
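For reference, the most direct CUDA.jl formulation I know of is a broadcast-plus-reduce; this is a sketch assuming the h1/h2 byte vectors from above (hamming_distance_gpu is a name I made up). For inputs this small, the microseconds of kernel-launch latency alone dwarf the ~12 ns CPU timing, so the GPU should only pay off for much longer vectors or large batches of hashes:

    using CUDA

    function hamming_distance_gpu(h1_d::CuVector{UInt8}, h2_d::CuVector{UInt8})
        # Broadcast xor and popcount on the device, then reduce;
        # allocates one temporary device array.
        sum(count_ones.(h1_d .⊻ h2_d))
    end

    h1_d = CuArray(h1)  # copy the host vectors to the device once
    h2_d = CuArray(h2)
    hamming_distance_gpu(h1_d, h2_d)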
Edit: by the way, I was able to beat the performance of a ~150 LOC Cython implementation with fewer than 30 lines of Julia code.