I have written my first complete piece of Julia code as follows:

using Random: bitrand
function hamming(bits1, bits2)
return count(bits1 .⊻ bits2)
end
function random_strings(n, N)
return [bitrand(n) for i in 1:N]
end
function find_min(strings, n, N)
minsofar = n
for i in 1:N
for j in i+1:N
dist = hamming(strings[i], strings[j])
if dist < minsofar
minsofar = dist
# print("Running min ", minsofar, "\n")
end
end
end
return minsofar
end
function ave_min(n, N)
ITER = 100
total = 0
for i in 1:ITER
strings = random_strings(n, N)
new_min = find_min(strings, n, N)
print("New min ", new_min, "\n")
total += new_min # Fixed following comment
end
return total/ITER
end
N = 2^13
n = 150
print("Overall average ", ave_min(n, N), "\n")

Unfortunately my efforts to have a super fast Hamming distance function have failed. I am told this is because the XOR in the hamming function is being performed bitwise (and hence slowly), instead of in 64 bit chunks.

I read Xor performance but it’s currently beyond me to work out exactly what I need to do to speed the code up.

You should primarily worry about general Performance Tips before worrying about the speed of xor Your code is creating a lot of intermediate arrays in a bunch of different places (notably random_strings and hamming) and you’re also doing some redundant work (why call find_min twice and thus recalculate the new_min value instead of adding the existing variable to the total?).

Not everything from the general performance tips is going to be applicable, but avoiding allocations is sure to speed up your code (although I have not benchmarked it yet).

The code is generally fine, and the allocations in random_strings here don’t really matter. The issue is indeed with hamming.

What you can do is pass a buffer to hamming in which to store the results of the xor:

using Random: bitrand
hamming(bits1, bits2, x) = count(map!(⊻,x,bits1,bits2))
random_strings(n, N) = [bitrand(n) for i in 1:N]
function find_min(strings, n, N, x)
minsofar = n
for i in 1:N
for j in i+1:N
dist = hamming(strings[i], strings[j], x)
if dist < minsofar
minsofar = dist
end
end
end
return minsofar
end
function ave_min(n, N, ITER=100)
total = 0
x = bitrand(n) # buffer
for i in 1:ITER
strings = random_strings(n, N)
new_min = find_min(strings, n, N, x)
println("Iteration $i min: $new_min")
total += new_min
end
return total/ITER
end

I passed ITER as a variable because 100 was too much. With 10 iterations:

@time println("Overall average ", ave_min(n, N, 10))

I get the following for the original version:

22.074000 seconds (671.26 M allocations: 45.010 GiB, 18.84% gc time)

For the in-place version:

5.385643 seconds (164.01 k allocations: 11.882 MiB)

You can get rid of the buffer by implementing a custom function too if you want:

function hamming(A::BitArray, B::BitArray)
#size(A) == size(B) || throw(DimensionMismatch("sizes of A and B must match"))
Ac,Bc = A.chunks, B.chunks
W = 0
for i = 1:(length(Ac)-1)
W += count_ones(Ac[i] ⊻ Bc[i])
end
W += count_ones(Ac[end] ⊻ Bc[end] & Base._msk_end(A))
return W
end

Here, the mutated part is only used as a buffer and is not a result of the function. In those cases I usually don’t put an exclamation mark as I don’t care what goes in or what comes out.

I did try adding @inbounds but it didn’t help, in fact, it made it worse by ~10%. I just rechecked and got the same thing. Any idea what could be going on?

I got an explanation that the problem is the use of bitvectors. It seems you can make it hugely faster by using UInt64 or UInt128 instead (assuming n < 129). I haven’t had a chance to test it yet though.

If you compute the Hamming distance between two 64 bit vectors represented as two UInt64 values that is one XOR and one popcount assembly instruction. That is most likely less than a nanosecond in total.

I don’t understand why bitvectors don’t do the same thing. They too are just wrapped arrays of UInt64. Iteration is needed if you have an array of UInt64s, surely.

I don’t see why they are different. Something to do with broadcasting?