Convert utf8 to binary


I’m new to Julia and trying to solve the cryptopals challenge:

This challenge requires to compute the hamming distance between two strings:

str1 = "this is a test"

str2 = "wokka wokka!!!"

The Hamming distance is the number of differing bits. In the above case the distance must be 37.

My approach was the following:

Vector{UInt8}(str1) .!= Vector{UInt8}(str2)

but this is wrong (results in a distance of 14)
I get the same results using the hammond function in the Distances package.

To confirm that 37 is correct I converted the strings to their bit values using an ASCII table (by hand)

s1 = [0,1,1,1,0,1,0,0,0,1,1,0,1,0,0,0,0,1,1,0,1,0,0,1,0,1,1,1,0,0,1,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,1,0,1,1,1,0,0,1,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1,1,0,0,1,0,1,0,1,1,1,0,0,1,1,0,1,1,1,0,1,0,0]

s2 = [0,1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,0,1,0,1,1,0,1,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,0,1,0,1,1,0,1,1,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,1]

now the distance is indeed 37:

julia> sum(abs.(s1 .- s2))

What is a more elegant solution to this problem?

The Hamming distance is the number of differing characters, no? Anyway, you could sum the bits in the xor between the elements of the UInt8 vector you created. I think there is a popcnt in Julia.

Edit: it is called count_ones


@pihive You’re getting a different answer because you’re measuring the number of bytes which are different, not the number of bits.

You can find bitwise difference between two bytes with xor:

julia> a = 0xab

julia> b = 0x04

julia> bitstring(a)

julia> bitstring(b)

julia> bitstring(xor(a, b))

so now you just need a way to count all the 1s in a byte. Here’s an overly complicated but compact one:

julia> function count_ones(x::UInt8)
         Int(sum((x >> i) & 0x01 for i in 0:7))
count_ones (generic function with 1 method)

Now you just need to xor each byte of the two inputs and sum up the total number of ones. You can do that with broadcasting and summing, just like in your hand-written binary example.

Edit: It turns out that Julia already provides count_ones (with the same name). So you don’t need my overly complicated implementation at all!

1 Like

In particular:

julia> sum(count_ones.(xor.(codeunits(str1), codeunits(str2))))

(I used codeunits rather than Vector{UInt8} to avoid making a copy — codeunits is just a byte-based view of the string contents.)