Unhexlify and int.from_bytes (From a lazy pythonist to a confused Julia user)

So I’m slowly transferring some crypto blockchain programming to Julia and I’m hitting quite a wall with all the hashing, bytes, etc.

So the first one I’m trying to migrate over is Python’s binascii.unhexlify (or a2b_hex), which takes an ASCII-encoded hex string and returns the corresponding bytes.

What I got so far is this:

function unhexlify(str)
    result = ""
    for i in range(1,length=length(str),step=2)
        result *= Char(parse(Int64,str[i:i+1], 16))
    end
    return result
end

When I pass the string "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"

I get back
"\x03\u8dìÖÿ\u008fEÖµ#Â^µÌf\u9f¡ü/Ó:ª_V@\u8aºÑ&ª>h"

While Python:
binascii.unhexlify("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68".encode("ascii"))
returns
b'\x03\x8d\xec\xd6\xff\x8fE\xd6\xb5#\xc2^\xb5\xccf\x9f\xa1\xfc/\xd3:\xaa_V@\x8a\xba\xd1&\xaa>h'

Close but no cigar
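
For comparison, here is a minimal sketch of a hand-rolled Julia unhexlify that returns bytes rather than a String. The name unhexlify_bytes is hypothetical, and the sketch assumes an even-length, ASCII-only hex string. The key change from the loop above is parsing each pair of digits into a UInt8 and storing it in a Vector{UInt8}, instead of turning it into a Char and concatenating.

```julia
# Hypothetical helper; assumes an even-length, ASCII-only hex string
function unhexlify_bytes(str::AbstractString)
    isodd(ncodeunits(str)) && throw(ArgumentError("hex string must have an even length"))
    out = Vector{UInt8}(undef, ncodeunits(str) ÷ 2)
    for (k, i) in enumerate(1:2:ncodeunits(str))
        # Two hex digits -> one byte, stored as a UInt8 (not as a Char)
        out[k] = parse(UInt8, str[i:i+1], base=16)
    end
    return out
end

h = "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"
b = unhexlify_bytes(h)   # 32-element Vector{UInt8}
b[1:3]                   # 0x03, 0x8d, 0xec, i.e. Python's b'\x03\x8d\xec...'
```

Base’s hex2bytes does the same job and should be preferred in real code; the loop is only here to show the byte-level idea.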

Then I need to take those bytes and convert them to an integer.

In Python I’d use:
int.from_bytes(b, "big")

which translates to:

def from_bytes_big(b):
    n = 0
    for x in b:
        n <<= 8
        n |= x
    return n
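
A line-for-line Julia version of from_bytes_big might look like the sketch below. It reuses the Python function’s name for illustration and assumes the input is already a byte vector, such as the one hex2bytes returns. The main difference from Python is that Julia’s Int is a fixed 64 bits, so the accumulator starts as a BigInt to hold all 256 bits of a 32-byte input.

```julia
# Sketch mirroring the Python from_bytes_big; expects a vector of bytes
function from_bytes_big(b::AbstractVector{UInt8})
    n = big(0)          # BigInt accumulator, so 32 bytes cannot overflow
    for x in b
        n <<= 8         # make room for the next byte
        n |= x          # OR the byte into the low 8 bits
    end
    return n
end

b = hex2bytes("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68")
from_bytes_big(b)
```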

But I can’t get this one to work, probably because my Julia variable isn’t in the right format.

Any help would be really appreciated, as I’m really not used to how Julia treats bytes. Thanks!

Something like this?

julia> parse(BigInt, "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68", base=16)
1607698590363554088754641199134486193014446681676656441135069319285818670696
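
Why BigInt here: 64 hex digits are 256 bits, which is wider than any fixed-size integer in Base. A small sketch of what parse does with a type that is too narrow (it checks for overflow, so tryparse returns nothing and parse would throw an OverflowError), and that the low 128 bits alone do fit in a UInt128:

```julia
h = "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"

# UInt128 is too small for 256 bits: tryparse signals the overflow with `nothing`
tryparse(UInt128, h, base=16) === nothing

# BigInt grows as needed; the leading byte 0x03 means the value needs 250 bits
x = parse(BigInt, h, base=16)
big(2)^249 <= x < big(2)^250

# The last 32 hex digits (the low 128 bits) do fit in a UInt128
lo = parse(UInt128, h[33:end], base=16)
lo == 0xa1fc2fd33aaa5f56408abad126aa3e68
```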

If you want to do it yourself, Julia’s byte-manipulation syntax is very similar to Python’s:

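julia> function from_hex(h, n)
           b = hex2bytes(h)      # hex string -> Vector{UInt8}
           for x in b
               n <<= 8           # shift the accumulator left by one byte
               n |= x            # OR the next byte into the low 8 bits
           end
           return n              # the result has the type of the seed n
       end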
from_hex (generic function with 1 method)

julia> from_hex( "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68", BigInt(0))
1607698590363554088754641199134486193014446681676656441135069319285818670696

Well I’ll be damned. So I had to skip a step in order to get the right answer. That’s great!

Thanks, really appreciated

Note that operations on built-in, arbitrary-precision BigInts will be significantly slower than on fixed-precision integers from BitIntegers.jl:

julia> using BenchmarkTools, BitIntegers

julia> @btime parse(Int256, "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68", base=16)
  1.210 μs (9 allocations: 168 bytes)
1607698590363554088754641199134486193014446681676656441135069319285818670696

julia> @btime from_hex("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68",
                       BigInt(0))
  16.300 μs (282 allocations: 4.88 KiB)
1607698590363554088754641199134486193014446681676656441135069319285818670696

julia> @btime from_hex("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68", 
                       Int256(0))
  216.167 ns (1 allocation: 128 bytes)
1607698590363554088754641199134486193014446681676656441135069319285818670696
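
One thing to keep in mind with fixed-size types: the shift loop never raises an error, it silently drops whatever is shifted past the top bit, so a seed type that is too small just gives a wrong number. A minimal sketch of a guarded, type-parameterised variant (the name from_bytes is hypothetical) that refuses inputs longer than the target type. It is intended for unsigned fixed-size types and BigInt; the checks below use only Base types, and BitIntegers’ UInt256 is assumed to behave the same way since it is also a fixed-size bits type.

```julia
# Hypothetical helper: interpret big-endian bytes as an integer of type T.
# Fixed-size (isbitstype) targets are length-checked up front, because `<<`
# on them discards overflowing bits instead of throwing.
function from_bytes(::Type{T}, b::AbstractVector{UInt8}) where {T<:Integer}
    if isbitstype(T) && length(b) > sizeof(T)
        throw(ArgumentError("$(length(b)) bytes do not fit in a $(sizeof(T))-byte $T"))
    end
    n = zero(T)
    for x in b
        n = (n << 8) | T(x)   # shift in one byte at a time
    end
    return n
end

b = hex2bytes("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68")
from_bytes(BigInt, b)          # full 256-bit value
from_bytes(UInt128, b[17:32])  # the low 16 bytes fit in a UInt128
# from_bytes(UInt128, b) would throw an ArgumentError (32 bytes > 16)
```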

Thanks for the heads up. It’s working perfectly with BitIntegers. I still have a long way to go but this is a huge step in the right direction. Next is to dust off my Jacobi matrix textbook

A little tangential: Python just (unhelpfully, I might add) displays byte “strings”/arrays in a weird string-y fashion, showing bytes in the printable ASCII range as characters and escaping the rest.

An output equivalent to unhexlify can be obtained with just hex2bytes (i.e. hex2bytes is exactly Python’s unhexlify), which already gives you a vector of bytes. If you want to compare it to Python’s representation of those bytes, you can map(Char, s) over the result (but this will give you a vector of characters, not bytes):

julia> s = hex2bytes("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"); 
                                                                                          
julia> map(Char, s)                                                                       
32-element Vector{Char}:                                                                  
 '\x03': ASCII/Unicode U+0003 (category Cc: Other, control)                               
 '\u8d': Unicode U+008D (category Cc: Other, control)                                     
 'ì': Unicode U+00EC (category Ll: Letter, lowercase)                                     
 'Ö': Unicode U+00D6 (category Lu: Letter, uppercase)                                     
 'ÿ': Unicode U+00FF (category Ll: Letter, lowercase)                                     
 '\u8f': Unicode U+008F (category Cc: Other, control)                                     
 'E': ASCII/Unicode U+0045 (category Lu: Letter, uppercase)                               
 'Ö': Unicode U+00D6 (category Lu: Letter, uppercase)                                     
 'µ': Unicode U+00B5 (category Ll: Letter, lowercase)                                     
 '#': ASCII/Unicode U+0023 (category Po: Punctuation, other)                              
 'Â': Unicode U+00C2 (category Lu: Letter, uppercase)                                     
 '^': ASCII/Unicode U+005E (category Sk: Symbol, modifier)                                
 'µ': Unicode U+00B5 (category Ll: Letter, lowercase)                                     
 'Ì': Unicode U+00CC (category Lu: Letter, uppercase)                                     
 'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)                               
 '\u9f': Unicode U+009F (category Cc: Other, control)                                     
 '¡': Unicode U+00A1 (category Po: Punctuation, other)                                    
 'ü': Unicode U+00FC (category Ll: Letter, lowercase)                                     
 '/': ASCII/Unicode U+002F (category Po: Punctuation, other)                              
 'Ó': Unicode U+00D3 (category Lu: Letter, uppercase)                                     
 ':': ASCII/Unicode U+003A (category Po: Punctuation, other)                              
 'ª': Unicode U+00AA (category Lo: Letter, other)                                         
 '_': ASCII/Unicode U+005F (category Pc: Punctuation, connector)                          
 'V': ASCII/Unicode U+0056 (category Lu: Letter, uppercase)                               
 '@': ASCII/Unicode U+0040 (category Po: Punctuation, other)                              
 '\u8a': Unicode U+008A (category Cc: Other, control)                                     
 'º': Unicode U+00BA (category Lo: Letter, other)                                         
 'Ñ': Unicode U+00D1 (category Lu: Letter, uppercase)                                     
 '&': ASCII/Unicode U+0026 (category Po: Punctuation, other)                              
 'ª': Unicode U+00AA (category Lo: Letter, other)                                         
 '>': ASCII/Unicode U+003E (category Sm: Symbol, math)                                    
 'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)                               

Some of those characters are displayed happily by my terminal and by Julia, but escaped by Python. In the example above, one of them is \xec, i.e. 'ì', which is the third byte in the vector obtained via hex2bytes and the third byte in your byte string:

b'\x03\x8d\xec  [...]
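
To line the two up exactly, a small helper can print a Julia byte vector in Python’s b'...' notation. This is a hypothetical function (pyrepr), a simplified sketch: it always uses single quotes, whereas Python switches to double quotes when the data contains ' but no ". Printable ASCII (0x20 to 0x7e) is shown as-is, \t, \n and \r get their usual escapes, and everything else becomes \xNN.

```julia
# Hypothetical helper: render bytes the way Python's bytes repr does
# (simplified: always single-quoted)
function pyrepr(b::AbstractVector{UInt8})
    io = IOBuffer()
    print(io, "b'")
    for x in b
        if x == UInt8('\\') || x == UInt8('\'')
            print(io, '\\', Char(x))          # escape backslash and quote
        elseif x == UInt8('\t')
            print(io, "\\t")
        elseif x == UInt8('\n')
            print(io, "\\n")
        elseif x == UInt8('\r')
            print(io, "\\r")
        elseif 0x20 <= x <= 0x7e
            print(io, Char(x))                # printable ASCII as itself
        else
            print(io, "\\x", string(x, base=16, pad=2))  # everything else as \xNN
        end
    end
    print(io, "'")
    return String(take!(io))
end

s = hex2bytes("038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68")
println(pyrepr(s))
```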

What your original unhexlify did was create a String of all those characters joined together:

julia> inp = "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"
"038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"

julia> function unhexlify(str) # had to fix a few issues before I could get it to run
           result = ""
           for i in 1:2:length(str) # more idiomatic, assumes ASCII input though
               result *= Char(parse(Int64, str[i:i+1], base=16)) # the base is a keyword
           end
           return result
       end
unhexlify (generic function with 1 method)

julia> join(map(Char, hex2bytes(inp))) == unhexlify(inp)
true
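
A short sketch of why that String is not the same thing as the bytes, using the repaired loop from the reply above (renamed here to the hypothetical unhexlify_string). The String has one character per byte, but its storage is longer: every character at or above U+0080 takes two bytes, and 19 of the 32 input bytes are in that range. The original byte values can still be recovered from the characters’ code points.

```julia
inp = "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"

# The String-building loop from the thread, renamed for this sketch
function unhexlify_string(str)
    result = ""
    for i in 1:2:length(str)
        result *= Char(parse(Int64, str[i:i+1], base=16))
    end
    return result
end

s = unhexlify_string(inp)
b = hex2bytes(inp)

length(s)                 # 32 characters, one per byte
sizeof(s)                 # 51 bytes of UTF-8: 13 ASCII chars + 19 two-byte chars
codeunits(s) == b         # false: the String's storage is not the original bytes
UInt8.(collect(s)) == b   # true: each Char's code point equals the original byte
```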

Additionally, Julia Strings are UTF-8 encoded and are nothing like the byte “strings” Python has. One important distinction is that Julia’s Char is really a Unicode code point, not a single byte.
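
A minimal illustration of the code point vs. byte distinction, using the 0xec byte from the example: as a Char it is the code point U+00EC, which UTF-8 stores as two bytes, while wrapping the raw byte in a String does not re-encode it and simply yields invalid UTF-8.

```julia
c = Char(0xec)          # 'ì', the Char with code point U+00EC
codepoint(c)            # 0x000000ec, a UInt32 code point
codeunits(string(c))    # its UTF-8 encoding: 0xc3, 0xac (two bytes)

# String(bytes) takes the bytes as-is; a lone 0xec is not valid UTF-8
isvalid(String(UInt8[0xec]))   # false
```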

Also, unlike Python’s range, Julia ranges include both of their endpoints.
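
A quick sketch of the endpoint rule, plus how it bears on the loop from the first post, assuming it was run as written: range(1, length=length(str), step=2) asks for 64 start positions rather than stopping at 64, so its last index lands well past the end of the string.

```julia
# Julia ranges are inclusive: this yields 1, 3, 5, 7
# (Python's range(1, 7, 2) stops before 7 and yields 1, 3, 5)
collect(1:2:7)

h = "038decd6ff8f45d6b523c25eb5cc669fa1fc2fd33aaa5f56408abad126aa3e68"
n = length(h)                       # 64 hex characters

# `length=` fixes the number of elements, not the stop value:
# 64 entries stepping by 2 end at 127, far past the end of the string
r_bad = range(1, length=n, step=2)
last(r_bad)

# Stepping by 2 up to n hits the first digit of each of the 32 pairs
r_good = 1:2:n
length(r_good), last(r_good)        # (32, 63)
```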

Yeah, one of the things that confused me the most is probably the way Julia can display characters that Python couldn’t, and it completely flew over my head that this could be why I didn’t get the same result.

I now realize that the way I did it was “kinda” working but could be heavily optimized. But yes, the more I work with bytes, the more I dislike the Python b"…" representation.
