Convert hash digest to UInt256

jj-404 · April 10, 2022, 3:35pm

Hello guys,
I need to convert the output of sha256 to a UInt256. What is the optimal way to do that performance-wise?

using SHA, BitIntegers
x=sha256("okok")
# convert(UInt256, x)

For context, sha256 returns a Vector{UInt8}.

goerch · April 10, 2022, 6:28pm

Laconically today,

julia> using BitIntegers

julia> reinterpret(UInt256, rand(UInt8, 64))
2-element reinterpret(UInt256, ::Vector{UInt8}):
 0x1a12ed72cbc1aa4f71e3dd40bd6a1af939664d1cbbc8fa06aa4de57f60da72cd
 0x8350a89b2786c47309bbf6eeb71a21a3dec3bc73aca23396ffc10356223e9570

jj-404 · April 10, 2022, 8:12pm

Thanks a lot, this works:
reinterpret(UInt256, x)[1]

stevengj · April 10, 2022, 9:17pm

This might not use the byte order you want:

julia> h = rand(UInt8, 32);

julia> bytes2hex(h)
"b9a868cba886baa10e08c841e8c0eabac34b83de68207c8bad6f2c869bc4a547"

julia> reinterpret(UInt256, h)
1-element reinterpret(UInt256, ::Vector{UInt8}):
 0x47a5c49b862c6fad8b7c2068de834bc3baeac0e841c8080ea1ba86a8cb68a8b9

Because the hash is returned in big-endian order, in principle you can use ntoh(reinterpret(UInt256, h)[1]).

Currently ntoh fails because bswap is not implemented for UInt256, but that seems like an oversight in the BitIntegers.jl package (BitIntegers.jl#26) that could be easily remedied.
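
To illustrate the byte-order point with a type that ships with Base, here is a minimal sketch. It uses UInt32 as a stand-in for UInt256; that substitution is an assumption made so the example runs without extra packages, and the variable names are illustrative only.

```julia
# Four bytes in big-endian (network) order, the way a hash function emits them.
bytes = UInt8[0x12, 0x34, 0x56, 0x78]

# reinterpret reads the bytes in the host's native byte order.
native = reinterpret(UInt32, bytes)[1]

# ntoh converts from network (big-endian) order to host order:
# a byte swap on little-endian hosts, a no-op on big-endian hosts.
value = ntoh(native)

@show native value
```

Either way, `value` matches the order in which `bytes2hex(bytes)` prints the bytes, which is usually what one wants for a hash digest.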

jj-404 · April 11, 2022, 2:14pm

Ah, thanks, that saved me a lot of debugging time! Also, thanks for opening the issue.
It's not a great solution, but in the meantime this works:

reinterpret(UInt256, reverse(x))[1]
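
As a possible alternative that depends neither on bswap nor on the host's byte order, the value can be accumulated byte by byte. This is only a sketch: the helper name `be_bytes_to_uint256` is hypothetical, and it has not been benchmarked against the other approaches in this thread.

```julia
using BitIntegers

# Hypothetical helper: interpret 32 bytes as a big-endian UInt256.
function be_bytes_to_uint256(bytes::AbstractVector{UInt8})
    length(bytes) == 32 ||
        throw(ArgumentError("expected 32 bytes, got $(length(bytes))"))
    acc = zero(UInt256)
    for b in bytes
        # Shift the accumulated value up one byte and append the next byte.
        acc = (acc << 8) | UInt256(b)
    end
    return acc
end

h = hex2bytes("b9a868cba886baa10e08c841e8c0eabac34b83de68207c8bad6f2c869bc4a547")
@show be_bytes_to_uint256(h)
```

On a little-endian host this should agree with `reinterpret(UInt256, reverse(h))[1]`, without allocating the reversed copy.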

rfourquet · April 12, 2022, 9:17am

I think I would go with pointer manipulation for speed, and bypass reinterpret:

GC.@preserve x bswap(unsafe_load(Ptr{UInt256}(pointer(x))))

jj-404 · April 12, 2022, 7:00pm

Really cool! This is almost 4 times faster.
Would you mind explaining what’s going on here?
Specifically, what does GC.@preserve do, and is “unsafe_load” actually safe to use in all use cases?

goerch · April 12, 2022, 7:10pm

AFAIU it instructs the GC not to interfere with the unsafe_load (otherwise you could end up in undefined-behavior land), and I’m really unsure whether we should recommend this to unsuspecting users…

As for whether unsafe_load is safe in all use cases: no, exactly (the problem being that it could look safe for a long time ;).

If performance is of utmost importance, would you propose an MWE in the meantime?

jj-404 · April 12, 2022, 7:25pm

Ahah, that’s what I thought. I read about it and will stick with the other (more Julian) option for now, thanks.
Performance is important in my case, but not at the cost of readability or safety, and in this example the hash function takes most of the execution time anyway.
I also saw that if you put GC.@preserve inside a function and then redefine it, the first definition stays, which could confuse users.
My MWE would be the following:

using BitIntegers, SHA
f(x) = reinterpret(UInt256, reverse(sha256(x)))[1]

goerch · April 12, 2022, 7:49pm

Nice!

I experimented a bit with tuples and found this:

x = sha256("The quick brown fox jumps over the lazy dog")
y = (x...,)
@show typeof(x)
@show reinterpret(UInt256, x)
@show typeof(y)
@show reinterpret(UInt256, y)

yielding

typeof(x) = Vector{UInt8}
reinterpret(UInt256, x) = UInt256[0x92e5c937bfd0022d76db3c6de451568d4f2e08b0bc9aca699480d707b3fba8d7]
typeof(y) = NTuple{32, UInt8}
ERROR: bitcast: expected primitive type value for second argument

Reasonable (or in other words: since when is Vector primitive)?

rfourquet · April 13, 2022, 8:31am

unsafe_load is safe to use when you know that the pointer points at a valid object. Here, after obtaining the pointer via pointer(x), x might be garbage collected before unsafe_load retrieves the value, so GC.@preserve x ... makes sure x is kept alive for the duration of the expression.
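
A small sketch of that pattern using only Base types may help. UInt64 stands in for UInt256 here so that no package is needed, and the helper name `load_u64` is hypothetical. The explicit length check is what keeps the load within the array's memory.

```julia
# Hypothetical helper: read the first 8 bytes of `bytes` as a native-order UInt64.
function load_u64(bytes::Vector{UInt8})
    # Without this check, unsafe_load could read past the end of the array.
    length(bytes) >= sizeof(UInt64) ||
        throw(ArgumentError("need at least 8 bytes"))
    # Keep `bytes` rooted while the raw pointer is in use.
    GC.@preserve bytes begin
        p = Ptr{UInt64}(pointer(bytes))
        unsafe_load(p)
    end
end

bytes = UInt8[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]
@show load_u64(bytes)
@show ntoh(load_u64(bytes))  # big-endian interpretation of the same bytes
```

The result of `load_u64` should match `reinterpret(UInt64, bytes)[1]`; the pointer version only skips the wrapper array that reinterpret creates.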

rfourquet · April 13, 2022, 8:37am

If you want to avoid unsafe_load, and if performance is important, you might be better off with

bswap(reinterpret(UInt256, sha256(x))[1])

rather than

reinterpret(UInt256, reverse(sha256(x)))[1]
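
A quick way to check that the two forms agree, assuming a BitIntegers.jl version in which bswap is defined for UInt256 (it was not at the time of the earlier posts). The function names `digest_bswap` and `digest_reverse` are hypothetical, and no timings are claimed here.

```julia
using SHA, BitIntegers

# Swap the bytes of the integer after reinterpreting the digest.
digest_bswap(x) = bswap(reinterpret(UInt256, sha256(x))[1])

# Reverse the digest bytes first (allocates a copy), then reinterpret.
digest_reverse(x) = reinterpret(UInt256, reverse(sha256(x)))[1]

msg = "The quick brown fox jumps over the lazy dog"
@show digest_bswap(msg) == digest_reverse(msg)
```

Both forms reverse the byte order of the natively read value, so they should return the same UInt256; on a little-endian host that value is the big-endian reading of the digest.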