# Converting UInt8 Array to BigInt, and back

I am writing functions that would convert Array{UInt8,1} to BigInt and vice versa.

1. From BigInt to Array I use to do as follows, through an hexadecimal string.
``````function int2bytes(x::Integer)
hex = string(x, base=16)
if mod(length(hex), 2) != 0
hex = string("0", hex)
end
return hex2bytes(hex)
end
``````

@gdkrmr suggested the following faster implementation which I am now using.

``````function int2bytes(x::BigInt)
result = Array{UInt8}(undef, x.size * sizeof(eltype(x.d)))
unsafe_copyto!(convert(Ptr{eltype(x.d)}, pointer(result)), x.d, x.size)
if ENDIAN_BOM == 0x04030201
result = result[end:-1:1]
end
i = findfirst(x -> x != 0x00, result)
result[i:end]
end
``````
1. From Array to BigInt, I am also going through hexadecimal string at the moment.
``````function bytes2int(x::Array{UInt8,1})
hex = bytes2hex(x)
return parse(BigInt, hex, base=16)
end
``````

Based on @gdkrmr suggestion I am now trying to implement this conversion without string but can’t make it work… I came up with the below code which works when broken down in REPL but crashes when running it in a function.

``````function bytes2int_new(x::Array{UInt8,1})
l = length(x)
if Sys.WORD_SIZE == 64
T = UInt64
xsize = cld(l, 8)
elseif Sys.WORD_SIZE == 32
T = UInt32
xsize = cld(l, 4)
end
x = cat(fill(UInt8(0), xsize * 8 - l), x; dims=1)
if ENDIAN_BOM == 0x04030201
reverse!(x)
end
if xsize == 1
return reinterpret(T,x)
else
result = big(0)
unsafe_copyto!(result.d, convert(Ptr{T}, pointer(x)), xsize)
result.size = xsize
result.alloc = xsize + 1
return result
end
end
``````

Clearly, there is something I do not understand about pointers and BigInt implementation…

First of all, is it at all possible to create a BigInt by assigning value to its fields?
How to properly assign those value?
What does the alloc field represent?

I may also be on a wrong path for this function, so any suggestions to avoid string representation is welcome for this function

If `x` is a vector of `UInt8`, then the preferred way to convert it to a vector of `BigInt` should be `BigInt.(x)` (I guess this ends up calling some C-function in GMP, probably after converting from `UInt8` to `Int`.)

Is that not fast enough?

I think the question is about converting an array of UInt8 to a single BigInt, not to an array of BigInts.

Oh. I see. Carry on.

Yes, but proper initialization must be taken care of.
The problem in your implementation is likely that you are writing with `unsafe_copyto!` in memory regions which have not be allocated.
To create a `BigInt` which can represent a number with `n` bits, you can use `a = Base.GMP.MPZ.realloc2(n)`. Then you don’t need to touch the `.alloc` field of `a`, and you can write `n` bits at `a.d` (or `cld(n, 64)` `UInt64` words if the architecture is 64 bits); you also need to manually update the `.size` field of `a`.

Note also that this is internal API and will break eventually.

1 Like

This one is even faster:

``````function int2bytes3(x::BigInt)
n_bytes_with_zeros = x.size * sizeof(Sys.WORD_SIZE)
uint8_ptr = convert(Ptr{UInt8}, x.d)

if ENDIAN_BOM == 0x04030201
# the minimum should be 1, else the result array will be of
# length 0
n_bytes_without_zeros = 1
for i in n_bytes_with_zeros:-1:1
n_bytes_without_zeros = i
break
end
end

result = Array{UInt8}(undef, n_bytes_without_zeros)

for i in 1:n_bytes_without_zeros
@inbounds result[n_bytes_without_zeros + 1 - i] = unsafe_load(uint8_ptr, i)
end

return result
else
error("not implemented... yet")
end

end
``````
1 Like

If this is calling the gmp library function I think it is calling, beware with the size, as `n` is limited by the library to INT_MAX. This caught me off guard a while back trying to compute A(4,3) of a two argument version of the Ackermann function… So much for “arbitrary size” I have plans to create a (hopefully) pure Julia arbitrary size package, but it’s not even in it’s infancy…

1 Like

Thanks, it is working as a charm, 3 order of magnitude faster than the original code I did remove the step where it was filling Array with leading zero, proven useless and resources hungry.

Final code for reference:

``````function bytes2big(x::Array{UInt8,1})
l = length(x)
if Sys.WORD_SIZE == 64
T = UInt64
xsize = cld(l, 8)
elseif Sys.WORD_SIZE == 32
T = UInt32
xsize = cld(l, 4)
end
if ENDIAN_BOM == 0x04030201
reverse!(x)
end
result = Base.GMP.MPZ.realloc2(xsize * Sys.WORD_SIZE)
result.size = xsize
unsafe_copyto!(result.d, convert(Ptr{T}, pointer(x)), xsize)
return result
end
``````

Noted for future compatibility “issues”, I’ll keep an eye on it

Good to know!

That would come handy, especially if the GMP API beaks at some point Are there in-place operations in `libgmp` and are they exposed to julia somewhere?

Yes, in the `Base.GMP.MPZ` module. Not all functions are exposed, you have to lookup that source file to know which ones are. In-place operations end with the `!` suffix as is usual in Julia.

@roshii Also, I didn’t think about that earlier, but you probably should use `Base.GMP.BITS_PER_LIMB` and `Base.GMP.Limb` to refer to the architecture-dependant number of bits per “limb” and “limb” type, it may not correspond to `Sys.WORD_SIZE` on some architectures (Windows?).

1 Like

Lovely, thank again I assume the case for small endians would look as follows.

``````...
else
n_bytes_without_zeros = 1
for i in 1:n_bytes_with_zeros
n_bytes_without_zeros = i
break
end
end

result = Array{UInt8}(undef, n_bytes_without_zeros)

for i in 1:n_bytes_without_zeros
end
end
return result
end
``````

Windows…
Point taken, code looks even cleaner with it.

``````function bytes2big(x::Array{UInt8,1})
xsize = cld(length(x), Base.GMP.BITS_PER_LIMB / 8)
if ENDIAN_BOM == 0x04030201
reverse!(x)
end
result = Base.GMP.MPZ.realloc2(xsize * Base.GMP.BITS_PER_LIMB)
result.size = xsize
unsafe_copyto!(result.d, convert(Ptr{Base.GMP.Limb}, pointer(x)), xsize)
return result
end
``````

If you decide to go with the `realloc2` approach, check the function in `Base` for error checking and thrown errors - libgmp will just signal abort if the new allocated size is too big, killing the calling process, so make sure that doesn’t happen! I don’t know off the top of my head if that particular function checks for problems before calling into libgmp.

I indeed notice some strange behavior:

``````julia> x = [rand(UInt8)]
1-element Array{UInt8,1}:
0x7d

julia> for i in 1:10
println(bytes2big(x))
end
125
125
125
125
125
125
125
125
125
125

julia> for i in 1:10
println(bytes2big(int2bytes(125)))
end
139767316545661
139767292756093
18446744073709486205
139767289413757
139767316545661
139767312023677
139766825746557
125
139767316545661
139767292756093
``````

I assume I’d need to free pointer reference at some point in time but I’m just struggling here… Would anyone have a clue on what is causing result to vary like that?
How can I ensure a stable result?

Experiencing same randomness with the following:

``````julia> for i in 1:10
println(bytes2big([0x02]))
end
139654643580930
18446744073709486082
139654643580930
2
18446744073709486082
139654697844738
2
18446744073709486082
139654697844738
139654643580930
``````

Some time ago I started something like this, but it turned out to require more work than I was willing to put into it:

Were you able to fix the randomness issue? I observe the same, and I do not see any error messages that might explain what is happening.

I got something similar to the requirement discussed here to work (using `UInt64` instead of `UInt8`):

``````function ubig2ints(x::BigInt)
if Base.GMP.Limb!=UInt64 || Sys.WORD_SIZE!=64 || ENDIAN_BOM!=0x04030201 || x.size<=0
error("not implemented")
end
copy(unsafe_wrap(Array, x.d, x.size))
end

function ints2ubig(x::AbstractVector{UInt64})
if Base.GMP.Limb!=UInt64 || Sys.WORD_SIZE!=64 || ENDIAN_BOM!=0x04030201
error("not implemented")
end
r = Base.GMP.MPZ.realloc2(length(x)*64)
r.size = length(x)
unsafe_copyto!(r.d, pointer(x), length(x))
r
end
``````

We came up with an alternative implementation that doesn’t show this randomness issue.

``````function to_int(x::Vector{UInt8})
if isempty(x)
return 0
end
length(x) > div(Sys.WORD_SIZE, 8) ? T = BigInt : T = Int
result = zero(T)
if ENDIAN_BOM == 0x01020304
reverse!(x)
end
for c in x
result <<= 8
result += c
end
return result
end
``````

However the above appeared to be less performant (for 32 bytes messages) that the below, which I retained and implemented in BitConverter.jl

``````function to_big(x::Vector{UInt8})
hex = bytes2hex(x)
return parse(BigInt, hex, base=16)
end
``````
2 Likes