Converting UInt8 Array to BigInt, and back

roshii · December 30, 2018, 8:19am

I am writing functions that would convert Array{UInt8,1} to BigInt and vice versa.

From BigInt to Array I use to do as follows, through an hexadecimal string.

function int2bytes(x::Integer)
    hex = string(x, base=16)
    if mod(length(hex), 2) != 0
        hex = string("0", hex)
    end
    return hex2bytes(hex)
end

@gdkrmr suggested the following faster implementation which I am now using.

function int2bytes(x::BigInt)
    result = Array{UInt8}(undef, x.size * sizeof(eltype(x.d)))
    unsafe_copyto!(convert(Ptr{eltype(x.d)}, pointer(result)), x.d, x.size)
    if ENDIAN_BOM == 0x04030201
        result = result[end:-1:1]
    end
    i = findfirst(x -> x != 0x00, result)
    result[i:end]
end

From Array to BigInt, I am also going through hexadecimal string at the moment.

function bytes2int(x::Array{UInt8,1})
    hex = bytes2hex(x)
    return parse(BigInt, hex, base=16)
end

Based on @gdkrmr suggestion I am now trying to implement this conversion without string but can’t make it work… I came up with the below code which works when broken down in REPL but crashes when running it in a function.

function bytes2int_new(x::Array{UInt8,1})
    l = length(x)
    if Sys.WORD_SIZE == 64
        T = UInt64
        xsize = cld(l, 8)
    elseif Sys.WORD_SIZE == 32
        T = UInt32
        xsize = cld(l, 4)
    end
    x = cat(fill(UInt8(0), xsize * 8 - l), x; dims=1)
    if ENDIAN_BOM == 0x04030201
        reverse!(x)
    end
    if xsize == 1
        return reinterpret(T,x)
    else
        result = big(0)
        unsafe_copyto!(result.d, convert(Ptr{T}, pointer(x)), xsize)
        result.size = xsize
        result.alloc = xsize + 1
        return result
    end
end

Clearly, there is something I do not understand about pointers and BigInt implementation…

First of all, is it at all possible to create a BigInt by assigning value to its fields?
How to properly assign those value?
What does the alloc field represent?

I may also be on a wrong path for this function, so any suggestions to avoid string representation is welcome for this function

Per · December 30, 2018, 8:41am

If x is a vector of UInt8, then the preferred way to convert it to a vector of BigInt should be BigInt.(x) (I guess this ends up calling some C-function in GMP, probably after converting from UInt8 to Int.)

Is that not fast enough?

Sukera · December 30, 2018, 8:43am

I think the question is about converting an array of UInt8 to a single BigInt, not to an array of BigInts.

Per · December 30, 2018, 8:44am

Oh. I see. Carry on.

rfourquet · December 30, 2018, 9:05am

Yes, but proper initialization must be taken care of.
The problem in your implementation is likely that you are writing with unsafe_copyto! in memory regions which have not be allocated.
To create a BigInt which can represent a number with n bits, you can use a = Base.GMP.MPZ.realloc2(n). Then you don’t need to touch the .alloc field of a, and you can write n bits at a.d (or cld(n, 64) UInt64 words if the architecture is 64 bits); you also need to manually update the .size field of a.

Note also that this is internal API and will break eventually.

gdkrmr · December 30, 2018, 10:17am

This one is even faster:

function int2bytes3(x::BigInt)
    n_bytes_with_zeros = x.size * sizeof(Sys.WORD_SIZE)
    uint8_ptr = convert(Ptr{UInt8}, x.d)

    if ENDIAN_BOM == 0x04030201
        # the minimum should be 1, else the result array will be of
        # length 0
        n_bytes_without_zeros = 1
        for i in n_bytes_with_zeros:-1:1
            if unsafe_load(uint8_ptr, i) != 0x00
                n_bytes_without_zeros = i
                break
            end
        end

        result = Array{UInt8}(undef, n_bytes_without_zeros)

        for i in 1:n_bytes_without_zeros
            @inbounds result[n_bytes_without_zeros + 1 - i] = unsafe_load(uint8_ptr, i)
        end

        return result
    else
        error("not implemented... yet")
    end

end

Sukera · December 30, 2018, 1:07pm

If this is calling the gmp library function I think it is calling, beware with the size, as n is limited by the library to INT_MAX. This caught me off guard a while back trying to compute A(4,3) of a two argument version of the Ackermann function… So much for “arbitrary size”

I have plans to create a (hopefully) pure Julia arbitrary size package, but it’s not even in it’s infancy…

roshii · December 30, 2018, 1:55pm

Thanks, it is working as a charm, 3 order of magnitude faster than the original code
I did remove the step where it was filling Array with leading zero, proven useless and resources hungry.

Final code for reference:

function bytes2big(x::Array{UInt8,1})
    l = length(x)
    if Sys.WORD_SIZE == 64
        T = UInt64
        xsize = cld(l, 8)
    elseif Sys.WORD_SIZE == 32
        T = UInt32
        xsize = cld(l, 4)
    end
    if ENDIAN_BOM == 0x04030201
        reverse!(x)
    end
    result = Base.GMP.MPZ.realloc2(xsize * Sys.WORD_SIZE)
    result.size = xsize
    unsafe_copyto!(result.d, convert(Ptr{T}, pointer(x)), xsize)
    return result
end

Noted for future compatibility “issues”, I’ll keep an eye on it

roshii · December 30, 2018, 1:58pm

Good to know!

That would come handy, especially if the GMP API beaks at some point

gdkrmr · December 30, 2018, 2:08pm

Are there in-place operations in libgmp and are they exposed to julia somewhere?

rfourquet · December 30, 2018, 2:23pm

Yes, in the Base.GMP.MPZ module. Not all functions are exposed, you have to lookup that source file to know which ones are. In-place operations end with the ! suffix as is usual in Julia.

@roshii Also, I didn’t think about that earlier, but you probably should use Base.GMP.BITS_PER_LIMB and Base.GMP.Limb to refer to the architecture-dependant number of bits per “limb” and “limb” type, it may not correspond to Sys.WORD_SIZE on some architectures (Windows?).

roshii · December 30, 2018, 5:11pm

Lovely, thank again
I assume the case for small endians would look as follows.

...
    else
        n_bytes_without_zeros = 1
        for i in 1:n_bytes_with_zeros
            if unsafe_load(uint8_ptr, i) != 0x00
                n_bytes_without_zeros = i
                break
            end
        end

        result = Array{UInt8}(undef, n_bytes_without_zeros)

        for i in 1:n_bytes_without_zeros
            @inbounds result[i] = unsafe_load(uint8_ptr, i)
        end
    end
    return result
end

roshii · December 30, 2018, 5:14pm

Windows…
Point taken, code looks even cleaner with it.

function bytes2big(x::Array{UInt8,1})
    xsize = cld(length(x), Base.GMP.BITS_PER_LIMB / 8)
    if ENDIAN_BOM == 0x04030201
        reverse!(x)
    end
    result = Base.GMP.MPZ.realloc2(xsize * Base.GMP.BITS_PER_LIMB)
    result.size = xsize
    unsafe_copyto!(result.d, convert(Ptr{Base.GMP.Limb}, pointer(x)), xsize)
    return result
end

Sukera · December 30, 2018, 6:03pm

If you decide to go with the realloc2 approach, check the function in Base for error checking and thrown errors - libgmp will just signal abort if the new allocated size is too big, killing the calling process, so make sure that doesn’t happen! I don’t know off the top of my head if that particular function checks for problems before calling into libgmp.

roshii · January 2, 2019, 9:57am

I indeed notice some strange behavior:

julia> x = [rand(UInt8)]
1-element Array{UInt8,1}:
 0x7d

julia> for i in 1:10
       println(bytes2big(x))
       end
125
125
125
125
125
125
125
125
125
125

julia> for i in 1:10
       println(bytes2big(int2bytes(125)))
       end
139767316545661
139767292756093
18446744073709486205
139767289413757
139767316545661
139767312023677
139766825746557
125
139767316545661
139767292756093

I assume I’d need to free pointer reference at some point in time but I’m just struggling here… Would anyone have a clue on what is causing result to vary like that?
How can I ensure a stable result?

roshii · January 2, 2019, 10:50am

Experiencing same randomness with the following:

julia> for i in 1:10
              println(bytes2big([0x02]))
              end
139654643580930
18446744073709486082
139654643580930
2
18446744073709486082
139654697844738
2
18446744073709486082
139654697844738
139654643580930

gdkrmr · January 2, 2019, 1:28pm

Some time ago I started something like this, but it turned out to require more work than I was willing to put into it:

Krastanov · July 10, 2019, 8:25pm

Were you able to fix the randomness issue? I observe the same, and I do not see any error messages that might explain what is happening.

Krastanov · July 11, 2019, 2:14am

I got something similar to the requirement discussed here to work (using UInt64 instead of UInt8):

function ubig2ints(x::BigInt)
    if Base.GMP.Limb!=UInt64 || Sys.WORD_SIZE!=64 || ENDIAN_BOM!=0x04030201 || x.size<=0
        error("not implemented")
    end
    copy(unsafe_wrap(Array, x.d, x.size))
end

function ints2ubig(x::AbstractVector{UInt64})
    if Base.GMP.Limb!=UInt64 || Sys.WORD_SIZE!=64 || ENDIAN_BOM!=0x04030201
        error("not implemented")
    end
    r = Base.GMP.MPZ.realloc2(length(x)*64)
    r.size = length(x)
    unsafe_copyto!(r.d, pointer(x), length(x))
    r
end

roshii · July 12, 2019, 12:17pm

We came up with an alternative implementation that doesn’t show this randomness issue.

function to_int(x::Vector{UInt8})
    if isempty(x)
        return 0
    end
    length(x) > div(Sys.WORD_SIZE, 8) ? T = BigInt : T = Int
    result = zero(T)
    if ENDIAN_BOM == 0x01020304
        reverse!(x)
    end
    for c in x
        result <<= 8
        result += c
    end
    return result
end

However the above appeared to be less performant (for 32 bytes messages) that the below, which I retained and implemented in BitConverter.jl

function to_big(x::Vector{UInt8})
    hex = bytes2hex(x)
    return parse(BigInt, hex, base=16)
end

Topic		Replies	Views
Need for a Method hex2bytes(Vector{UInt8}) General Usage	5	1569	August 7, 2017
Convert a number to a byte array New to Julia	20	1053	March 20, 2024
BigInt to hex and back to Bigint General Usage	4	5286	May 23, 2017
How to get an Array{UInt8,1} from Unsigned without passing by String? New to Julia	4	946	December 8, 2018
BigInt to bytes General Usage	6	987	December 1, 2022

Converting UInt8 Array to BigInt, and back

Related topics