Converting UInt8 Array to BigInt, and back

I am writing functions that would convert Array{UInt8,1} to BigInt and vice versa.

  1. From BigInt to Array I use to do as follows, through an hexadecimal string.
function int2bytes(x::Integer)
    hex = string(x, base=16)
    if mod(length(hex), 2) != 0
        hex = string("0", hex)
    end
    return hex2bytes(hex)
end

@gdkrmr suggested the following faster implementation which I am now using.

function int2bytes(x::BigInt)
    result = Array{UInt8}(undef, x.size * sizeof(eltype(x.d)))
    unsafe_copyto!(convert(Ptr{eltype(x.d)}, pointer(result)), x.d, x.size)
    if ENDIAN_BOM == 0x04030201
        result = result[end:-1:1]
    end
    i = findfirst(x -> x != 0x00, result)
    result[i:end]
end
  1. From Array to BigInt, I am also going through hexadecimal string at the moment.
function bytes2int(x::Array{UInt8,1})
    hex = bytes2hex(x)
    return parse(BigInt, hex, base=16)
end

Based on @gdkrmr suggestion I am now trying to implement this conversion without string but can’t make it work… I came up with the below code which works when broken down in REPL but crashes when running it in a function.

function bytes2int_new(x::Array{UInt8,1})
    l = length(x)
    if Sys.WORD_SIZE == 64
        T = UInt64
        xsize = cld(l, 8)
    elseif Sys.WORD_SIZE == 32
        T = UInt32
        xsize = cld(l, 4)
    end
    x = cat(fill(UInt8(0), xsize * 8 - l), x; dims=1)
    if ENDIAN_BOM == 0x04030201
        reverse!(x)
    end
    if xsize == 1
        return reinterpret(T,x)
    else
        result = big(0)
        unsafe_copyto!(result.d, convert(Ptr{T}, pointer(x)), xsize)
        result.size = xsize
        result.alloc = xsize + 1
        return result
    end
end

Clearly, there is something I do not understand about pointers and BigInt implementation…

First of all, is it at all possible to create a BigInt by assigning value to its fields?
How to properly assign those value?
What does the alloc field represent?

I may also be on a wrong path for this function, so any suggestions to avoid string representation is welcome for this function

If x is a vector of UInt8, then the preferred way to convert it to a vector of BigInt should be BigInt.(x) (I guess this ends up calling some C-function in GMP, probably after converting from UInt8 to Int.)

Is that not fast enough?

I think the question is about converting an array of UInt8 to a single BigInt, not to an array of BigInts.

Oh. I see. Carry on.

Yes, but proper initialization must be taken care of.
The problem in your implementation is likely that you are writing with unsafe_copyto! in memory regions which have not be allocated.
To create a BigInt which can represent a number with n bits, you can use a = Base.GMP.MPZ.realloc2(n). Then you don’t need to touch the .alloc field of a, and you can write n bits at a.d (or cld(n, 64) UInt64 words if the architecture is 64 bits); you also need to manually update the .size field of a.

Note also that this is internal API and will break eventually.

1 Like

This one is even faster:

function int2bytes3(x::BigInt)
    n_bytes_with_zeros = x.size * sizeof(Sys.WORD_SIZE)
    uint8_ptr = convert(Ptr{UInt8}, x.d)

    if ENDIAN_BOM == 0x04030201
        # the minimum should be 1, else the result array will be of
        # length 0
        n_bytes_without_zeros = 1
        for i in n_bytes_with_zeros:-1:1
            if unsafe_load(uint8_ptr, i) != 0x00
                n_bytes_without_zeros = i
                break
            end
        end

        result = Array{UInt8}(undef, n_bytes_without_zeros)

        for i in 1:n_bytes_without_zeros
            @inbounds result[n_bytes_without_zeros + 1 - i] = unsafe_load(uint8_ptr, i)
        end

        return result
    else
        error("not implemented... yet")
    end

end
1 Like

If this is calling the gmp library function I think it is calling, beware with the size, as n is limited by the library to INT_MAX. This caught me off guard a while back trying to compute A(4,3) of a two argument version of the Ackermann function… So much for “arbitrary size” :confused:

I have plans to create a (hopefully) pure Julia arbitrary size package, but it’s not even in it’s infancy…

1 Like

Thanks, it is working as a charm, 3 order of magnitude faster than the original code :slight_smile:
I did remove the step where it was filling Array with leading zero, proven useless and resources hungry.

Final code for reference:

function bytes2big(x::Array{UInt8,1})
    l = length(x)
    if Sys.WORD_SIZE == 64
        T = UInt64
        xsize = cld(l, 8)
    elseif Sys.WORD_SIZE == 32
        T = UInt32
        xsize = cld(l, 4)
    end
    if ENDIAN_BOM == 0x04030201
        reverse!(x)
    end
    result = Base.GMP.MPZ.realloc2(xsize * Sys.WORD_SIZE)
    result.size = xsize
    unsafe_copyto!(result.d, convert(Ptr{T}, pointer(x)), xsize)
    return result
end

Noted for future compatibility “issues”, I’ll keep an eye on it

Good to know!

That would come handy, especially if the GMP API beaks at some point :slight_smile:

Are there in-place operations in libgmp and are they exposed to julia somewhere?

Yes, in the Base.GMP.MPZ module. Not all functions are exposed, you have to lookup that source file to know which ones are. In-place operations end with the ! suffix as is usual in Julia.

@roshii Also, I didn’t think about that earlier, but you probably should use Base.GMP.BITS_PER_LIMB and Base.GMP.Limb to refer to the architecture-dependant number of bits per “limb” and “limb” type, it may not correspond to Sys.WORD_SIZE on some architectures (Windows?).

1 Like

Lovely, thank again :slight_smile:
I assume the case for small endians would look as follows.

...
    else
        n_bytes_without_zeros = 1
        for i in 1:n_bytes_with_zeros
            if unsafe_load(uint8_ptr, i) != 0x00
                n_bytes_without_zeros = i
                break
            end
        end

        result = Array{UInt8}(undef, n_bytes_without_zeros)

        for i in 1:n_bytes_without_zeros
            @inbounds result[i] = unsafe_load(uint8_ptr, i)
        end
    end
    return result
end

Windows…
Point taken, code looks even cleaner with it.

function bytes2big(x::Array{UInt8,1})
    xsize = cld(length(x), Base.GMP.BITS_PER_LIMB / 8)
    if ENDIAN_BOM == 0x04030201
        reverse!(x)
    end
    result = Base.GMP.MPZ.realloc2(xsize * Base.GMP.BITS_PER_LIMB)
    result.size = xsize
    unsafe_copyto!(result.d, convert(Ptr{Base.GMP.Limb}, pointer(x)), xsize)
    return result
end

If you decide to go with the realloc2 approach, check the function in Base for error checking and thrown errors - libgmp will just signal abort if the new allocated size is too big, killing the calling process, so make sure that doesn’t happen! I don’t know off the top of my head if that particular function checks for problems before calling into libgmp.

I indeed notice some strange behavior:

julia> x = [rand(UInt8)]
1-element Array{UInt8,1}:
 0x7d

julia> for i in 1:10
       println(bytes2big(x))
       end
125
125
125
125
125
125
125
125
125
125

julia> for i in 1:10
       println(bytes2big(int2bytes(125)))
       end
139767316545661
139767292756093
18446744073709486205
139767289413757
139767316545661
139767312023677
139766825746557
125
139767316545661
139767292756093

I assume I’d need to free pointer reference at some point in time but I’m just struggling here… Would anyone have a clue on what is causing result to vary like that?
How can I ensure a stable result?

Experiencing same randomness with the following:

julia> for i in 1:10
              println(bytes2big([0x02]))
              end
139654643580930
18446744073709486082
139654643580930
2
18446744073709486082
139654697844738
2
18446744073709486082
139654697844738
139654643580930

Some time ago I started something like this, but it turned out to require more work than I was willing to put into it:

Were you able to fix the randomness issue? I observe the same, and I do not see any error messages that might explain what is happening.

I got something similar to the requirement discussed here to work (using UInt64 instead of UInt8):

function ubig2ints(x::BigInt)
    if Base.GMP.Limb!=UInt64 || Sys.WORD_SIZE!=64 || ENDIAN_BOM!=0x04030201 || x.size<=0
        error("not implemented")
    end
    copy(unsafe_wrap(Array, x.d, x.size))
end

function ints2ubig(x::AbstractVector{UInt64})
    if Base.GMP.Limb!=UInt64 || Sys.WORD_SIZE!=64 || ENDIAN_BOM!=0x04030201
        error("not implemented")
    end
    r = Base.GMP.MPZ.realloc2(length(x)*64)
    r.size = length(x)
    unsafe_copyto!(r.d, pointer(x), length(x))
    r
end

We came up with an alternative implementation that doesn’t show this randomness issue.

function to_int(x::Vector{UInt8})
    if isempty(x)
        return 0
    end
    length(x) > div(Sys.WORD_SIZE, 8) ? T = BigInt : T = Int
    result = zero(T)
    if ENDIAN_BOM == 0x01020304
        reverse!(x)
    end
    for c in x
        result <<= 8
        result += c
    end
    return result
end

However the above appeared to be less performant (for 32 bytes messages) that the below, which I retained and implemented in BitConverter.jl

function to_big(x::Vector{UInt8})
    hex = bytes2hex(x)
    return parse(BigInt, hex, base=16)
end
2 Likes