BigInt to bytes

I’m looking for a way to convert a BigInt number to bytes representation (in python this is done via (int).to_bytes()).

The only solution I came up with is doing BigInt → Hex → Bytes.
For example:

number::BigInt = parse(BigInt, "100001111001101100000100011100110110101110110110101010100001011100110011000110001100101001111111011111100100110011111101001110101011100111000011010011101100100010000011110011011100011001100110101111101011001111111110111011101110101001000100011001110100011011001010", base=2)
hex2bytes(string(number, base=16))

Is there any better approach?
Other than BigInt → bytes is there any way to just perform binary → bytes so that starting with a binary string I can came up with it’s binary representation in one way?

Thanks.

julia> string(number; base=2)
"100001111001101100000100011100110110101110110110101010100001011100110011000110001100101001111111011111100100110011111101001110101011100111000011010011101100100010000011110011011100011001100110101111101011001111111110111011101110101001000100011001110100011011001010"

Do you mean you want the bytes making up that BigInt, so a Vector{UInt8}?

digits(UInt8, number; base=256)

does the job. It is least significant bytes first. You may want to reverse(digits(UInt8, number; base=256)) if the other direction is better.

1 Like

A low-level way to do this is to call the low-level mpz_export function of GMP:

function to_bytes(n::BigInt; bigendian::Bool=true)
    bytes = Vector{UInt8}(undef, (Base.GMP.MPZ.sizeinbase(n, 2) + 7) ÷ 8)
    order = bigendian ? Cint(1) : Cint(-1)
    count = Ref{Csize_t}()
    @ccall "libgmp".__gmpz_export(bytes::Ptr{UInt8}, count::Ref{Csize_t}, order::Cint,
                1::Csize_t, 1::Cint, 0::Csize_t, n::Ref{BigInt})::Ptr{UInt8}
    @assert count[] ≤ length(bytes)
    return resize!(bytes, count[])
end

This is by far the fastest of the methods discussed so far:

julia> using BenchmarkTools

julia> n = factorial(big(999));

julia> @btime to_bytes($n, bigendian=true);
  2.120 μs (1 allocation: 1.14 KiB)

julia> @btime hex2bytes(string($n, base=16));
  5.632 μs (3 allocations: 3.39 KiB)

julia> @btime reverse!(digits(UInt8, $n; base=256));
  667.791 μs (10670 allocations: 1.79 MiB)

julia> to_bytes(n, bigendian=true) == reverse!(digits(UInt8, n; base=256))
true

(The digits function seems terribly slow, probably could be improved?)

Note that the hex2bytes method fails if string(n, base=16) has an odd number of digits (in which case you need to pad with a zero).

5 Likes

Interestingly enough, we already had some discussions of this, resulting in an optimized digits for base ≤ 62 (faster digits(::BigInt) by stevengj · Pull Request #37075 · JuliaLang/julia · GitHub). Seems like we could just add an additional fast path when base == 256 * sizeof(T) ispow2(base) && base-1 ≤ typemax(T).

Update: pull request submitted (faster digits(::BigInt, base=2^n) via mpz_export by stevengj · Pull Request #47774 · JuliaLang/julia · GitHub)

2 Likes

(x+7)÷8 is cld(x,8) :slightly_smiling_face:

1 Like