Julia equivalent to Python's int.to_bytes

Hi everyone :>
I’ve been using Julia the past few days, and so far I’ve been enjoying it. I’ve been trying to replicate Python functionality as a little exercise in Julia, but now I’m getting stuck on Python’s int.to_bytes function. I tried to read through the CPython source file longobject.c, but I can’t really read too much C yet.

Example:

>>> (259).to_bytes(2, "big")
b'\x01\x03'
>>> (-259).to_bytes(2, "big", signed=True)
b'\xfe\xfd'
>>> (-259).to_bytes(2, "big", signed=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: can't convert negative int to unsigned

I attempted to replicate it as follows, but this feels inelegant. It’s also wrong for any numbers smaller than -128. Are there any built-in functions one could use to make this more readable and maybe faster?

I’m also rather worried about that ^ call, since it could overflow and behave strangely when numbytes grows too large. I don’t really know how to replicate that behavior with log.

function _signed_to_bytes(int::Integer, numbytes::Integer, byteorder::AbstractString)
    maxpositive = (256 ^ numbytes) / 2 - 1
    maxnegative = -(maxpositive + 1)
    # Bounds check
    if !(maxnegative <= int <= maxpositive)
        throw(OverflowError("Cannot represent signed int $int with $numbytes bytes"))
    end
    # Offset negative ints
    if int < 0
        int = abs(int) + maxpositive
    end
    # Main calculations
    output = zeros(UInt8, numbytes)
    i = 1
    while int != 0
        int, output[i] = divrem(int, 256)
        i += 1
    end
    if byteorder == "big"
        reverse!(output)
    end
    return output
end

julia> @btime _signed_to_bytes(-259, 2, "big")
  155.906 ns (9 allocations: 288 bytes)
2-element Array{UInt8,1}:
 0x81
 0x02

I’ve at least figured out why it was wrong for negative numbers. When the number is negative, I’m supposed to count backwards from the maximum value rather than offset it forward.

Like, 0xff is -1, 0xfe is -2 and so on, not 0x80 is -1 and 0x81 is -2.
My bad.
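
For reference, here is a minimal sketch of that counting-backwards (two’s complement) idea. The name signed_to_bytes_sketch and its keyword argument are made up for illustration and are not from the original attempt. It assumes BigInt arithmetic is acceptable for the bounds check, which also sidesteps the concern about the ^ call overflowing.

```julia
# Hypothetical helper, not from the thread: two's-complement bytes of `n`.
function signed_to_bytes_sketch(n::Integer, numbytes::Integer; bigendian::Bool=true)
    modulus = big(256)^numbytes      # BigInt, so the power cannot wrap around
    half = div(modulus, 2)
    # Bounds check: the signed range is -half .. half - 1
    if !(-half <= n <= half - 1)
        throw(OverflowError("cannot represent $n in $numbytes signed bytes"))
    end
    u = mod(big(n), modulus)         # a negative n maps to modulus + n
    # Extract the bytes, least significant first, with shift and mask
    bytes = [UInt8((u >> (8 * (i - 1))) & 0xff) for i in 1:numbytes]
    return bigendian ? reverse(bytes) : bytes
end

signed_to_bytes_sketch(-259, 2)      # UInt8[0xfe, 0xfd], same as the Python example
```

With plain Int64 arithmetic, 256^8 wraps around to 0, which is why the sketch switches to big(256) before taking the power.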

julia> to_bytes(x) = reinterpret(UInt8, [x])
to_bytes (generic function with 1 method)

julia> to_bytes(3)
8-element reinterpret(UInt8, ::Array{Int64,1}):
 0x03
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00

This uses the native byte order, which is little-endian on most machines these days. If you want big-endian order, call hton:

julia> to_bytes(hton(3))
8-element reinterpret(UInt8, ::Array{Int64,1}):
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x03

This will only work on machine integers, not BigInt (which is stored via a pointer to some other data structure, so you won’t get the bytes of the actual numbers).

You could also use digits:

julia> digits(UInt8, 3, base=256)
1-element Array{UInt8,1}:
 0x03

julia> digits(UInt8, 3, base=256, pad=sizeof(3))
8-element Array{UInt8,1}:
 0x03
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00

but asking for UInt8 bytes via digits doesn’t work for negative values.
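
One possible workaround, sketched here under the assumption that the value fits a fixed-width type: reinterpret the signed value as the unsigned type of the same width first, so that digits only ever sees a non-negative number. The variable names are illustrative.

```julia
x = Int16(-259)
u = reinterpret(UInt16, x)           # same bits, read as unsigned: 0xfefd

# Little-endian bytes (least significant first), padded to the full width
bytes_le = digits(UInt8, u; base=256, pad=sizeof(u))

# Reverse to get the big-endian order used in the Python example
bytes_be = reverse(bytes_le)
```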

Thanks for the answer. That’s really close to what I’m trying to do. However, in Python I can specify the number of bytes used to represent the number, and the output has exactly that length. Is there a way to specify the number of bytes here?

If not, a little guidance on how to accomplish this would be greatly appreciated :>

Cast the value to Int8, Int16, Int32, Int64, or Int128 — unlike Python, Julia actually has integer types of various widths.

julia> to_bytes(Int16(3))
2-element reinterpret(UInt8, ::Array{Int16,1}):
 0x03
 0x00
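
Combining the cast with hton gives something close to the Python call for the standard widths. The following is only a sketch: fixed_width_bytes is a hypothetical name, and Base.BitInteger is an unexported union of the built-in fixed-width integer types, used here as an assumption to keep BigInt out.

```julia
# Hypothetical helper: big-endian bytes of `x` after converting it to type T.
# T(x) throws an InexactError if x does not fit, loosely like Python's OverflowError.
function fixed_width_bytes(::Type{T}, x::Integer) where {T<:Base.BitInteger}
    return collect(reinterpret(UInt8, [hton(T(x))]))
end

fixed_width_bytes(Int16, -259)       # UInt8[0xfe, 0xfd]
fixed_width_bytes(UInt16, 259)       # UInt8[0x01, 0x03]
```

Choosing a signed or unsigned T plays the role of Python’s signed argument, so fixed_width_bytes(UInt16, -259) fails with an InexactError.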

What do you need this function for, anyway?

Thanks. I think I can get that to work now. Just one more question:

How about widths that don’t match Int8, Int16, …, such as 3 bytes, a sort-of Int24? It’s probably, very probably, useless, but I want to see how accurate I can make it.

Is this representation really the goal, or do you need it as part of a solution to a problem? If the latter, maybe some context would be helpful. E.g., if you are trying to serialize/deserialize values, see the standard library Serialization.

Or just use write and read.
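
A minimal sketch of the write/read route, assuming an in-memory IOBuffer stands in for whatever stream is actually used:

```julia
io = IOBuffer()
write(io, hton(Int16(-259)))         # hton makes the written bytes big-endian
bytes = take!(io)                    # UInt8[0xfe, 0xfd]

# Reading back: read gives the raw value, ntoh undoes the byte swap
value = ntoh(read(IOBuffer(bytes), Int16))   # -259
```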

I don’t think that would work for BigInt.

Here is a more general replacement for Python’s to_bytes function, mainly as a learning exercise. The trick is to use bit shift (>>) and mask (&) operations:

julia> function to_bytes(n::Integer; bigendian=true, len=sizeof(n))
           bytes = Array{UInt8}(undef, len)
           # Each pass peels off the least significant byte. For big-endian
           # output that byte belongs in the last slot, so fill from the end.
           for byte in (bigendian ? reverse(1:len) : (1:len))
               bytes[byte] = n & 0xff
               n >>= 8
           end
           return bytes
       end
to_bytes (generic function with 3 methods)

julia> to_bytes(-28, len=7)
7-element Array{UInt8,1}:
 0xff
 0xff
 0xff
 0xff
 0xff
 0xff
 0xe4

julia> to_bytes(2345, len=7)
7-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x09
 0x29

julia> to_bytes(2345, len=7, bigendian=false)
7-element Array{UInt8,1}:
 0x29
 0x09
 0x00
 0x00
 0x00
 0x00
 0x00
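
To make the shift-and-mask trick concrete, here is a worked example for the 3-byte (“Int24”) case asked about earlier, written out by hand rather than as a loop. The variable names are illustrative only.

```julia
n = -259                             # ...1111_1110_1111_1101 in two's complement

b0 = UInt8(n & 0xff)                 # lowest byte:  0xfd
b1 = UInt8((n >> 8) & 0xff)          # next byte:    0xfe
b2 = UInt8((n >> 16) & 0xff)         # third byte:   0xff (sign extension)

bytes_be = [b2, b1, b0]              # big-endian:    most significant byte first
bytes_le = [b0, b1, b2]              # little-endian: least significant byte first
```

The big-endian result agrees with Python’s (-259).to_bytes(3, "big", signed=True), which is b'\xff\xfe\xfd'. Because >> on a signed integer is an arithmetic shift, the sign bits keep filling in from the left, so negative values pad with 0xff and non-negative ones with 0x00.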

However, it’s not clear to me what this function is actually useful for — Julia has much better alternatives for most things that you might want this for (serialization, bit manipulations, etcetera).

Oh. That’s so clean. That’s what I’m looking for. I really only meant for this as a learning exercise, so it’s about as useful as Python’s own to_bytes function. I’ll look into the alternatives you mentioned for when I need them later on. They seem quite useful. Thanks for all the help!

Correct, write doesn’t have a method for BigInt. That’s because for BigInt you’d have to decide on a format beyond just the bytes of the digits, since it has a variable width that you’d need to serialize as well. You could use either Julia’s native serialize format or some other format of your choice.
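
As a rough illustration of the native route, assuming the Serialization standard library and an in-memory buffer:

```julia
using Serialization

x = big(2)^100                       # too wide for any fixed-width integer type

io = IOBuffer()
serialize(io, x)                     # Julia's own format, which records the type too
seekstart(io)                        # rewind before reading back
y = deserialize(io)
```

The bytes produced this way are Julia-specific and are not the plain two’s-complement layout that Python’s to_bytes returns.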

By the way, @stevengj, is it okay to incorporate this into the code? I’m planning to release this as an open source package sometime, and I’d like your prior permission to avoid any issues down the line.

Yes. All code posted to Discourse is automatically MIT-licensed, as described in the Terms of Service.

I needed a similar function for calling crc32(data::Vector{UInt8}), from Libz.jl.