Hi everyone :>
I’ve been using Julia for the past few days, and so far I’ve been enjoying it. As a little exercise I’ve been trying to replicate Python functionality in Julia, but now I’m stuck on Python’s int.to_bytes function. I tried to read through the CPython source file longobject.c, but I can’t read much C yet.
Example:
>>> (259).to_bytes(2, "big")
b'\x01\x03'
>>> (-259).to_bytes(2, "big", signed=True)
b'\xfe\xfd'
>>> (-259).to_bytes(2, "big", signed=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: can't convert negative int to unsigned
I attempted to replicate it as follows, but this feels inelegant. It’s also wrong for any numbers smaller than -128. Are there any built-in functions one could use to make this more readable and maybe faster?
I’m also rather worried about that ^ call: 256 ^ numbytes could behave strangely once it grows too large. I don’t really know how to replicate that calculation with log.
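For what it's worth, the worry about ^ is justified: with machine integers, 256 ^ 8 is 2^64, which wraps around to 0, and the / in the bounds calculation quietly turns everything into Float64. A minimal sketch of one way to get the signed bounds without either problem, assuming it is acceptable to do the arithmetic in BigInt (signed_bounds is a hypothetical helper name, not something from Base):

```julia
# Hypothetical helper: signed range representable in `numbytes` bytes.
# Uses BigInt and a bit shift so nothing overflows or becomes a float.
function signed_bounds(numbytes::Integer)
    half = big(1) << (8 * numbytes - 1)   # 2^(8*numbytes - 1)
    return (-half, half - 1)
end

overflowed = Int64(256)^8     # wraps around: 2^64 does not fit in Int64
exact      = big(256)^8       # BigInt keeps the exact value
bounds2    = signed_bounds(2) # (-32768, 32767)
```

No log is needed here; shifting 1 left by the number of bits gives the power of two directly.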
function _signed_to_bytes(int::Integer, numbytes::Integer, byteorder::AbstractString)
    maxpositive = (256 ^ numbytes) / 2 - 1
    maxnegative = -(maxpositive + 1)
    # Bounds check
    if !(maxnegative <= int <= maxpositive)
        throw(OverflowError("Cannot represent signed int $int with $numbytes bytes"))
    end
    # Offset negative ints
    if int < 0
        int = abs(int) + maxpositive
    end
    # Main calculations
    output = zeros(UInt8, numbytes)
    i = 1
    while int != 0
        int, output[i] = divrem(int, 256)
        i += 1
    end
    if byteorder == "big"
        reverse!(output)
    end
    return output
end
julia> @btime _signed_to_bytes(-259, 2, "big")
155.906 ns (9 allocations: 288 bytes)
2-element Array{UInt8,1}:
0x81
0x02
I’ve at least figured out why it was wrong for negative numbers. When the number is negative, I’m supposed to count backwards from the maximum value rather than offset it forward.
Like, 0xff is -1, 0xfe is -2 and so on, not 0x80 is -1 and 0x81 is -2.
My bad.
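To spell out the counting-backwards idea: in two's complement with numbytes bytes, a negative n is stored as n + 256^numbytes, which is the same as reducing n modulo 256^numbytes. A minimal sketch, where twos_complement is a hypothetical helper name and the arithmetic is done in BigInt to sidestep overflow:

```julia
# Hypothetical helper: the unsigned value whose bit pattern represents `n`
# in two's complement using `numbytes` bytes.
function twos_complement(n::Integer, numbytes::Integer)
    modulus = big(1) << (8 * numbytes)   # 256^numbytes
    return mod(big(n), modulus)          # mod takes the sign of the modulus, so the result is >= 0
end

twos_complement(-1, 1)     # 255   == 0xff
twos_complement(-2, 1)     # 254   == 0xfe
twos_complement(-259, 2)   # 65277 == 0xfefd
```

The last value matches the b'\xfe\xfd' that Python prints for (-259).to_bytes(2, "big", signed=True).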
This will only work on machine integers, not BigInt (which is stored via a pointer to some other data structure, so you won’t get the bytes of the actual numbers).
Thanks for the answer. That’s really close to what I’m trying to do. However, in Python I can specify how many bytes the number should be represented in, and the output follows that. Is there any way to specify the number of bytes here yet?
If not, a little guidance on how to accomplish this would be greatly appreciated :>
Thanks. I think I can get that to work now. Just one more question
How about values that don’t fit one of the widths Int8, Int16, …, such as 3 bytes, a sort-of Int24? It’s probably, very probably, useless, but I want to see how accurate I can make it.
Is this representation really the goal, or do you need it as part of a solution to a problem? If the latter, maybe some context would be helpful. E.g., if you are trying to serialize/deserialize values, see the standard library Serialization.
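For reference, a minimal round trip through the Serialization standard library might look like the following. This is only a sketch; the in-memory IOBuffer and the value -259 are chosen for illustration, and the byte format is Julia's own, not the raw two's-complement bytes that Python's to_bytes produces.

```julia
using Serialization

io = IOBuffer()
serialize(io, -259)        # write the value in Julia's native serialization format
seekstart(io)              # rewind so the value can be read back
value = deserialize(io)    # reconstructs the original Int
```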
Here is a more general replacement for Python’s to_bytes function, mainly as a learning exercise. The trick is to use bit shift (>>) and mask (&) operations:
julia> function to_bytes(n::Integer; bigendian=true, len=sizeof(n))
           bytes = Array{UInt8}(undef, len)
           # Each pass peels off the least significant byte, so big-endian
           # output is filled from the last slot backwards.
           for byte in (bigendian ? reverse(1:len) : (1:len))
               bytes[byte] = n & 0xff
               n >>= 8
           end
           return bytes
       end
to_bytes (generic function with 3 methods)
julia> to_bytes(-28, len=7)
7-element Array{UInt8,1}:
 0xff
 0xff
 0xff
 0xff
 0xff
 0xff
 0xe4
julia> to_bytes(2345, len=7)
7-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x09
 0x29
julia> to_bytes(2345, len=7, bigendian=false)
7-element Array{UInt8,1}:
 0x29
 0x09
 0x00
 0x00
 0x00
 0x00
 0x00
However, it’s not clear to me what this function is actually useful for — Julia has much better alternatives for most things that you might want this for (serialization, bit manipulations, etcetera).
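One difference from Python worth noting: the shift-and-mask loop above silently truncates a value that does not fit in len bytes, whereas Python raises OverflowError. A sketch of how the range check and a signed flag could be layered on top, assuming BigInt arithmetic is acceptable for the bounds (to_bytes_checked and its signed keyword are hypothetical names, modelled on Python's arguments):

```julia
# Hypothetical variant of to_bytes with Python-style range checking.
function to_bytes_checked(n::Integer, len::Integer; bigendian::Bool=true, signed::Bool=false)
    x = big(n)
    # Representable range for `len` bytes, signed or unsigned.
    lo = signed ? -(big(1) << (8 * len - 1)) : big(0)
    hi = signed ? (big(1) << (8 * len - 1)) - 1 : (big(1) << (8 * len)) - 1
    lo <= x <= hi || throw(OverflowError("cannot represent $n in $len bytes (signed=$signed)"))
    bytes = Vector{UInt8}(undef, len)
    for i in 1:len
        bytes[i] = UInt8(x & 0xff)   # lowest byte first (little-endian order)
        x >>= 8                      # arithmetic shift keeps the sign of negative values
    end
    return bigendian ? reverse!(bytes) : bytes
end
```

With this, to_bytes_checked(259, 2) gives [0x01, 0x03], to_bytes_checked(-259, 2; signed=true) gives [0xfe, 0xfd], and to_bytes_checked(-259, 2) throws an OverflowError, mirroring the Python session at the top of the thread. Odd widths such as 3 bytes need no special handling: to_bytes_checked(-1, 3; signed=true) gives three 0xff bytes.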
Oh. That’s so clean. That’s what I’m looking for. I really only meant for this as a learning exercise, so it’s about as useful as Python’s own to_bytes function. I’ll look into the alternatives you mentioned for when I need them later on. They seem quite useful. Thanks for all the help!
Correct, write doesn’t have a method for BigInt. That’s because for BigInt you’d have to decide on a format beyond just bytes for the digits, since it has a variable width that you’d need to serialize as well. Either Julia’s native serialize format or some other format of your choice.
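As an illustration of what "deciding on a format" could mean, here is a sketch of one made-up layout: a sign byte, a 4-byte little-endian length, then the magnitude bytes, least significant first. The layout and the names write_bigint and read_bigint are assumptions for this example only, not an existing Julia or Python format.

```julia
# Hypothetical length-prefixed format for BigInt: sign, length, magnitude.
function write_bigint(io::IO, x::BigInt)
    mag = abs(x)
    bytes = UInt8[]
    while mag != 0
        push!(bytes, UInt8(mag & 0xff))    # magnitude bytes, least significant first
        mag >>= 8
    end
    write(io, UInt8(x < 0))                # sign byte: 0x01 if negative, 0x00 otherwise
    write(io, htol(UInt32(length(bytes)))) # byte count, stored little-endian
    write(io, bytes)
    return nothing
end

function read_bigint(io::IO)
    negative = read(io, UInt8) == 0x01
    len = Int(ltoh(read(io, UInt32)))
    bytes = read(io, len)
    mag = big(0)
    for b in reverse(bytes)                # rebuild from the most significant byte down
        mag = (mag << 8) | b
    end
    return negative ? -mag : mag
end
```

Writing a value to an IOBuffer with write_bigint, calling seekstart, and reading it back with read_bigint should return the original number.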
By the way, @stevengj, is it okay to incorporate this into the code? I’m planning to release this as an open source package sometime, and I’d like your prior permission to avoid any issues down the line.