Packing and unpacking binary data

tamasgal · January 22, 2017, 10:46am

I am trying to port my stuff from Python to Julia and struggling with a piece of code which implements a basic network communication protocol.

Update: The original problem is solved and turned out to be a typo. Feel free to skip to my next question.

tamasgal · January 22, 2017, 11:39am

In Python, I heavily use the struct.pack and struct.unpack stuff, which is basically the way to encode and decode binary data. What is the actual way of parsing bytes in Julia?

Here is an example:

In [19]: struct.pack('>ii', 23, 42)
Out[19]: b'\x00\x00\x00\x17\x00\x00\x00*'
In [20]: struct.unpack('>ii', b'\x00\x00\x00\x17\x00\x00\x00*')
Out[20]: (23, 42)

Now in Julia, if I execute read(s, 16) to receive the network packet header of size 16, I get an array of UInt8:

16-element Array{UInt8,1}:
 0x66
 0x6f
 0x6f
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x04
 0x00
 0x00
 0x00
 0x00

So what would be the standard way of parsing the data according to a given structure? Should I write an immutable type which constructs itself from a given array, or is there something already in the standard library which is made for this kind of operation?

Tamas_Papp · January 22, 2017, 1:06pm

Sorry, I am not familiar with Python, so I am not sure I understand the problem, but check serialize and deserialize in Base. Neither requires that you specify the type, since it is encoded in the stream; however, the type must be defined in the process where you deserialize.

tamasgal · January 22, 2017, 1:25pm

Szia,

thanks for the quick reply. I already looked at [de]serialize but it seems that I cannot specify the structure manually. I need that since I am talking to processes written in different languages (all using the same custom protocol).

So what I basically mean is, if there is an example binary data structure like:

foo [4byte integer], bar [8byte float64], baz [4 byte integer]

which looks, for example, like this as a byte string (foo=23, bar=3.14, baz=42):

'\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*'

This can be easily unpacked via the struct module in the Python standard library, where a simple tuple is returned:

In [22]: struct.unpack('>idi', b'\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*')
Out[22]: (23, 3.14, 42)

In Julia I’d define a type/immutable like

immutable Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end

and then my question is, how to create an instance if I use the data actually returned by the read() function:

julia> raw_data
16-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x17
 .
 .
 .
 0x00
 0x00

since I need to know the size of each attribute of Whatever.

Should I write a specific constructor for Whatever, or a helper function which iterates through the Whatever-attributes, determining the size etc?

ScottPJones · January 22, 2017, 1:42pm

I’d actually strongly recommend against using the Base.serialize / Base.deserialize functions if you are doing anything that needs to persist data, as the format is not documented and is not guaranteed to stay compatible between Julia versions.

tamasgal · January 22, 2017, 1:48pm

OK, any other suggestions then? I am currently quite confused about how to go from an array of UInt8 (this is what I get when I read from the socket stream) to a Whatever object (which I actually use to do the calculations) and then to an actual string representation (which I need to send back via the network socket).

tamasgal · January 22, 2017, 2:23pm

Converting the UInt8 array to a string is done by String(). I tried string() before but that was not the right one.

So now I am playing around with reinterpret, but I need a way to deal with different endianness. So I guess I have to reverse the array if needed:

julia> a[9:12]
4-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x04

julia> reinterpret(Int32, a[9:12])
1-element Array{Int32,1}:
 67108864

julia> reinterpret(Int32, reverse(a[9:12]))
1-element Array{Int32,1}:
 4
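A minimal sketch of the same conversion without reversing the array, assuming Julia 1.x: reinterpret reads the bytes in host order, and ntoh then swaps them only if the host is little-endian, so the code should give the same result on either kind of machine. The variable names are only illustrative.

```julia
# Big-endian (network order) bytes for the Int32 value 4, as in a[9:12] above
a = UInt8[0x00, 0x00, 0x00, 0x04]

# reinterpret uses host byte order; ntoh converts from network (big-endian)
# order to host order, which is a byte swap on little-endian machines
value = ntoh(reinterpret(Int32, a)[1])
```

Compared with reverse, this states the intent (network order to host order) and does not silently break on a big-endian host.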

tamasgal · January 22, 2017, 2:42pm

This is what I came up with. It is quite an ugly implementation but it works for now. It would be great if someone could point me in the right direction on how to do this more elegantly, like using sizeof() to automatically derive the data positions etc.

julia> raw_data = read(s, 16)
16-element Array{UInt8,1}:
 0x66
 0x6f
 0x6f
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x04
 0x00
 0x00
 0x00
 0x00

The corresponding data representation in Julia:

immutable CHPrefix
    tag::String
    length::Int32
    
    function CHPrefix(data::Array{UInt8,1})
        tag = String(data[1:8])
        length = reinterpret(Int32, reverse(data[9:12]))[1]
        new(tag, length)
    end
end

And this is how it works now (still need to find out how to strip the \0 from the String but that should be trivial):

julia> CHPrefix(raw_data)
CHPrefix("foo\0\0\0\0\0",4)
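One possible way to derive the field positions automatically is to walk over the field types and read each one from an IOBuffer. This is only a sketch under several assumptions: Julia 1.1 or later (for fieldtypes, and struct instead of immutable), every field is a plain numeric type, all fields are big-endian, and the wire format has no padding. The helper name unpack is hypothetical and not part of Base.

```julia
struct Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end

# Hypothetical helper: read each field of T in declaration order,
# converting from network (big-endian) to host byte order.
function unpack(io::IO, ::Type{T}) where {T}
    values = [ntoh(read(io, F)) for F in fieldtypes(T)]
    return T(values...)
end

# foo=23, bar=3.14, baz=42 in big-endian byte order
raw = UInt8[0x00, 0x00, 0x00, 0x17,
            0x40, 0x09, 0x1e, 0xb8, 0x51, 0xeb, 0x85, 0x1f,
            0x00, 0x00, 0x00, 0x2a]

w = unpack(IOBuffer(raw), Whatever)
```

A String field like the tag in CHPrefix has no fixed size, so it would still need special handling (for example reading a known number of bytes first).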

ScottPJones · January 22, 2017, 3:36pm

There is a bswap function in Julia, that you can use on all the elements of the vector once you read it in.
If you are using v0.6, then it becomes beautifully fast and simple to do in-place:

julia> a = UInt16[1,2,3,4,5]
5-element Array{UInt16,1}:
 0x0001
 0x0002
 0x0003
 0x0004
 0x0005

julia> a .= bswap.(a)
5-element Array{UInt16,1}:
 0x0100
 0x0200
 0x0300
 0x0400
 0x0500
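As a small variation, assuming Julia 1.x and that the raw bytes hold big-endian UInt16 values, the same broadcasting idea can be applied with ntoh, which only swaps when the host is little-endian. The data below is made up for illustration.

```julia
# Three big-endian UInt16 values (1, 2, 3) as they might arrive from a socket
raw = UInt8[0x00, 0x01, 0x00, 0x02, 0x00, 0x03]

# View the bytes as UInt16 in host order, then convert each element
# from network order to host order
words = ntoh.(reinterpret(UInt16, raw))
```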

ScottPJones · January 22, 2017, 3:39pm

Note: you can also put the input string into an IOBuffer, or directly read the parts from the file, and read the types with:
read(io, type), so for the Int32, it would be read(io, Int32), and if you know it is in reversed order, then bswap(read(io,Int32))
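A short sketch of that approach, assuming Julia 1.x and the 16-byte header shown earlier in the thread (8 bytes of tag followed by a big-endian Int32 length). It uses ntoh rather than a bare bswap so that the swap only happens on little-endian hosts; the variable names are illustrative.

```julia
raw = UInt8[0x66, 0x6f, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00]

io = IOBuffer(raw)

tag = String(read(io, 8))      # first 8 bytes, still including the NUL padding
len = ntoh(read(io, Int32))    # next 4 bytes, big-endian on the wire
```

Reading from the IOBuffer advances its position, so no manual index arithmetic like data[9:12] is needed.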

tamasgal · January 22, 2017, 3:43pm

Ah, that’s already very useful, thanks!

ihnorton · January 22, 2017, 4:06pm

See also:
https://github.com/pao/StrPack.jl

and

https://github.com/tanmaykm/ProtoBuf.jl

tamasgal · January 22, 2017, 4:12pm

Thanks. Protobuf is not an option for me but I will have a look at StrPack, although I hope I can stick with the standard library.

ScottPJones · January 22, 2017, 4:58pm

Yes, it’s pretty trivial! (Julia’s great that way)
rstrip(str, '\0') will remove any trailing nul bytes.
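For example, applied to the 8-byte tag from the header above (a sketch assuming Julia 1.x; the variable names are illustrative):

```julia
tag = String(UInt8[0x66, 0x6f, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x00])

# rstrip returns a SubString without the trailing NUL characters
clean = rstrip(tag, '\0')
```

If a plain String is needed afterwards, String(clean) makes a copy.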

ihnorton · January 22, 2017, 5:52pm

Here’s a minimal example (given your definitions above)

julia> a = b"\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*"
julia> let buf=IOBuffer(a); Whatever(hton(read(buf,Int32)), hton(read(buf,Float64)), hton(read(buf,Int32))) end
Whatever(23,3.14,42)

As Scott pointed out, you can use read(buf,T,...) to read “arbitrary bytes” into “Julia objects”. But one issue to be aware of (aside from byte order) is that Julia uses C layout rules. So your struct definition:

immutable Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end

is actually laid out in memory as [8 bytes, 8 bytes, 8 bytes] (each Int32 is followed by 4 bytes of padding), and the total size is 24. Per the rules, if you used [::Int32, ::Int32, ::Float64] instead, then you would get a 16-byte struct. StrPack provides tooling for working with more flexible layouts.
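The layout can be inspected directly with sizeof and fieldoffset. The sketch below assumes Julia 1.1 or later (struct instead of immutable, and fieldtypes); the Reordered type is only there for comparison and is not from the thread. The in-memory numbers for Whatever depend on the platform, while the packed wire size does not.

```julia
struct Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end

# Same fields, reordered so the two Int32 values share one 8-byte slot
struct Reordered
    foo::Int32
    baz::Int32
    bar::Float64
end

mem_size  = sizeof(Whatever)     # in-memory size, 24 on a typical 64-bit platform
offsets   = [fieldoffset(Whatever, i) for i in 1:fieldcount(Whatever)]
wire_size = sum(sizeof, fieldtypes(Whatever))   # 4 + 8 + 4 = 16 bytes, no padding
reordered = sizeof(Reordered)    # 16
```

The gap between mem_size and wire_size is the reason reading the whole struct in one go does not match a packed network format.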

ScottPJones · January 22, 2017, 6:06pm

That should actually be ntoh instead of hton above, since he is going from network format (i.e. big-endian) to host format (which I think is currently always little-endian for Julia, even on POWER platforms).
Good catch about the C layout rules. That would also be different on a 32-bit platform: you’d have 16 bytes instead of 24 for the Whatever structure, since there’d be no padding.
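To make the direction of the two functions concrete, here is a small round-trip sketch, assuming Julia 1.x: hton is used when writing values out in network order, and ntoh when reading them back. The values are the foo=23, bar=3.14, baz=42 example from earlier in the thread.

```julia
# Packing: host order -> network (big-endian) order
out = IOBuffer()
write(out, hton(Int32(23)))
write(out, hton(3.14))
write(out, hton(Int32(42)))
bytes = take!(out)             # 16 bytes, ready to send over a socket

# Unpacking: network order -> host order
inp = IOBuffer(bytes)
foo = ntoh(read(inp, Int32))
bar = ntoh(read(inp, Float64))
baz = ntoh(read(inp, Int32))
```

On a little-endian host both functions do the same byte swap, which is why the hton version appeared to work, but using the matching name keeps the code correct on a big-endian host too.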

dpsanders · January 22, 2017, 6:10pm

Please mark one of the relevant answers as the solution using the “check” (or “tick”) button/icon that should appear after clicking the “…”.

tamasgal · January 22, 2017, 6:10pm

Great, that makes sense now, thanks. It seems I will have to study StrPack, since the layouts of the binary formats I am dealing with often also contain padding and mixed endianness. I don’t want to build my package on sandy ground.
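For simple cases, padding and mixed endianness can probably still be handled with Base alone. The sketch below assumes Julia 1.x and a made-up layout (a big-endian Int32, two padding bytes, then a little-endian Int32); it is not one of the actual formats mentioned here.

```julia
# Hypothetical layout: Int32 (big-endian) | 2 padding bytes | Int32 (little-endian)
raw = UInt8[0x00, 0x00, 0x00, 0x17,   # 23, big-endian
            0x00, 0x00,               # padding
            0x2a, 0x00, 0x00, 0x00]   # 42, little-endian

io = IOBuffer(raw)
first_field  = ntoh(read(io, Int32))  # network (big-endian) -> host
skip(io, 2)                           # jump over the padding bytes
second_field = ltoh(read(io, Int32))  # little-endian -> host
```

Here ltoh is the little-endian counterpart of ntoh, and skip moves the read position without copying anything.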

tamasgal · January 22, 2017, 6:13pm

Kind of hard to mark a solution as there are many useful inputs…
