I am trying to port my stuff from Python to Julia and struggling with a piece of code which implements a basic network communication protocol.
Update: The original problem is solved and turned out to be a typo. Feel free to skip to my next question
In Python, I heavily use struct.pack and struct.unpack, which are basically the way to encode and decode binary data. What is the actual way of parsing bytes in Julia?
Here is an example:
In [19]: struct.pack('>ii', 23, 42)
Out[19]: b'\x00\x00\x00\x17\x00\x00\x00*'
In [20]: struct.unpack('>ii', b'\x00\x00\x00\x17\x00\x00\x00*')
Out[20]: (23, 42)
Now in Julia, if I execute read(s, 16) to receive the network packet header of size 16, I get an array of UInt8:
16-element Array{UInt8,1}:
0x66
0x6f
0x6f
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x04
0x00
0x00
0x00
0x00
So what would be the standard way of parsing the data according to a given structure? Should I write an immutable type which constructs itself from a given array, or is there something already in the standard library which is made for this kind of operation?
Sorry, I am not familiar with Python, so I am not sure I understand the problem, but check serialize and deserialize in Base. Neither requires that you specify the type, since it is encoded in the stream; however, the type must be defined in the process that deserializes.
Szia,
thanks for the quick reply. I already looked at [de]serialize but it seems that I cannot specify the structure manually. I need that since I am talking to processes written in different languages (all using the same custom protocol).
So what I basically mean is, if there is an example binary data structure like:
foo [4byte integer], bar [8byte float64], baz [4 byte integer]
which is, for example, this byte string (foo=23, bar=3.14, baz=42):
'\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*'
This can be easily unpacked via the struct module in the Python standard library, where a simple tuple is returned:
In [22]: struct.unpack('>idi', b'\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*')
Out[22]: (23, 3.14, 42)
In Julia I’d define a type/immutable like
immutable Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end
and then my question is how to create an instance from the data actually returned by the read() function:
julia> raw_data
16-element Array{UInt8,1}:
0x00
0x00
0x00
0x17
.
.
.
0x00
0x00
since I need to know the size of each attribute of Whatever.
Should I write a specific constructor for Whatever, or a helper function which iterates through the Whatever attributes, determining the size etc.?
I’d actually strongly recommend against using the Base.serialize / Base.deserialize functions if you are doing anything that needs to persist data, as the format is not documented and is not guaranteed to stay compatible between Julia versions.
OK, any other suggestions then? I am currently quite confused about how to go from an array of UInt8 (this is what I get when I read from the socket stream) to a Whatever object (which I actually use to do the calculations) and then back to a byte representation (which I need to send back via the network socket).
Converting the UInt8 array to a string is done by String(). I tried string() before but that was not the right one.
So now I am playing around with reinterpret but I need a way to deal with different endianness. So I guess I have to reverse the array if needed:
julia> a[9:12]
4-element Array{UInt8,1}:
0x00
0x00
0x00
0x04
julia> reinterpret(Int32, a[9:12])
1-element Array{Int32,1}:
67108864
julia> reinterpret(Int32, reverse(a[9:12]))
1-element Array{Int32,1}:
4
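A possible alternative to reversing the slice by hand, sketched under the assumption of Julia 1.x syntax: reinterpret the bytes in host order and let ntoh decide whether a swap is needed. The byte values are the ones from the slice above; the variable names are made up for illustration.

```julia
# The four bytes of a[9:12] from the example above (big-endian 4).
a = UInt8[0x00, 0x00, 0x00, 0x04]

# reinterpret reads the bytes in the host's native byte order,
# so `raw` is 67108864 on a little-endian machine.
raw = reinterpret(Int32, a)[1]

# ntoh converts from network (big-endian) order to host order.
# It swaps on little-endian hosts and is a no-op on big-endian ones.
value = ntoh(raw)
```

Unlike reverse, this keeps working unchanged if the code ever runs on a big-endian host.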
This is what I came up with. It is quite an ugly implementation but it works for now. It would be great if someone could point me in the right direction on how to do this more elegantly, like using sizeof() to automatically derive the data positions etc.
julia> raw_data = read(s, 16)
16-element Array{UInt8,1}:
0x66
0x6f
0x6f
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x04
0x00
0x00
0x00
0x00
The corresponding data representation in Julia:
immutable CHPrefix
    tag::String
    length::Int32
    function CHPrefix(data::Array{UInt8,1})
        tag = String(data[1:8])
        length = reinterpret(Int32, reverse(data[9:12]))[1]
        new(tag, length)
    end
end
And this is how it works now (still need to find out how to strip the \0 from the String but that should be trivial):
julia> CHPrefix(raw_data)
CHPrefix("foo\0\0\0\0\0",4)
There is a bswap function in Julia that you can use on all the elements of the vector once you have read it in.
If you are using v0.6, then it becomes beautifully fast and simple to do in-place:
julia> a = UInt16[1,2,3,4,5]
5-element Array{UInt16,1}:
0x0001
0x0002
0x0003
0x0004
0x0005
julia> a .= bswap.(a)
5-element Array{UInt16,1}:
0x0100
0x0200
0x0300
0x0400
0x0500
Note: you can also put the input bytes into an IOBuffer, or directly read the parts from the file, and read the types with read(io, type). So for the Int32, it would be read(io, Int32), and if you know it is in reversed order, then bswap(read(io,Int32)).
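To make the IOBuffer suggestion concrete, here is a minimal sketch applied to the 16-byte header from earlier in the thread. It assumes Julia 1.x and uses ntoh in place of the unconditional bswap, so the swap only happens on little-endian hosts. The variable names are illustrative, and treating the last four bytes as unread padding is an assumption.

```julia
# Same 16 bytes as the header received from the socket earlier in the thread.
raw_data = UInt8[0x66, 0x6f, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x00,
                 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00]

io = IOBuffer(raw_data)

tag_bytes = read(io, 8)          # first 8 bytes: the tag, as a Vector{UInt8}
len = ntoh(read(io, Int32))      # next 4 bytes: a big-endian Int32
remaining = bytesavailable(io)   # bytes of the header not consumed yet
```

Each read advances the buffer position, so there is no need to compute slice indices such as 9:12 by hand.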
Ah, that’s already very useful, thanks!
Thanks. Protobuf is not an option for me but I will have a look at StrPack, although I hope I can stick with the standard library.
Yes, it’s pretty trivial! (Julia’s great that way)
rstrip(str, '\0') will remove any trailing nul bytes.
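A short sketch of that on the tag bytes from the header, assuming Julia 1.x. Note that rstrip returns a SubString there, so it is wrapped in String() in case a plain String is needed.

```julia
# The 8 tag bytes from the header: "foo" padded with nul bytes.
tag = String(UInt8[0x66, 0x6f, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x00])

# Strip the trailing nul bytes and convert the SubString back to a String.
clean = String(rstrip(tag, '\0'))
```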
Here’s a minimal example (given your definitions above)
julia> a = b"\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*"
julia> let buf=IOBuffer(a); Whatever(hton(read(buf,Int32)), hton(read(buf,Float64)), hton(read(buf,Int32))) end
Whatever(23,3.14,42)
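The same idea can be generalised into a helper that walks over the fields of a type, which is roughly what was asked for earlier in the thread. This is only a sketch under several assumptions: Julia 1.1 or later (where immutable is spelled struct and fieldtypes is available), every field is a primitive numeric type, the wire format is big-endian, and there is no padding between fields. The name unpack is hypothetical, not something from Base. It uses ntoh, which is the conversion from network order to host order.

```julia
# `immutable Whatever` from the thread, in Julia 1.x syntax.
struct Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end

# Hypothetical helper: read each field of T from `io` in declaration order,
# converting every value from network (big-endian) to host byte order.
function unpack(io::IO, ::Type{T}) where {T}
    vals = Any[]
    for FT in fieldtypes(T)
        push!(vals, ntoh(read(io, FT)))
    end
    return T(vals...)
end

a = b"\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*"
w = unpack(IOBuffer(a), Whatever)
```

Because each field is read separately, the number of bytes consumed is the sum of the field sizes (16 here), independent of how Julia lays the struct out in memory.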
As Scott pointed out, you can use read(buf,T,...) to read “arbitrary bytes” into “Julia objects”. But one issue to be aware of (aside from byte order) is that Julia uses C layout rules. So your struct definition:
immutable Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end
is actually laid out in memory as [8 bytes, 8 bytes, 8 bytes], and the total size is 24. (Per the rules, if you did [::Int32, ::Int32, ::Float64] instead, then you would get a 16-byte struct.) StrPack provides tooling for working with more flexible layouts.
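The layout can be inspected directly. The sketch below assumes Julia 1.1 or later (struct instead of immutable, plus fieldtypes and fieldcount). The type names and the helpers wire_size and offsets are made up for illustration.

```julia
# Field order from the thread: the Float64 in the middle forces padding.
struct Padded
    foo::Int32
    bar::Float64
    baz::Int32
end

# Same fields, reordered so that the two Int32 values sit next to each other.
struct Reordered
    foo::Int32
    baz::Int32
    bar::Float64
end

# Size of the fields alone, i.e. what a packed wire format would occupy.
wire_size(T) = sum(sizeof, fieldtypes(T))

# Byte offset of every field inside the in-memory struct.
offsets(T) = [Int(fieldoffset(T, i)) for i in 1:fieldcount(T)]

in_memory = sizeof(Padded)     # typically 24 on a 64-bit platform
on_wire = wire_size(Padded)    # 4 + 8 + 4 bytes
```

Comparing sizeof(T) with wire_size(T) is a quick way to see whether a struct could be mapped onto the wire format directly or has to be read field by field.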
That should actually be ntoh instead of hton above, since he is going from network format (i.e. big-endian) to host format (which is, I think, currently always little-endian for Julia, even on POWER platforms).
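For the way back to the socket, hton is then the matching call. Here is a rough round-trip sketch, assuming Julia 1.x (struct instead of immutable) and a big-endian wire format without padding. The variable names are illustrative.

```julia
# `immutable Whatever` from the thread, in Julia 1.x syntax.
struct Whatever
    foo::Int32
    bar::Float64
    baz::Int32
end

a = b"\x00\x00\x00\x17@\t\x1e\xb8Q\xeb\x85\x1f\x00\x00\x00*"

# Decode: network (big-endian) order to host order with ntoh.
buf = IOBuffer(a)
w = Whatever(ntoh(read(buf, Int32)),
             ntoh(read(buf, Float64)),
             ntoh(read(buf, Int32)))

# Encode: host order back to network order with hton, field by field.
out = IOBuffer()
write(out, hton(w.foo), hton(w.bar), hton(w.baz))
bytes = take!(out)   # Vector{UInt8} ready to be written to the socket
```

On a little-endian host both functions perform the same byte swap, which is why the hton version above also produced the right values. Using them in the intended direction keeps the code correct on big-endian hosts as well.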
Good catch about the C layout rules. That would also be different on a 32-bit platform: you’d have 16 bytes instead of 24 for the Whatever structure, since there’d be no padding.
Please mark one of the relevant answers as the solution using the “check” (or “tick”) button/icon that should appear after clicking the “…”.
Great, that makes sense now, thanks. It seems I will have to study StrPack, since the layouts of the binary formats I am dealing with often contain padding and mixed endianness. I don’t want to build my package on sandy ground.
Kind of hard to mark a solution as there are many useful inputs…