I have to deal with a couple of “big endian” structures in raw files and want to parse them into Julia structs.
Here is an MWE where I have some data (big endian, coming from the network or from a file), a struct, and a small function that parses a `Vector{UInt8}` into a given type:
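```julia
data = Vector{UInt8}([0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x32])

struct Foo
    a::Int32
    b::Int32
end

function retrieve(::Type{T}, data) where {T}
    ref = Ref{T}()              # uninitialized storage for one value of type T
    read!(IOBuffer(data), ref)  # copy sizeof(T) raw bytes straight into it
    return ref[]
end
```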
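The `Ref` + `read!` trick works because `read!` copies the raw bytes directly into the memory behind the `Ref`, so it only makes sense for plain-data (isbits) structs whose in-memory layout matches the byte layout in the file. A quick sanity check of that assumption using Base reflection functions; `Bar` is a hypothetical struct added only to show how alignment padding could break the match with a packed 5-byte record:

```julia
struct Foo
    a::Int32
    b::Int32
end

# Hypothetical struct, only here to illustrate alignment padding.
struct Bar
    a::UInt8
    b::Int32
end

# Foo is plain data: 8 bytes, fields at byte offsets 0 and 4, no padding.
@show isbitstype(Foo)
@show sizeof(Foo)
@show [Int(fieldoffset(Foo, i)) for i in 1:fieldcount(Foo)]

# Bar is also isbits, but its Int32 field is aligned to 4 bytes, so in memory
# it takes 8 bytes instead of the 5 bytes a packed file record would use.
@show sizeof(Bar)
@show Int(fieldoffset(Bar, 2))
```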
The only problem is of course the endianness:
```julia
julia> f = retrieve(Foo, data)
Foo(1677721600, 838860800)

julia> f.a
1677721600

julia> f.b
838860800

julia> ntoh(f.a) # the correct value of a
100

julia> ntoh(f.b) # the correct value of b
50
```
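For reference, `ntoh` converts from network (big-endian) to host byte order: on a big-endian host it returns its argument unchanged, and on a little-endian host it is `bswap`. A small check, using only Base, that byte-swapping an `Int32` gives the same value as reversing its four raw bytes (which is what `reverse!` on the buffer would do):

```julia
x = Int32(1677721600)   # the value of f.a above, i.e. 0x64000000

# On a little-endian host ntoh(x) == bswap(x); on a big-endian host ntoh(x) == x.
little_endian = ENDIAN_BOM == 0x04030201
@show little_endian
@show ntoh(x)            # 100 on a little-endian host
@show bswap(x)           # 100 on any host: bswap always reverses the value's bytes

# Reversing the 4 raw bytes of x and reinterpreting them gives the same result as bswap.
bytes = collect(reinterpret(UInt8, [x]))
swapped = reinterpret(Int32, reverse(bytes))[1]
@show swapped == bswap(x)
```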
Now I am not sure how to handle the big-endian conversion (`ntoh()`) efficiently, since I am reading a lot of data and there are many different structures to parse.
I thought about creating a macro like the `@struct` macro that StrPack provides, using it instead of `struct` to create my types, and then basically `read` the fields one by one and convert them. However, this `Ref` and `read!()` workflow seems to be much more efficient than reading the data piece by piece. I also thought about using StrPack itself, but it is currently not working on Julia 1.x, and it seems to be overkill anyway, since I only need to deal with big-endian data: no padding or other annoying stuff. Getting it working again would, however, be a community contribution, which is of course a bonus.
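For comparison with the `Ref` + `read!` workflow, the "read the fields piece by piece" idea does not necessarily need a macro: Base reflection (`fieldcount`, `fieldtype`) is enough to loop over the fields of a plain-data struct. A minimal sketch, assuming every field is a primitive number type and the fields are stored back to back in the file; `read_fields_be` is a hypothetical name:

```julia
struct Foo
    a::Int32
    b::Int32
end

# Hypothetical helper: read each field of T from io in declaration order and
# convert it from big-endian (network) order to host order with ntoh.
# Assumes all fields are primitive numbers packed without padding.
function read_fields_be(::Type{T}, io::IO) where {T}
    return T((ntoh(read(io, fieldtype(T, i))) for i in 1:fieldcount(T))...)
end

data = UInt8[0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x32]
foo = read_fields_be(Foo, IOBuffer(data))
@show foo   # Foo(100, 50)
```

Since every field is read separately, this variant does not rely on the in-memory layout of `T` matching the file, at the cost of one `read` call per field.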
Anyway, to solve the main problem first: is there any clever way to hook into the `Ref` + `read!` machinery? Of course, calling `ntoh()` on the data itself is nonsense, because those are just octets, and `read!()` does not know anything about the structure of `T`; it just fills the reference. So it feels like the wrong place to "hack".
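One way to leave `read!` untouched is to fix the value afterwards: read the raw bytes with `Ref` + `read!` exactly as `retrieve` does, then rebuild the struct with `ntoh` applied to each field. A minimal sketch; `ntoh_fields` is a hypothetical name, and it assumes every field of `T` is a primitive number type (nested structs or tuples would need a recursive version):

```julia
struct Foo
    a::Int32
    b::Int32
end

# Same as retrieve in the question: copy sizeof(T) raw bytes into a Ref{T}.
function retrieve(::Type{T}, data) where {T}
    ref = Ref{T}()
    read!(IOBuffer(data), ref)
    return ref[]
end

# Hypothetical helper: rebuild x with ntoh applied to every field.
# ntuple with Val(fieldcount(T)) builds the field tuple with an unrolled loop.
ntoh_fields(x::T) where {T} = T(ntuple(i -> ntoh(getfield(x, i)), Val(fieldcount(T)))...)

data = UInt8[0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x32]
@show ntoh_fields(retrieve(Foo, data))   # Foo(100, 50)
```

This keeps the single bulk `read!` and adds one `ntoh` per field, which is a plain `bswap` on little-endian hosts and does nothing on big-endian ones.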
On the other hand, one solution that might be OK is something like `read_big_endian!(::Type{T}, io, ref)`, where I read exactly as many bytes from the buffer as the size of the struct, then iterate over the fields that need to be converted from big endian and swap their bytes in the buffer (in place) using `reverse!()`, before actually calling `read!()`.
Here is a hardcoded version, just for demonstration purposes (it assumes every field is 32 bits wide):
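```julia
# Note: this reverses the bytes of `data` in place, so the caller's buffer is modified.
function retrieve_big_endian_32(::Type{T}, data) where {T}
    ref = Ref{T}()
    # swap every 4-byte group (assumes all fields are 32 bits and length(data) is a multiple of 4)
    for idx in range(1; length=length(data) ÷ 4, step=4)
        reverse!(data, idx, idx + 3)
    end
    read!(IOBuffer(data), ref)
    return ref[]
end
```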
```julia
julia> data = Vector{UInt8}([0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x32])
8-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x64
 0x00
 0x00
 0x00
 0x32

julia> retrieve_big_endian_32(Foo, data)
Foo(100, 50)
```
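The hardcoded version could be generalized along the lines of the `read_big_endian!` idea: look up each field's byte offset and size with `fieldoffset` and `sizeof(fieldtype(T, i))`, reverse only those bytes, then let `read!` fill the `Ref`. A minimal sketch, assuming all fields are primitive numbers and the struct layout matches the file; `read_big_endian` is a hypothetical name, and unlike `retrieve_big_endian_32` it swaps bytes in its own freshly read buffer, so the caller's data is not modified:

```julia
struct Foo
    a::Int32
    b::Int32
end

# Hypothetical generic version of retrieve_big_endian_32.
# Reads sizeof(T) bytes from io, byte-swaps each field inside that private buffer,
# then copies the buffer into a Ref{T} with read!.
function read_big_endian(::Type{T}, io::IO) where {T}
    buf = read(io, sizeof(T))                  # new Vector{UInt8}; the source bytes are untouched
    length(buf) == sizeof(T) || throw(EOFError())
    if ENDIAN_BOM == 0x04030201                # only swap on little-endian hosts
        for i in 1:fieldcount(T)
            off = Int(fieldoffset(T, i))       # 0-based byte offset of field i
            n = sizeof(fieldtype(T, i))        # size of field i in bytes
            reverse!(buf, off + 1, off + n)
        end
    end
    ref = Ref{T}()
    read!(IOBuffer(buf), ref)
    return ref[]
end

data = UInt8[0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x32]
io = IOBuffer(repeat(data, 3))                 # three consecutive records
records = [read_big_endian(Foo, io) for _ in 1:3]
@show records                                  # three copies of Foo(100, 50)
```

Reversing a field's bytes in the buffer and calling `bswap` on the typed field do the same work; the difference is only whether the swap happens before or after the bytes become a `T`.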
The big question for the experts is: how does an operation like `reverse!()` on the raw data compare to `ntoh` from a performance point of view? I tried some benchmarks, and it seems that `ntoh` (which calls `bswap`) is more or less a no-op, but I need to invest more time in the implementation to compare both approaches properly. For integers, `bswap` calls `bswap_int`, which appears in `base/compiler/tfuncs.jl` and refers to a C function.
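A rough way to compare the two approaches is to time them on the same large buffer. A minimal sketch using only Base's `@elapsed` (BenchmarkTools.jl's `@btime` would give more reliable numbers); it assumes a little-endian host and all-`Int32` fields, and it swaps in `reinterpret` instead of `Ref` + `read!` to view the bytes as `Foo` records. `parse_ntoh` and `parse_reverse` are hypothetical names:

```julia
struct Foo
    a::Int32
    b::Int32
end

# ntoh applied to every field (assumes all fields are primitive numbers).
ntoh_fields(x::T) where {T} = T(ntuple(i -> ntoh(getfield(x, i)), Val(fieldcount(T)))...)

# Approach 1: view the raw bytes as T records, then byte-swap the typed fields.
function parse_ntoh(::Type{T}, data::Vector{UInt8}) where {T}
    return [ntoh_fields(x) for x in reinterpret(T, data)]
end

# Approach 2: reverse every 4-byte group in a copy of the bytes, then view them as T.
# Hardcoded to 32-bit fields, like retrieve_big_endian_32.
function parse_reverse(::Type{T}, data::Vector{UInt8}) where {T}
    buf = copy(data)
    for idx in 1:4:length(buf)
        reverse!(buf, idx, idx + 3)
    end
    return collect(reinterpret(T, buf))
end

data = rand(UInt8, 8 * 1_000_000)   # one million random Foo records
r1 = parse_ntoh(Foo, data)          # first calls also compile both functions
r2 = parse_reverse(Foo, data)
println("same result: ", r1 == r2)

t_ntoh = @elapsed parse_ntoh(Foo, data)
t_rev = @elapsed parse_reverse(Foo, data)
println("ntoh: $(t_ntoh) s, reverse!: $(t_rev) s")
```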
I am sorry that I have not invested more time, but I hope that some low-level experts can push me in the right direction before I dive into complicated macros or the like.