Unpacking semi-structured binary stream

Say we have two similar but slightly different structs:

struct TKey32 <: TKey
    fNbytes::Int32
    fVersion::Int16
    ...
    fSeekKey::Int32
    fSeekPdir::Int32
    fClassName::String
    fName::String
    fTitle::String
end
struct TKey64 <: TKey
    fNbytes::Int32
    fVersion::Int16
    ...
    fSeekKey::Int64
    fSeekPdir::Int64
    fClassName::String
    fName::String
    fTitle::String
end

If all fields were isbits with known length, this would be trivial. The issue is the String fields.

Each String is encoded in the stream with a length prefix: the first byte is the length of the characters that follow; if that byte equals 255, the next 4 bytes are read as a UInt32 and used as the length instead. In short, something nontrivial.
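
To make the encoding concrete, here is a minimal sketch of such a reader. The name read_lenstring is made up for illustration, and the byte order of the 4-byte length is an assumption (big-endian, hence the ntoh); drop the ntoh if the stream uses host byte order.

```julia
# Hypothetical helper: read one length-prefixed string from `io`.
function read_lenstring(io::IO)
    n = Int(read(io, UInt8))               # first byte: length of the string
    if n == 255                            # marker: the real length follows
        n = Int(ntoh(read(io, UInt32)))    # ASSUMPTION: big-endian UInt32
    end
    return String(read(io, n))             # read up to n bytes as a String
end

short = IOBuffer(UInt8[0x03, 0x61, 0x62, 0x63])
read_lenstring(short)
```

The short case above decodes a 3-byte string; a first byte of 0xff switches to the 4-byte length path.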

My question is: is there an existing, reasonably elegant solution to this? I’m aware of Mixer.jl, but I don’t think any of the existing solutions would allow a custom “String unpack”.

How do you receive your data? As a stream of bytes?

What you’re describing is a classical parsing problem. I’d first read all fields of known length and then handle the Strings specially. I’d also parametrize TKey so that fSeekKey etc. have their size as a type parameter instead of having two subtyped versions.

E.g.:

function Base.read(io::IO, ::Type{TKey{T}}) where T <: Union{Int64, Int32}
    fNbytes = read(io, Int32)
    fVersion = read(io, Int16)
    ...
    fSeekKey = read(io, T)
    fSeekPdir = read(io, T)
    
    # now read strings specially according to your nontrivial spec
        
    TKey{T}(fNbytes, fVersion, ..., fSeekKey, fSeekPdir, fClassName, fName, fTitle)
end

and then you can just call

read(io, TKey{Int64})

to read a TKey{Int64} from your stream (error handling not included of course).
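
For a version that actually runs end to end, here is a cut-down sketch. MiniKey and read_short_string are hypothetical names, the struct keeps only three fields to show the pattern, and the string rule is simplified to a single length byte; values are read in host byte order, which is also an assumption.

```julia
# Hypothetical, cut-down key: just enough fields to show the pattern.
struct MiniKey{T<:Union{Int32,Int64}}
    fNbytes::Int32
    fSeekKey::T
    fName::String
end

# Placeholder string reader: one length byte followed by that many bytes.
read_short_string(io::IO) = String(read(io, Int(read(io, UInt8))))

function Base.read(io::IO, ::Type{MiniKey{T}}) where {T<:Union{Int32,Int64}}
    fNbytes  = read(io, Int32)       # fixed-size fields first
    fSeekKey = read(io, T)           # width chosen by the type parameter
    fName    = read_short_string(io) # strings handled specially
    return MiniKey{T}(fNbytes, fSeekKey, fName)
end

# Round trip through an in-memory buffer.
buf = IOBuffer()
write(buf, Int32(42), Int64(1024), UInt8(3), "abc")
seekstart(buf)
key = read(buf, MiniKey{Int64})
```

The same method serves MiniKey{Int32}; only the width of fSeekKey changes.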

More specific help would need more information about how Strings are encoded in your stream though.

The issue is that we have a lot of these structs, so we want a generic read(io, Struct), but with customization down to the individual String fields. The case where the number and names of fields change halfway through would need another thread…

I don’t think you’ll get around specifying how to parse each type, unless you’re willing to use a function name other than parse. If so, you’d have to figure out what determines how any given type can be parsed, extract that information and pass it into those functions, and then use that to loop over the field types of your structs.
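
A rough sketch of that loop over field types, assuming a separate function called unpack, a simplified one-byte length prefix for strings, and host byte order. Header is a made-up example struct.

```julia
# Per-type rules: plain integers are read directly from the stream.
unpack(io::IO, ::Type{T}) where {T<:Union{UInt8,Int16,Int32,Int64}} = read(io, T)

# Simplified string rule for this sketch: one length byte, then the bytes.
unpack(io::IO, ::Type{String}) = String(read(io, Int(unpack(io, UInt8))))

# Generic fallback for structs: unpack every field in declaration order
# and hand the values to the default constructor.
function unpack(io::IO, ::Type{T}) where {T}
    isstructtype(T) || error("no unpack method for $T")
    return T((unpack(io, FT) for FT in fieldtypes(T))...)
end

struct Header            # hypothetical example struct
    version::Int16
    name::String
end

io = IOBuffer()
write(io, Int16(4), UInt8(2), "ab")
seekstart(io)
header = unpack(io, Header)
```

With this layout, a new struct needs no parsing code of its own as long as every field type has an unpack method.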

Sounds like what you’re looking for is a generator for parsing code, maybe something like Kaitai Struct? Sadly, it doesn’t have a Julia backend (though I wanted to write one at some point). I don’t know of any Julia parser generators though, sorry.

That’s fine, we use “unpack” as the function name, and @tamasgal wrote an @io macro that calls “unpack” on each field, so unpack(io, String) parses correctly. The issue is that we now want the type declarations to take fewer lines of code, so the @io macro also needs to handle <: SuperType, but not every @io struct has a SuperType, etc.
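
To illustrate the supertype problem, here is a minimal sketch of a macro that accepts both forms. It is a hypothetical stand-in called @io_sketch, not the actual @io macro, and it does not handle parametric structs.

```julia
# Hypothetical stand-in macro: defines the struct plus an `unpack` method.
macro io_sketch(def)
    (def isa Expr && def.head === :struct) ||
        error("@io_sketch expects a struct definition")
    header = def.args[2]
    # `struct A ... end` gives a Symbol; `struct A <: B ... end` gives `A <: B`.
    name = (header isa Expr && header.head === :(<:)) ? header.args[1] : header
    name isa Symbol || error("parametric structs are not handled in this sketch")
    return esc(quote
        $def
        unpack(io::IO, ::Type{$name}) =
            $name((unpack(io, FT) for FT in fieldtypes($name))...)
    end)
end

# Base case for plain integers (host byte order, an assumption).
unpack(io::IO, ::Type{T}) where {T<:Union{UInt8,Int16,Int32,Int64}} = read(io, T)

abstract type AbstractKey end

@io_sketch struct WithSuper <: AbstractKey
    a::Int32
    b::Int16
end

@io_sketch struct Plain
    a::Int16
end

io = IOBuffer()
write(io, Int32(1), Int16(2), Int16(3))
seekstart(io)
w = unpack(io, WithSuper)
p = unpack(io, Plain)
```

The only special handling is pulling the type name out of the struct header, which is either a bare Symbol or a `<:` expression.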

Thanks for the insight, though. I will give it more thought.
