Hi,
I’m trying to create a Julia reader for a binary block-based file format and keep running into different issues depending on which approach I try. There are many different types of blocks, but essentially they look something like this:
using CBinding
import Base.read
testdata = IOBuffer([0x23, 0x23, 0x4d, 0x44, 0x00, 0x00, 0x00, 0x00,
0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f,
0x72, 0x6c, 0x64, 0x00])
@cstruct HEADER {
id::UInt8[4]
reserved::UInt8[4]
length::UInt64
link_count::UInt64
};
@cstruct MDBLOCK {
header::HEADER
# No links...
md_data::UInt8[]
};
function read(io::IO, MDBLOCK)
header = read(io, HEADER)
content = read(io, header.length)
println("READ:" * String(copy(content))) # copy first: String(::Vector{UInt8}) empties its argument
MDBLOCK(header, content)
end
function read_md(io::IO)
header = read(io, HEADER)
content = read(io, header.length)
println("READ:" * String(copy(content))) # copy first: String(::Vector{UInt8}) empties its argument
MDBLOCK(header, content)
end
# md = read(testdata, MDBLOCK)
seekstart(testdata)
md2 = read_md(testdata)
The header is always present and may be followed by a set of links (essentially UInt64s) pointing to other blocks. The payload may be either a fixed structure or, as in md_data above, variable in size. The variable-size payload is where I get stuck and would like some guidance.
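Since the link section is just `link_count` consecutive UInt64 values, it could be read generically. A minimal sketch, assuming the links sit directly after the header in host byte order; `read_links` is a hypothetical helper name, not part of any package:

```julia
# Hypothetical helper: read `link_count` UInt64 links that follow a block header.
# Assumes host byte order (little-endian on typical hardware).
function read_links(io::IO, link_count::Integer)
    links = Vector{UInt64}(undef, Int(link_count))
    read!(io, links)   # fills the whole vector or throws EOFError
    return links
end

# Usage: write two links to a buffer and read them back
buf = IOBuffer()
write(buf, UInt64(24), UInt64(48))
seekstart(buf)
links = read_links(buf, 2)
```

With `link_count == 0`, as in the test data above, this returns an empty vector and consumes nothing.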
So far I have tried a couple of approaches:
CBinding.jl was where I started, but with the post-1.0 move to pure C syntax I felt it was going in the wrong direction, essentially re-creating the two-language problem that Julia aims to solve.
So I tried pinning CBinding.jl to a pre-1.0 version, which lets me use @cstruct to define my block types and read them directly from an IOStream. That works fine for the static payload case, where the complete block layout is known at compile time. However, I get stuck when I need to overload read(io::IO, MDBLOCK): when I call it, dispatch selects the auto-generated read method from CBinding rather than my own method that handles the variable-length part.
The auto-generated read does kind of work, in that it reads the header part, but it leaves the payload unread. Is there a way to make my own read “more specific” so that it gets selected at dispatch?
julia> @which read(testdata, MDBLOCK)
read(io::IO, ::Type{CA}) where CA<:Caggregate in CBinding at /Users/klint/.julia/packages/CBinding/9dfDe/src/caggregate.jl:31
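One thing worth noting about the example: `function read(io::IO, MDBLOCK)` declares a second argument that is merely *named* MDBLOCK and has no type annotation, so it is equivalent to `::Any` and is less specific than the `::Type{CA}` method shown by `@which`. The sketch below reproduces that situation in plain Julia without CBinding. `FakeAggregate`, `FakeBlock` and `load` are hypothetical stand-ins for `Caggregate`, `MDBLOCK` and `read`, and whether this carries over unchanged to the pinned CBinding version is untested:

```julia
abstract type FakeAggregate end          # hypothetical stand-in for Caggregate
struct FakeBlock <: FakeAggregate end    # hypothetical stand-in for MDBLOCK

# Generic method, shaped like the auto-generated one reported by @which
load(io::IO, ::Type{T}) where {T<:FakeAggregate} = :generic

# Second argument is an untyped variable that happens to be named FakeBlock,
# i.e. this is load(io::IO, x::Any) and loses against the method above
load(io::IO, FakeBlock) = :untyped
first_result = load(IOBuffer(), FakeBlock)    # :generic

# Dispatching on the type itself via ::Type{FakeBlock} is more specific
load(io::IO, ::Type{FakeBlock}) = :specific
second_result = load(IOBuffer(), FakeBlock)   # :specific
```

Applied to the example, the corresponding signature would be `read(io::IO, ::Type{MDBLOCK})`.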
Next I tried renaming my read function to read_md and dropping the MDBLOCK type parameter (both versions are included in the minimal example). This gets as far as reading the variable-length payload, but then complains that it cannot find a matching constructor for MDBLOCK.
julia>
READ:Hello world
ERROR: LoadError: MethodError: no method matching MDBLOCK(::HEADER, ::Vector{UInt8})
Closest candidates are:
(::Type{CA})(::Union{typeof(zero), UndefInitializer, Cconst{CA, S} where S, Caggregate, CA}; kwargs...) where CA<:Caggregate at /Users/klint/.julia/packages/CBinding/9dfDe/src/caggregate.jl:15
Stacktrace:
[1] read_md(io::IOBuffer)
@ Main ~/proj/julia/mdf/minimal.jl:34
[2] top-level scope
@ ~/proj/julia/mdf/minimal.jl:39
in expression starting at /Users/klint/proj/julia/mdf/minimal.jl:39
julia>
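The error suggests the generated MDBLOCK type cannot be constructed from a header plus a Julia vector. One way around that, sketched here as an assumption rather than anything CBinding provides, is a plain Julia wrapper that pairs whatever header type is in use with the payload bytes. `VarBlock`, `read_varblock` and `toy_header` are hypothetical names, and the header is read by hand only to keep the sketch self-contained:

```julia
# Hypothetical wrapper: any header type H plus a variable-length payload
struct VarBlock{H}
    header::H
    payload::Vector{UInt8}
end

# `read_header(io)` returns a header; `payload_length(header)` gives the byte count
function read_varblock(read_header, payload_length, io::IO)
    header = read_header(io)
    payload = read(io, Int(payload_length(header)))
    return VarBlock(header, payload)
end

# Toy header reader (host byte order assumed), standing in for read(io, HEADER)
toy_header(io) = (id = read(io, 4), reserved = read(io, 4),
                  length = read(io, UInt64), link_count = read(io, UInt64))

testdata = IOBuffer(UInt8[0x23, 0x23, 0x4d, 0x44, 0x00, 0x00, 0x00, 0x00,
                          0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f,
                          0x72, 0x6c, 0x64, 0x00])
block = read_varblock(toy_header, h -> h.length, testdata)
```

Passing the header reader as a function keeps the wrapper independent of how the fixed-size part is defined.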
So I tried biting the bullet and learning StaticArrays to define the structs, but unless I am mistaken that does not create readers for the structs I define. With CBinding.jl providing that out of the box, I would really prefer not to repeat the structure definitions by writing read methods that explicitly read each field separately (as was suggested in a post by @c42f here).
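For plain Julia structs, the per-field reads do not necessarily have to be written out by hand, because the field types are available through reflection. A minimal sketch, assuming a packed layout without padding and host byte order; `Header` and `read_plain` are hypothetical names, and NTuple is used here in place of a StaticArrays type:

```julia
# Plain Julia mirror of the HEADER layout (no padding between these fields)
struct Header
    id::NTuple{4,UInt8}
    reserved::NTuple{4,UInt8}
    length::UInt64
    link_count::UInt64
end

# Hypothetical generic reader: walks the field types recursively and reads
# primitives with Base.read, so each layout is defined only once
function read_plain(io::IO, ::Type{T}) where {T}
    isprimitivetype(T) && return read(io, T)
    vals = [read_plain(io, FT) for FT in fieldtypes(T)]   # read in field order
    return T <: Tuple ? T(vals) : T(vals...)
end

testdata = IOBuffer(UInt8[0x23, 0x23, 0x4d, 0x44, 0x00, 0x00, 0x00, 0x00,
                          0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f,
                          0x72, 0x6c, 0x64, 0x00])
header = read_plain(testdata, Header)
payload = read(testdata, Int(header.length))   # variable-length part
```

This favours brevity over speed and ignores alignment, so a layout with padding would need extra handling.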
Am I missing something obvious, or am I just asking too much of Julia? It feels like this should not be too hard a problem.
As in the article linked above, I’m learning and would like a “Julian” solution. I of course realise that pinning CBinding to an old version is far from ideal, but from where I stand and with what I know, that currently looks like the most attractive solution. That may well be because I dug myself into a hole.
Concrete questions, in roughly decreasing order of frustration but increasing order of importance:
- Is there a way to make dispatch pick my read function, and if so, how?
- Does it make sense to try to build something potentially useful on an old CBinding version?
- Is there a better way forward, e.g. just accepting the need to duplicate information by reading one field at a time?
Thanks!