Reinterpret packed data which is already in memory?

I have a fairly basic question. I have read large arrays of bytes from different sources into a single array (sometimes 2e9 bytes or more), and now I need to present the data to the user in a readable form. I thought of using reinterpret with the corresponding struct, which is basically a no-op.

My problem is: packed data. The structures I deal with are completely mixed and their on-disk representation is packed. I looked into some older packages (StrPack.jl, StructIO.jl, etc.) which I also use for other projects but could not find a way to reinterpret the data without any copy or new allocation.

A dummy example:

data = [0x01, 0x00, 0x10, 0x02, 0x00, 0x20]

struct Foo
    x::UInt8
    y::UInt16
end

reinterpret(Foo, data)

gives of course

ArgumentError: cannot reinterpret an `UInt8` array to `Foo` whose first dimension has size `6`.
The resulting array would have non-integral first dimension.

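because the in-memory layout of this struct is not packed: it includes a padding byte after x, so sizeof(Foo) is 4 rather than 3, and 6 bytes do not divide evenly into 4-byte elements.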
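The padding can be inspected directly with sizeof and fieldoffset. This is a minimal sketch; the name PaddedFoo is hypothetical and only used so that it does not clash with the Foo definitions elsewhere in this thread.

```julia
# Hypothetical struct with the same fields as Foo above.
struct PaddedFoo
    x::UInt8
    y::UInt16
end

# Total size in bytes, including the padding byte after x.
padded_size = sizeof(PaddedFoo)              # 4

# Byte offset of each field from the start of the struct.
offset_x = Int(fieldoffset(PaddedFoo, 1))    # 0
offset_y = Int(fieldoffset(PaddedFoo, 2))    # 2, not 1

println("size = ", padded_size, ", x at ", offset_x, ", y at ", offset_y)
```

The on-disk records are 3 bytes each, so the 6-byte array cannot be viewed as a whole number of 4-byte elements.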

It would work if I had the padding bytes (I marked them with 0xFF) which I obviously don’t have :wink:

data = [0x01, 0xFF, 0x00, 0x10, 0x02, 0xFF, 0x00, 0x20]
reinterpret(Foo, data)
2-element reinterpret(Foo, ::Array{UInt8,1}):
 Foo(0x01, 0x1000)
 Foo(0x02, 0x2000)

Am I hitting a wall here, or is there some (dirty) way to create an actual structure in Julia which I can map over my in-memory data?

By the way, I also tried to unpack the data while reading it from disk, but (1) it's a bit tricky since the data can be split, and (2) I saw huge drops in performance when doing so (100x and more). Not to mention that it's quite ugly…

You can use primitive types.

# define a type that acts like a "packed" 24-bit struct of x::UInt8, y::UInt16:
primitive type Foo 24 end

function Base.getproperty(foo::Foo, s::Symbol)
    # create a Ref so that we can get a pointer to the raw bytes;
    # I'm not sure if there is a nicer way (unless the primitive type has
    # the right size to be reinterpreted as a UInt64 or similar)
    r = Ref(foo)
    GC.@preserve r begin
        if s === :x
            return unsafe_load(Ptr{UInt8}(Base.unsafe_convert(Ptr{Cvoid}, r)))
        elseif s === :y
            return unsafe_load(Ptr{UInt16}(Base.unsafe_convert(Ptr{Cvoid}, r)+1))
        end
    end
    error("unknown field $s")
end
Base.show(io::IO, foo::Foo) = print(io, "Foo(", foo.x, ',', foo.y, ')')

after which you can do:

julia> foodata = reinterpret(Foo, [0x01, 0x00, 0x10, 0x02, 0x00, 0x20])
2-element reinterpret(Foo, ::Array{UInt8,1}):
 Foo(1,4096)
 Foo(2,8192)

julia> foodata[2].y
0x2000

Ah that’s really neat, I have to try it!

The only problem is that I have quite a few fields, so the if-else-block will be huge, but that can be solved with a macro or something like this…

After playing around with it, it works really nicely and is fast :slight_smile: thanks

I now have to figure out how to write macros that generate these functions for me…
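One way to avoid a long hand-written if-else block is to generate one accessor per field from a table with @eval. The following is only a sketch under stated assumptions: the names PackedFoo, PACKEDFOO_FIELDS and _packed_field are hypothetical, the layout is the same 3-byte one as in the example above, and the data is assumed to be little-endian on a little-endian machine. More fields can be added by extending the table and the bit size of the primitive type.

```julia
# Hypothetical 24-bit type with the same packed layout as in the answer above.
primitive type PackedFoo 24 end

# (field name, field type, byte offset in the packed record); hypothetical table.
const PACKEDFOO_FIELDS = ((:x, UInt8, 0), (:y, UInt16, 1))

# Generate one method per field, dispatching on the field name wrapped in Val.
for (name, T, offset) in PACKEDFOO_FIELDS
    @eval function _packed_field(foo::PackedFoo, ::Val{$(QuoteNode(name))})
        r = Ref(foo)
        # Keep r alive while we read from its raw bytes.
        GC.@preserve r unsafe_load(Ptr{$T}(Base.unsafe_convert(Ptr{Cvoid}, r) + $offset))
    end
end

# Forward foo.name to the generated accessor; an unknown name raises a MethodError.
Base.getproperty(foo::PackedFoo, s::Symbol) = _packed_field(foo, Val(s))
Base.propertynames(::PackedFoo) = map(first, PACKEDFOO_FIELDS)

data = [0x01, 0x00, 0x10, 0x02, 0x00, 0x20]
foos = reinterpret(PackedFoo, data)   # a view over `data`, not a copy

first_x = foos[1].x    # 0x01
second_y = foos[2].y   # 0x2000 on a little-endian machine

data[1] = 0x07         # change the underlying bytes ...
changed_x = foos[1].x  # ... and the view reflects it: 0x07
```

Dispatching through Val(s) will likely be resolved at compile time when the field name is a literal such as foos[1].x, but that depends on constant propagation and is worth benchmarking on real data.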