Bits and bytes manipulation library

Are there any packages in the Julia ecosystem for byte-level data manipulation?

Similar packages in other languages are scodec for Scala (scodec/scodec on GitHub, a Scala combinator library for working with binary data) and
Elixir's binaries (see "Binaries, strings, and charlists" in the Elixir v1.17.2 docs).

Just in case, would this nice post about using CBinding.jl be helpful?

Can you give some example operations you’re looking for?

take a look at

This is helpful, but it depends on Clang, which is big and unnecessary for my use case, though I understand why they did it. I need pure-Julia, low-level byte- and bit-level data manipulation without a lot of dependencies. I might write one if it’s not available :slight_smile:

I basically want the scodec library (scodec/scodec on GitHub, a Scala combinator library for working with binary data) in Julia. It provides codecs for primitive types and combinators that compose them in arbitrary ways to create a byte-level parser. I use it to write network protocols and to parse data formats like the one in the post that @rafael.guerra mentioned.

Clojure also has gloss (clj-commons/gloss; see the Introduction page of its GitHub wiki).

Are you perhaps thinking of something like the Serialization stdlib, or something like serde in Rust? As far as I know, there isn’t really much for that kind of work at the moment; the package ecosystem is still geared more towards numerical work than general software engineering.

I guess there are also some parser combinator libraries, if that’s what you’re looking for. Parsers.jl comes to mind.

serde has a lot of functionality; that’s a lofty goal for now. What I have in mind is smaller: for example, I want to write UInt32 :: UInt32 :: UInt32 and get a parser that can serialize and deserialize three little-endian UInt32 values to and from bytes.
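For reference, the three-UInt32 case can be sketched with nothing but Base. The names `read_u32s_le` and `write_u32s_le` are made up for this illustration; `ltoh` and `htol` are the Base functions that convert between little-endian and host byte order.

```julia
# Hypothetical helpers: decode/encode three little-endian UInt32 values.
# ntuple calls the closure three times, in order, so reads happen sequentially.
read_u32s_le(io::IO) = ntuple(_ -> ltoh(read(io, UInt32)), 3)

# htol converts each value from host order to little-endian before writing.
write_u32s_le(io::IO, xs::NTuple{3,UInt32}) = foreach(x -> write(io, htol(x)), xs)

bytes = hex2bytes("010000000200000003000000")
decoded = read_u32s_le(IOBuffer(bytes))

out = IOBuffer()
write_u32s_le(out, decoded)
roundtrip = take!(out)
```

Here `decoded` is a plain `NTuple{3,UInt32}`, and writing it back should reproduce the input bytes.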

On this note, is there an abstraction in Julia like Java’s ByteBuffer?
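The closest thing in Base is probably `IOBuffer`. The following is only a rough sketch of how some ByteBuffer-style operations might map onto it; the comparisons to Java method names in the comments are loose analogies, not an exact equivalence.

```julia
buf = IOBuffer()                   # growable in-memory byte buffer

write(buf, hton(UInt16(0x1234)))   # like putShort with big-endian order
write(buf, UInt8(0xff))            # like put for a single byte
write(buf, htol(UInt32(7)))        # like putInt with little-endian order

seekstart(buf)                     # rewind to the start before reading

a = ntoh(read(buf, UInt16))        # read back the big-endian UInt16
b = read(buf, UInt8)
pos = position(buf)                # current offset in bytes (3 here)
c = ltoh(read(buf, UInt32))        # read back the little-endian UInt32

raw = take!(buf)                   # the full contents as a Vector{UInt8}
```

Unlike ByteBuffer, the byte order is not a property of the buffer; it is chosen per value with `hton`/`ntoh` or `htol`/`ltoh`.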

From just looking at scodec, it looks like you wouldn’t need a library for most of the functionality…
Julia’s immutable structs (those with no mutable fields in them, i.e. types for which isbitstype(T) is true) already do most of what scodec codecs do:

using Test

firstcodec = Tuple{UInt8,UInt8,UInt16}
bytes = hex2bytes("102a03ff")

# reinterpret reads fields in host byte order;
# ntoh converts them from network (big-endian) order to the host's order
result = ntoh.(reinterpret(firstcodec, bytes)[1])
Int(sum(result))

struct Point
    x::Int
    y::Int
    z::Int
end

# Many ways to convert the result to a Point struct
# Might want some utility, or simply a good coverage of `convert(MyType, x)`

point = Point(result...)
# I guess one could have this convenience
# (it needs a `convert(T, ::Codec)` method to be defined for the types involved):
interpret_as(bytes, ::Type{T}, ::Type{Codec}) where {T, Codec} = convert.(T, reinterpret(Codec, bytes))

io = IOBuffer() # I guess similar to Java's ByteBuffer
write(io, Ref(point))
seekstart(io)
point2 = Ref{Point}()
point2 = read!(io, point2)
@test point2[] == point

io = IOBuffer()
# bswap so that the bytes land in the buffer as 10 2a 03 ff on a little-endian host
write(io, bswap(0x102a03ff))
bytes = take!(io)
ntoh.(reinterpret(firstcodec, bytes)[1]) == result

You can likely make all of this a bit more elegant here and there, but those operations should work pretty well and should be close to C performance in Julia.

Thanks, this is very helpful. I am just writing simple reusable components that you can combine to make a byte-level parser.

My idea is to create a codec for every type (using macros and generated functions), and a DSL that can compile a parser to a tuple. (I am still deciding between heterogeneous tuples and structs.)

Just a tip from me after developing such code for quite a few years in Julia: I’d keep it as simple and function-based as possible before doing anything more complicated.
I’ve regretted every macro and generated function that I could have avoided, once I understood the actual problem I wanted to solve better :wink:

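To make the function-based approach concrete, here is one possible sketch in which a decoder is just a function `io -> value` and combinators are ordinary higher-order functions. All names (`le`, `be`, `seq`, `mapresult`, `Header`) are invented for this illustration and do not come from any package.

```julia
# Primitive decoders: closures that read one value with a given byte order.
le(::Type{T}) where {T} = io -> ltoh(read(io, T))   # little-endian
be(::Type{T}) where {T} = io -> ntoh(read(io, T))   # big-endian

# Run several decoders in sequence; map over a tuple returns a tuple,
# so heterogeneous result types are preserved.
seq(decoders...) = io -> map(d -> d(io), decoders)

# Post-process a decoded value, e.g. to build a struct from a tuple.
mapresult(f, decoder) = io -> f(decoder(io))

struct Header
    version::UInt8
    length::UInt16
end

# A big-endian header followed by a little-endian UInt32 payload.
header = mapresult(t -> Header(t...), seq(be(UInt8), be(UInt16)))
packet = seq(header, le(UInt32))

hdr, payload = packet(IOBuffer(hex2bytes("01000a2a000000")))
```

No macros or generated functions are involved; whether this is fast enough depends on the use case, and it can always be specialized later.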
Thanks for the advice. What’s the best way to emulate HLists (heterogeneous lists)? Tuples are hard to metaprogram.
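One common way to treat a tuple like an HList without metaprogramming is head/tail recursion with `first` and `Base.tail`, plus `fieldtypes` to walk a tuple type. The sketch below assumes little-endian integers only; `encode_all` and `decode_all` are hypothetical names.

```julia
# Encoding: peel off the head, recurse on the tail, stop at the empty tuple.
encode_all(io::IO, ::Tuple{}) = nothing
function encode_all(io::IO, xs::Tuple)
    write(io, htol(first(xs)))       # head of the "HList"
    encode_all(io, Base.tail(xs))    # rest of the "HList"
end

# Decoding: fieldtypes(T) yields the element types of a tuple type as a tuple,
# and map over that tuple reads one value per element type, in order.
decode_all(io::IO, ::Type{T}) where {T<:Tuple} =
    map(S -> ltoh(read(io, S)), fieldtypes(T))

xs = (0x01, 0x0203, Int32(-1))       # Tuple{UInt8,UInt16,Int32}
io = IOBuffer()
encode_all(io, xs)
encoded = take!(io)

decoded = decode_all(IOBuffer(encoded), typeof(xs))
```

The element types live in the tuple's type parameter, so `decoded` comes back as the same heterogeneous tuple type as `xs`.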

I am using tuples here; we could also use a Vector{Codec}.

This is what I have so far:

abstract type Codec{T} end

struct IntCodec{T<:Integer} <: Codec{T}
    data::T
end

struct FloatCodec{T<:AbstractFloat} <: Codec{T}
    data::T
end

struct TupleCodec{T<:Tuple{Vararg{Codec}}} <: Codec{T}
    data::T
end

# convert: extend Base.convert explicitly, otherwise these lines define a new, unrelated `convert` function
Base.convert(::Type{T}, codec::IntCodec{T}) where {T<:Integer} = codec.data
Base.convert(::Type{T}, codec::FloatCodec{T}) where {T<:AbstractFloat} = codec.data
Base.convert(::Type{T}, codec::TupleCodec{T}) where {T<:Tuple{Vararg{Codec}}} = codec.data

function Base.read(io::IO, ::Type{T}) where {T<:Tuple{Vararg{Codec}}}
    codec_types = T.parameters
    elements = map(codec_types) do C
        read(io, C)
    end
    return TupleCodec(Tuple(elements))
end

function Base.read(io::IO, ::Type{IntCodec{T}}) where {T<:Integer}
    value = read(io, T)
    return IntCodec{T}(value)
end

# create a decoder for every supported codec type
@generated function decode(io::IO, ::Type{C}) where {C<:Codec}
    # C is e.g. IntCodec{UInt8}; its first type parameter is the wrapped type
    T = C.parameters[1]

    if T <: Integer
        type_name = "integer"
    elseif T <: AbstractFloat
        type_name = "float"
    elseif T <: Tuple{Vararg{Codec}}
        type_name = "tuple"
    else
        error("Unsupported type for decode: $T")
    end

    quote
        # bytesavailable is part of the IO interface, unlike IOBuffer's internal size/ptr fields
        if bytesavailable(io) < sizeof($T)
            throw(ArgumentError("Not enough bytes for a $(sizeof($T))-byte $($type_name)"))
        end

        # note: this returns the raw value of type T, not a wrapped codec
        return read(io, $T)
    end
end

There is also StructIO.jl for packed binary I/O of heterogeneous data structures.
