Equivalent of numpy.tobytes and numpy.frombuffer in Julia

Hi! As the title says, I once built a Python package that uses numpy.ndarray.tobytes and numpy.frombuffer to cache things in SQLite.

In Julia, I want to do this with BitArrays instead, to save even more space, but I cannot find suitable methods. We do have Serialization.serialize and JLD2.jl, but they don't do quite the same thing. The numpy functions work specifically on arrays and let us specify the dtype, which eliminates the need for serialization headers. The Julia methods don't provide this convenience.

Are there standard library functions or specific packages I missed that could replace the numpy functions? Or would I have to write the functions myself? Thanks!

If you have an ordinary Array of a bitstype, which is analogous to a numpy array, you can use reinterpret:

julia> a = rand(3)
3-element Vector{Float64}:
 0.3520259358052087
 0.3930310871507644
 0.16629023329978043

julia> reinterpret(UInt8, a)
24-element reinterpret(UInt8, ::Vector{Float64}):
 0x80
 0x23
 0x68
 0xca
 0x97
 0x87
 0xd6
 0x3f
 0x4e
 0xed
 0x67
 0xdc
 0x6b
 0x27
 0xd9
 0x3f
 0xa0
 0x58
 0xd5
 0x94
 0xff
 0x48
 0xc5
 0x3f

But you can also just call write(io, array) to write the raw bytes without calling reinterpret, so I’m a little confused about what you are trying to do.
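For a plain Vector{Float64}, a minimal sketch of the tobytes/frombuffer round trip might look like this, done both with reinterpret and with write/read! on an IOBuffer. It assumes both ends use the machine's native byte order (neither reinterpret nor write converts endianness); the variable names are only for illustration.

```julia
# A small Float64 vector to round trip through raw bytes
a = [0.25, 0.5, 1.0]

# "tobytes": copy the reinterpreted view into a plain Vector{UInt8}
bytes = Vector{UInt8}(reinterpret(UInt8, a))

# "frombuffer": view the bytes as Float64 and copy into an ordinary Vector
back = Vector{Float64}(reinterpret(Float64, bytes))

# Same idea with write/read! through an in-memory IOBuffer
io = IOBuffer()
write(io, a)                      # writes the 24 raw bytes, no header
bytes2 = take!(io)
back2 = read!(IOBuffer(bytes2), Vector{Float64}(undef, length(bytes2) ÷ sizeof(Float64)))
```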

reinterpret won't do what you want with a BitArray. The raw storage bytes (the bits packed into 64-bit chunks) are in somebitarray.chunks:

julia> a = BitArray(rand(Bool, 100));

julia> a.chunks
2-element Vector{UInt64}:
 0x2faeb60b28afdceb
 0x00000006a2d7843a
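To check that the chunks really are the packed bits, here is a small sketch comparing the bytes of v.chunks with what write produces. Note that chunks is an internal field of BitArray rather than a documented interface, so relying on it directly is an assumption that could break in a future Julia version.

```julia
# 100 bits, every other one set; falses() starts with all storage zeroed
v = falses(100)
v[1:2:end] .= true

# 100 bits need cld(100, 64) = 2 UInt64 chunks, i.e. 16 bytes
nchunks = length(v.chunks)            # internal field, inspection only
raw = Vector{UInt8}(reinterpret(UInt8, v.chunks))

# write(io, v) emits the same packed chunks (unused tail bits are zero)
io = IOBuffer()
write(io, v)
written = take!(io)
```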

Aside from serialization headers, the Serialization standard library directly writes the bytes of a.chunks.
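A quick way to see that header overhead is to compare the size of the raw chunks with the size of the serialized form of the same small BitVector. This is only a sketch; the exact serialized size depends on the Julia version, so the code just prints both lengths rather than assuming a particular number.

```julia
using Serialization

v = BitVector([true, false, true, true, false, true, true, true])  # one UInt64 chunk

# Raw packed chunks, no header
function raw_bytes(x)
    io = IOBuffer()
    write(io, x)
    return take!(io)
end

# Serialization: header and type information, followed by the chunks
function serialized_bytes(x)
    io = IOBuffer()
    serialize(io, x)
    return take!(io)
end

raw = raw_bytes(v)
ser = serialized_bytes(v)
println("raw: ", length(raw), " bytes, serialized: ", length(ser), " bytes")

# The serialized form is self-describing, so it can be read back without knowing the length
restored = deserialize(IOBuffer(ser))
```

The trade-off is that the serialized blob carries its own type and length, while the raw bytes need that information stored somewhere else.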

You can also directly call write(io, somebitarray) and read!(io, somebitarray), since BitArray defines specialized write and read! methods that write/read the raw chunks. That seems more like what you want?

(Note also that you can use an IOBuffer to write/read to/from an array of bytes rather than a file.)

I knew about reinterpret and BitArray.chunks, but I finally figured out that the actual problem in my case is probably that Julia doesn't seem to have a bytes type like the one built into Python (is IOBuffer similar to that?). I don't like the idea of storing the results as a BigInt either. To make things clearer: I want to store the raw bytes of BitVector.chunks as SQLite values without any header (a header would be large relative to the data, since the arrays I'm using are small), and be able to read them back into a BitVector. JLD2.jl could technically replace SQLite, but its metadata overhead is far too large in my case.

The analogue is just Vector{UInt8} (though this is mutable so it is closer to bytearray in Python). (IOBuffer is a wrapper around this that you can read or write with.)

BigInt is totally the wrong type for storing arbitrary byte sequences.
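To make the Vector{UInt8}/IOBuffer analogy concrete, here is a small sketch that mutates a byte vector in place and then reads a typed value out of it, roughly what numpy.frombuffer does when given a dtype. The explicit ltoh/htol conversions are an extra portability precaution added here, not something the thread requires.

```julia
# A Vector{UInt8} is a mutable byte buffer, much like Python's bytearray
buf = UInt8[0x2a, 0x00, 0x00, 0x00]
buf[2] = 0x01                         # modify in place: now 0x2a 0x01 0x00 0x00

# IOBuffer wraps a byte vector so typed values can be read from it.
# These bytes are little-endian, so ltoh converts the value to host order.
x = ltoh(read(IOBuffer(buf), UInt32))

# The other direction: htol makes the written bytes little-endian on any host
out = IOBuffer()
write(out, htol(UInt32(298)))
encoded = take!(out)
```

Here x is 0x0000012a (298), and encoded holds the same four bytes as buf.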

julia> a = BitVector([true, false, true, true, false, true, true, true]);

julia> buf = IOBuffer();

julia> write(buf, a);

julia> bytes = take!(buf)
8-element Vector{UInt8}:
 0xed
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00

julia> b = BitVector(undef, 8);

julia> read!(IOBuffer(bytes), b);

julia> b
8-element BitVector:
 1
 0
 1
 1
 0
 1
 1
 1

julia> b == a
true
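For the SQLite use case, one possible pair of helpers built on the same write/take!/read! calls is sketched below. The names bitvector_to_bytes and bytes_to_bitvector are hypothetical. One assumption matters: the raw blob holds only the packed chunks, so it does not record how many bits the vector had. The bit count has to be stored somewhere else (for example in a separate column), much as numpy.frombuffer has to be told the dtype.

```julia
# Hypothetical helper: BitVector -> raw packed chunk bytes (no header)
function bitvector_to_bytes(v::BitVector)
    io = IOBuffer()
    write(io, v)
    return take!(io)
end

# Hypothetical helper: raw bytes + bit count -> BitVector
function bytes_to_bitvector(bytes::Vector{UInt8}, nbits::Integer)
    v = BitVector(undef, nbits)
    read!(IOBuffer(bytes), v)         # reads cld(nbits, 64) UInt64 chunks
    return v
end

# Usage: the bit count travels alongside the blob
v = falses(100)
v[1:3:end] .= true
blob = bitvector_to_bytes(v)
w = bytes_to_bitvector(blob, length(v))
```

The resulting Vector{UInt8} is what you would store as a BLOB value; how it is bound with SQLite.jl is not shown here.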

Thank you! I'll look into this and SQLite.jl, then.