Efficient way to go from bit-packed value to `Vector{T<:Unsigned}`

I am trying to write a faster Parquet reader (faster than the one in Parquet.jl atm).

One interesting problem that I encountered is this: how do I efficiently unpack a set of bit-packed integers into a `Vector{T<:Unsigned}` for some `T`?

The data is stored in a bit-packed way explained here (see point 1).

See the excerpt.

So basically, given a `Vector{UInt8}` of bit-packed values, how do I convert it back to a `Vector{T<:Unsigned}`?

For example, if

```julia
bit_packed = UInt8[161]
```

and say the bitwidth is 1, i.e. every bit represents a value.

The binary form of 161 is 10100001, so it should unpack to

```julia
UInt8[1, 0, 0, 0, 0, 1, 0, 1]
```

For this case, I think `digits` works very well (it returns the least-significant bit first, which matches the expected output above), e.g.

```julia
digits(UInt8(161), base = 2, pad = 8)
```

For bitwidths greater than 1, I can bit-shift and then apply a similar trick.
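For instance, with bitwidth = 3, the packed bytes can be folded into a single integer and read back with `digits` in base `2^3` (a sketch; it assumes the whole block fits in an `Int`):

```julia
bytes = UInt8[136, 198, 250]  # 8 packed values of 3 bits each
# fold the bytes (little-endian) into one integer: 0xfac688
block = foldl((acc, b) -> acc << 8 | b, reverse(bytes); init = 0)
digits(block, base = 2^3, pad = 8)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```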

E.g. consider the example where bitwidth = 3 and I have the unpacked values

```julia
unpacked = UInt8[0, 1, 2, 3, 4, 5, 6, 7]
```

which gets stored packed as

```julia
bitpacked = UInt8[136, 198, 250]
```

My attempt:
```julia
nth_values(bitpacked, n, bitwidth) = begin
    result = UInt(0)
    from_bits = (n - 1) * bitwidth + 1
    to_bits = n * bitwidth

    from_byte = ceil(Int, from_bits / 8)
    to_byte = div(to_bits - 1, 8) + 1

    for byte in @view bitpacked[to_byte:-1:from_byte]
        result = result << 8
        result = result | byte
    end
    result = result >> mod(from_bits - 1, 8)
    result & UInt(2^bitwidth - 1)
end

nth_values.(Ref(bitpacked), 1:8, 3)
```

which yields the values 0 through 7 (as a `Vector{UInt}`),

but I feel it’s a bit unsatisfactory and may be quite slow. I wonder if there are well-known algorithms for faster unpacking?

Do you actually need to unpack? It seems like one approach would be to define a 3-bit integer array type.
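For what that could look like, here is a minimal read-only `AbstractVector` that extracts each element on demand with shift/mask. `PackedVector` is a made-up name, and the sketch assumes the bitwidth is at most 57 so that an 8-byte window always covers a whole value:

```julia
# Read-only view over bit-packed data; values are extracted lazily.
struct PackedVector <: AbstractVector{UInt64}
    data::Vector{UInt8}   # the packed bytes
    bitwidth::Int         # bits per value
    length::Int           # number of values
end

Base.size(v::PackedVector) = (v.length,)

function Base.getindex(v::PackedVector, i::Int)
    frombit = (i - 1) * v.bitwidth   # 0-based position of the first bit
    frombyte = frombit ÷ 8 + 1
    # gather up to 8 bytes into a UInt64 window (little-endian)
    acc = UInt64(0)
    for (j, b) in enumerate(@view v.data[frombyte:min(frombyte + 7, end)])
        acc |= UInt64(b) << (8 * (j - 1))
    end
    (acc >> (frombit % 8)) & (UInt64(2)^v.bitwidth - 1)
end

v = PackedVector(UInt8[136, 198, 250], 3, 8)
collect(v)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Because it is an `AbstractVector`, generic methods such as `sort(v)` work out of the box (they copy into an ordinary `Vector{UInt64}` first).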

Is that even possible? How would I then sort it, etc.?

What about something like this approach: Reinterpret packed data which is already in memory?


Feels like it only works if the data is a multiple of 8 bits. But there can be any number of bit-packed values, so the total size is not necessarily a multiple of 8 bits.

Another problem is that the bitwidth may be neither a factor nor a multiple of 8, so one value can be stored across two bytes.

Just break the data into chunks of the least common multiple, `lcm(8, bitwidth) ÷ 8` bytes, and then use bit-shift/mask operations.

For example, in your 3-bit example, I would loop through the data in groups of 3 bytes, and pull out 8 numbers from each 3-byte block.
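A sketch of that chunked approach (`unpack_chunks` is a hypothetical helper; it assumes each chunk fits in a `UInt64`, i.e. `lcm(8, bitwidth) ≤ 64`, and that the input is a whole number of chunks):

```julia
function unpack_chunks(bitpacked::Vector{UInt8}, bitwidth::Int)
    bytes_per_chunk = lcm(8, bitwidth) ÷ 8
    vals_per_chunk = 8 * bytes_per_chunk ÷ bitwidth
    mask = UInt64(2)^bitwidth - 1
    out = UInt64[]
    for chunk_start in 1:bytes_per_chunk:length(bitpacked)
        # accumulate the chunk's bytes into one integer (little-endian)
        acc = UInt64(0)
        chunk = @view bitpacked[chunk_start:chunk_start + bytes_per_chunk - 1]
        for (i, b) in enumerate(chunk)
            acc |= UInt64(b) << (8 * (i - 1))
        end
        # pull each value out with shift/mask
        for k in 0:vals_per_chunk - 1
            push!(out, (acc >> (k * bitwidth)) & mask)
        end
    end
    out
end

unpack_chunks(UInt8[136, 198, 250], 3)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

No value straddles a chunk boundary, so each chunk can be processed independently (and the per-chunk inner loops could be unrolled for a fixed bitwidth).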


Thanks for the suggestion. I was thinking something along the same lines, but the bitwidth can be as large as 32. So say my bitwidth is 17; then I can store a chunk in a primitive type like this

```julia
primitive type Foo 136 end  # lcm(17, 8) == 136 bits
```

and `lcm(8, 17)` happens to be 136. But there is no bit-shift operation defined on it, and there is also no `&`.


```julia
bitpacked = UInt8[1:8...]
bitpacked_val = unsafe_load(Ptr{UInt}(pointer(bitpacked)))
bitpacked_val & UInt(2^17 - 1)
```

works but

```julia
bitpacked_val = unsafe_load(Ptr{Foo}(pointer(bitpacked)))
bitpacked_val & UInt(2^17 - 1)
```

doesn’t work.

Is there a way to define shifts and masks for these primitive types?

That would require defining a bit-shift operation on 24 bits, or I would need to `Base.zext_int` to a 32-bit integer before I can do the operations. But it seems like I am out of luck for bitwidths where `lcm(bitwidth, 8) > 128`.

One thing I can do is to treat those as special cases, and for anything smaller, just `Base.zext_int` into the next size for pulling out numbers.
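For bitwidths that do fit, the widening step could look like this (a sketch using `Core.Intrinsics`, which are internal and can change between Julia versions; `Bits24` is a made-up type for the 3-bit case):

```julia
using Core.Intrinsics: trunc_int, zext_int

primitive type Bits24 24 end  # one lcm(8, 3)-bit chunk

block = trunc_int(Bits24, 0x00fac688)  # the packed bytes [136, 198, 250]
w = zext_int(UInt32, block)            # widen to a type that has >> and &
[(w >> (3 * (i - 1))) & UInt32(7) for i in 1:8]  # → 0 through 7
```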

I hardly know anything about this low-level programming, but would this package be helpful? https://github.com/RelationalAI-oss/Blobs.jl


Thanks. I think it falls into the same category as the previous suggestions, in that it doesn’t seem to be able to deal with high-bitwidth structures.