Simple bit manipulation question

I’m converting between two representations of arrays of bits. One is a vector storing 1 bit per element. The function ‘pack_bits’ will pack this into N bits per element. The function ‘unpack_bits’ will convert in the other direction.
I threw together some primitive code for pack_bits, but I’m guessing more elegant solutions are already available?

For example:

``````function pack_bits(inp::Vector{in_t}, bits_per_word::Integer, dir::Integer, out_t=Integer) where {in_t<:Integer}
if bits_per_word > sizeof(out_t) * 8
jlerror("bits_per_word > sizeof(type)*8")
end
if (length(inp) % bits_per_word != 0)
jlerror("pack_bits: size % bits_per_sym != 0")
end
out = Vector{out_t}(Int(length(inp)/bits_per_word))
for o = 1:length(out)
ret = 0
if dir > 0
for b = 0:bits_per_word-1
ret |= (inp[(o-1)*bits_per_word+b+1]&1) << b
end
out[o] = ret
else
for b = 0:bits_per_word-1
ret |= (inp[(o-1)*bits_per_word+b+1]&1) << (bits_per_word - b - 1)
end
out[o] = ret
end
end
return out
end

u = Vector{Int32}([0,0,0,1,1,0,1,1])
Pack_bits.pack_bits(u, 2, -1, Int32)
4-element Array{Int32,1}:
0
1
2
3
``````

You could check out all the wonderful goodness of the builtin `BitArray` type in Base: https://github.com/JuliaLang/julia/blob/master/base/bitarray.jl

3 Likes

Thanks. I did look at that but don’t see how it would help.

Hi. You have a vector that stores 1 bit per element. What is the element type? Do you have the freedom to use elements of type Bool (where each element is 8 bits)? Or, are the elements always a fixed size Int (Int16, or Int32, or Int64 or the alike UInts)? Or do you need process different types of Vector elements? When the element type is known, the packbits part is easier to digest.

I had thought to leave input and output array types generic (vectors of integers of any size), but could change if it offered some advantage.

I assume that your input vector of values, whatever type those values may be, hold either { a zero (for false) or a one (for true)} or hold either { false or true }. Is that correct, or do you envision handling strings like “true” and “false”? – not that it matters.

The requirement that allows handling vectors `Vector{T}` of Bool or Integer types as boolean sentinals is that there is defined `Bool(T)` which accepts an element of type T and returns the corresponding boolean value. `Bool(x::Bool) == x`, `Bool(x::T) where {T<:Union{Signed,Unsigned}} = isodd(x)`. That stuff should work without explicit coding (though Julia defines Bool(-1) as an InexactError).

So, whatever the eltype of your input vector (`eltype(Vector{T}) == T`), you can resolve the input as sequence of Bool values: `Bool.(vec)` broadcasts the Bool constructor over the contents of vec; a Vector{Bool} results. And just for fun, if you wanted to handle strings, `Bool(s::String) = s=="true" ? true : false` should work alright.

Now, the software needs to convert vec::Vector{Bool} into a sequence of packed somethings, for simplicity I’ll assume a sequence of packed bytes (UInt8s) works for you.
The alteration to generate a sequence of packed UInt16s or UInt32s or UInt64s should be clear enough – if not, ask.

There are 8 bits in a byte, so the algorithm needs to gobble 8 Bool values (in their given sequence) at a time to generate a new byte (a `UInt8` value). It is possible that there are not enough elements in the input vec to fill an integral number of bytes. That is potentially a source of error; you should append `falses` as needed to obtain a length of 8*an_integer.

``````function bools_to_bytes(vec::Vector{Bool})
vec = extend_bools(vec)
n = length(vec)
n == 0 && return [zero(UInt8)]
@assert rem(n, 8) == 0
nbytes = div(n, 8)
return bools_to_nbytes(vec, nbytes)
end

function bools_to_nbytes(vec::Vector{Bool}, nbytes::Int)
result = zeros(UInt8, nbytes) # preallocate result space
for byteidx in 1:nbytes
vecstart = 8*(byteidx-1) + 1
boolbits = UInt8.(vec[vecstart:(vecstart+7)])
for i in 1:8
boolbits[i] = boolbits[i] << (8-i)
end
newbyte = reduce(|, boolbits)
push!(result, newbyte)
end
return result[length(result)-nbytes+1:end]
end

function extend_bools(bools::Vector{Bool})
n = length(bools)
rm = rem(n, 8)
while 0 < rm < 8
push!(bools, false)
rm += 1
end
return bools
end
``````

hope this helps