Simple bit manipulation question

I’m converting between two representations of arrays of bits. One is a vector storing 1 bit per element. The function ‘pack_bits’ will pack this into N bits per element. The function ‘unpack_bits’ will convert in the other direction.
I threw together some primitive code for pack_bits, but I’m guessing more elegant solutions are already available?

For example:

function pack_bits(inp::Vector{in_t}, bits_per_word::Integer, dir::Integer, out_t=Integer) where {in_t<:Integer}
    if bits_per_word > sizeof(out_t) * 8
        jlerror("bits_per_word > sizeof(type)*8")
    end
    if (length(inp) % bits_per_word != 0)
        jlerror("pack_bits: size % bits_per_sym != 0")
    end
    out = Vector{out_t}(Int(length(inp)/bits_per_word))
    for o = 1:length(out)
        ret = 0
        if dir > 0
            for b = 0:bits_per_word-1
                ret |= (inp[(o-1)*bits_per_word+b+1]&1) << b
            end
            out[o] = ret
        else
            for b = 0:bits_per_word-1
                ret |= (inp[(o-1)*bits_per_word+b+1]&1) << (bits_per_word - b - 1)
            end
            out[o] = ret
        end
    end
    return out
end

u = Vector{Int32}([0,0,0,1,1,0,1,1])
Pack_bits.pack_bits(u, 2, -1, Int32)
4-element Array{Int32,1}:
 0
 1
 2
 3

You could check out all the wonderful goodness of the builtin BitArray type in Base: https://github.com/JuliaLang/julia/blob/master/base/bitarray.jl

3 Likes

Thanks. I did look at that but don’t see how it would help.

Hi. You have a vector that stores 1 bit per element. What is the element type? Do you have the freedom to use elements of type Bool (where each element is 8 bits)? Or, are the elements always a fixed size Int (Int16, or Int32, or Int64 or the alike UInts)? Or do you need process different types of Vector elements? When the element type is known, the packbits part is easier to digest.

I had thought to leave input and output array types generic (vectors of integers of any size), but could change if it offered some advantage.

I assume that your input vector of values, whatever type those values may be, hold either { a zero (for false) or a one (for true)} or hold either { false or true }. Is that correct, or do you envision handling strings like “true” and “false”? – not that it matters.

The requirement that allows handling vectors Vector{T} of Bool or Integer types as boolean sentinals is that there is defined Bool(T) which accepts an element of type T and returns the corresponding boolean value. Bool(x::Bool) == x, Bool(x::T) where {T<:Union{Signed,Unsigned}} = isodd(x). That stuff should work without explicit coding (though Julia defines Bool(-1) as an InexactError).

So, whatever the eltype of your input vector (eltype(Vector{T}) == T), you can resolve the input as sequence of Bool values: Bool.(vec) broadcasts the Bool constructor over the contents of vec; a Vector{Bool} results. And just for fun, if you wanted to handle strings, Bool(s::String) = s=="true" ? true : false should work alright.

Now, the software needs to convert vec::Vector{Bool} into a sequence of packed somethings, for simplicity I’ll assume a sequence of packed bytes (UInt8s) works for you.
The alteration to generate a sequence of packed UInt16s or UInt32s or UInt64s should be clear enough – if not, ask.

There are 8 bits in a byte, so the algorithm needs to gobble 8 Bool values (in their given sequence) at a time to generate a new byte (a UInt8 value). It is possible that there are not enough elements in the input vec to fill an integral number of bytes. That is potentially a source of error; you should append falses as needed to obtain a length of 8*an_integer.

function bools_to_bytes(vec::Vector{Bool})
    vec = extend_bools(vec)
    n = length(vec)
    n == 0 && return [zero(UInt8)]
    @assert rem(n, 8) == 0
    nbytes = div(n, 8)
    return bools_to_nbytes(vec, nbytes)
end

function bools_to_nbytes(vec::Vector{Bool}, nbytes::Int)
    result = zeros(UInt8, nbytes) # preallocate result space
    for byteidx in 1:nbytes
       vecstart = 8*(byteidx-1) + 1
       boolbits = UInt8.(vec[vecstart:(vecstart+7)])
       for i in 1:8
           boolbits[i] = boolbits[i] << (8-i)
       end
       newbyte = reduce(|, boolbits)
       push!(result, newbyte)
    end
    return result[length(result)-nbytes+1:end]
end

function extend_bools(bools::Vector{Bool})
    n = length(bools)
    rm = rem(n, 8)
    while 0 < rm < 8
       push!(bools, false)
       rm += 1
    end
    return bools
end

hope this helps