Parse an array of bits (`BitArray`) to an integer

I want to parse an array of bits to a decimal number.
I am using this function now but it does not work very fast

function bitarr_to_int(arr)
    return sum(arr .* (2 .^ collect(length(arr)-1:-1:0)))
end

Is there a way to perform this conversion fast?
I tried to use parse but it only get strings, and I couldn’t convert the BitArray to a string properly.

You could access the underlying UInt array directly with arr.chunks. Does the following work for you?

sum(((i, x),) -> big(x) << ((i-1) * 8sizeof(x)), enumerate(arr.chunks))

I think your solution currently works the opposite way, so you might have to do reverse(arr) first.

1 Like

It does work, but it seems slower than the original function I posted.
Is there something I am doing wrong?

julia> function bitarr_to_int(arr)
           return sum(arr .* (2 .^ collect(length(arr)-1:-1:0)))
       end
bitarr_to_int (generic function with 1 method)

julia> function bittest(arr)
           arr = reverse(arr)
           sum(((i, x),) -> big(x) << ((i-1) * sizeof(x)), enumerate(arr.chunks))
       end
bittest (generic function with 1 method)

julia> arr = BitArray([1,0,1,0])
4-element BitArray{1}:
 1
 0
 1
 0

julia> @time bitarr_to_int(arr)
  0.153301 seconds (520.49 k allocations: 25.100 MiB)
10

julia> @time bitarr_to_int(arr)
  0.000005 seconds (6 allocations: 384 bytes)
10

julia> @time bittest(arr)
  0.096877 seconds (218.42 k allocations: 11.088 MiB)
10

julia> @time bittest(arr)
  0.000021 seconds (13 allocations: 440 bytes)
10

Your original function uses BigInt? It seems to me that it just works over Ints. The bittest functions explicitly converts to BigInts. You were sure the original BitArray had not more than 63~64 elements?

1 Like

You are right.
This function gives a bit better performance:

julia> function bittest2(arr)
           arr = reverse(arr)
           sum(((i, x),) -> Int(x) << ((i-1) * sizeof(x)), enumerate(arr.chunks))
       end
bittest2 (generic function with 1 method)

julia> @time bittest2(arr)
  0.090779 seconds (68.21 k allocations: 3.137 MiB)
10

julia> @time bittest2(arr)
  0.000004 seconds (7 allocations: 304 bytes)
10

If your array never contains more than 64 entries, you can just use arr.chunks[1]. I chose BigInt here, because your example overflows if there are more than 63 entries in arr.

1 Like

I see. I know that my arrays are always < 60 entries, so I do not need the BigInt.

I want to run this conversion for 10^7 such arrays.
When I run both functions in a for loop, they give roughly the same result (your method is ~7 seconds, and my original function is ~11 seconds).
Is there a better way to run this for an entire 2d array (in the appropriate axis of course)?

arr.chunks[1] should basically be free. reverse might be expensive. The inplace reverse! might be a bit more efficient here, because it shouldn’t allocate. A better solution might be to restructure your code a bit, so you don’t need to reverse the array. It’s difficult to give concrete tips without knowing a bit more about your application. Are you currently using a 2d BitArray? If so, it might be a better solution to use a Vector{UInt} as your data structure and implement the bit-twiddling yourself.

Thanks for the answer.
I have a 2d array of bits. Each row is the bit array I am willing to convert to an int.
I guess I thought that even for 10^7 rows in such a 2d BitArray, the operation should be quite fast - my intuition was wrong.

It should be fast, if you can use .chunks and reinterpret, but you should be working along columns, not rows for good performance. Also, if each row/column has less than 60 elements, how do you wish to interpret the missing bits needed to make up a 64 bit integer?

julia> R = bitrand(64, 10^7);  # working along the columns, not rows

julia> @btime reinterpret(Int64, ($R).chunks);
  8.264 ns (1 allocation: 32 bytes)  # fast enough?

With less than 64 elements you need to apply some bitmask or something.

If you decide to go with a vector of UInts, here is how you can treat a single UInt like a BitVector:

julia> struct SBitVector <: AbstractVector{Bool}
           x::UInt
           len::Int
       end

julia> Base.size(v::SBitVector) = (v.len,)

julia> Base.@propagate_inbounds function Base.getindex(v::SBitVector, i::Int)
           Base.@boundscheck 1 <= i <= length(v)
           return (v.x >> (i-1)) % Bool
       end

julia> Base.@propagate_inbounds function Base.setindex(v::SBitVector, i::Int, x::Bool)
           Base.@boundscheck 1 <= i <= length(v) || throw(BoundsError(v, i))
           mask = UInt(1) << (i-1)
           res = ifelse(x, v.x | mask, v.x & ~mask)
           return SBitVector(res, v.len)
       end

julia> v = SBitVector(rand(UInt8), 8)
8-element SBitVector:
 0
 1
 1
 0
 1
 0
 0
 0

julia> v[3]
true

julia> Base.setindex(v, 3, false)
8-element SBitVector:
 0
 1
 0
 0
 1
 0
 0
 0

You could then just make a vector of SBitVectors.

Edit:
Probably more efficient in most cases, if the length is a type parameter:

julia> struct SBitVector{len} <: AbstractVector{Bool}
           x::UInt
       end

julia> Base.size(v::SBitVector{len}) where {len} = (len,)

julia> Base.@propagate_inbounds function Base.getindex(v::SBitVector, i::Int)
           Base.@boundscheck 1 <= i <= length(v) || throw(BoundsError(v, i))
           return (v.x >> (i-1)) % Bool
       end

julia> Base.@propagate_inbounds function Base.setindex(v::SBitVector{len}, i::Int, x::Bool) where {len}
           Base.@boundscheck 1 <= i <= length(v) || throw(BoundsError(v, i))
           mask = UInt(1) << (i-1)
           res = ifelse(x, v.x | mask, v.x & ~mask)
           return SBitVector{len}(res)
       end

julia> v = SBitVector{8}(rand(UInt8))
8-element SBitVector{8}:
 0
 1
 1
 1
 0
 0
 1
 1

julia> v[3]
true

julia> Base.setindex(v, 3, false)
8-element SBitVector{8}:
 0
 1
 0
 1
 0
 0
 1
 1

I am actually working with columns (I wrote rows, because I thought it is more natural for understanding).

Each column has 20 rows - so it is a 20 bit number. I would like to read it as a 20 bit number - like padding all of first 44 bits with zeros.

I tried to run the reinterpret function you used. I do not understand the $ sign before the R. It does not run like this for me.
I ran reinterpret(Int64, (R).chunks) , but as a result I get also negative numbers, which should not be. So I do not really understand how this works.

Could you explain a bit more on how to use reinterpret here?

I am not sure I understand the context, but if possible, I would just consider building an UInt64 directly with | and 1 << index when processing the original data.

1 Like

I think that you mean that you would not maintain the bit array, but just get the integer number?
I need the entire bit array, because I want to drop some of the bits by hand.

I did not follow on what you meant with | and 1<<index. Could you explain that? Maybe it could still help me somehow.

That is nice!
But I need the other direction: to go from a bit-vector to an int.

Start with

z = UInt64(0)

Whenever you want to set the bit for index, do

z |= 1 << index

You can query bits in a similar way. This is basically what BitArray does under the hood, but in a much more organized way, allowing longer vectors etc, but if you need the basics it is really simple.

That’s easy, you can just use v.x as above.

I see. So how can I construct it using a BitArray?

Thanks. I tried that, but it gives the same performance as before.

Similar to @simeonschaub suggestions here:

The $ sign is used for benchmarking, with the BenchmarkTools package.