# Parse an array of bits (`BitArray`) to an integer

I want to parse an array of bits to a decimal number.
I am using this function now but it does not work very fast

``````function bitarr_to_int(arr)
return sum(arr .* (2 .^ collect(length(arr)-1:-1:0)))
end
``````

Is there a way to perform this conversion fast?
I tried to use `parse` but it only get strings, and I couldn’t convert the `BitArray` to a string properly.

You could access the underlying `UInt` array directly with `arr.chunks`. Does the following work for you?

``````sum(((i, x),) -> big(x) << ((i-1) * 8sizeof(x)), enumerate(arr.chunks))
``````

I think your solution currently works the opposite way, so you might have to do `reverse(arr)` first.

1 Like

It does work, but it seems slower than the original function I posted.
Is there something I am doing wrong?

``````julia> function bitarr_to_int(arr)
return sum(arr .* (2 .^ collect(length(arr)-1:-1:0)))
end
bitarr_to_int (generic function with 1 method)

julia> function bittest(arr)
arr = reverse(arr)
sum(((i, x),) -> big(x) << ((i-1) * sizeof(x)), enumerate(arr.chunks))
end
bittest (generic function with 1 method)

julia> arr = BitArray([1,0,1,0])
4-element BitArray{1}:
1
0
1
0

julia> @time bitarr_to_int(arr)
0.153301 seconds (520.49 k allocations: 25.100 MiB)
10

julia> @time bitarr_to_int(arr)
0.000005 seconds (6 allocations: 384 bytes)
10

julia> @time bittest(arr)
0.096877 seconds (218.42 k allocations: 11.088 MiB)
10

julia> @time bittest(arr)
0.000021 seconds (13 allocations: 440 bytes)
10

``````

Your original function uses `BigInt`? It seems to me that it just works over `Int`s. The `bittest` functions explicitly converts to `BigInt`s. You were sure the original `BitArray` had not more than 63~64 elements?

1 Like

You are right.
This function gives a bit better performance:

``````julia> function bittest2(arr)
arr = reverse(arr)
sum(((i, x),) -> Int(x) << ((i-1) * sizeof(x)), enumerate(arr.chunks))
end
bittest2 (generic function with 1 method)

julia> @time bittest2(arr)
0.090779 seconds (68.21 k allocations: 3.137 MiB)
10

julia> @time bittest2(arr)
0.000004 seconds (7 allocations: 304 bytes)
10
``````

If your array never contains more than 64 entries, you can just use `arr.chunks`. I chose `BigInt` here, because your example overflows if there are more than 63 entries in `arr`.

1 Like

I see. I know that my arrays are always < 60 entries, so I do not need the `BigInt`.

I want to run this conversion for 10^7 such arrays.
When I run both functions in a for loop, they give roughly the same result (your method is ~7 seconds, and my original function is ~11 seconds).
Is there a better way to run this for an entire 2d array (in the appropriate axis of course)?

`arr.chunks` should basically be free. `reverse` might be expensive. The inplace `reverse!` might be a bit more efficient here, because it shouldn’t allocate. A better solution might be to restructure your code a bit, so you don’t need to reverse the array. It’s difficult to give concrete tips without knowing a bit more about your application. Are you currently using a 2d BitArray? If so, it might be a better solution to use a `Vector{UInt}` as your data structure and implement the bit-twiddling yourself.

I have a 2d array of bits. Each row is the bit array I am willing to convert to an `int`.
I guess I thought that even for 10^7 rows in such a 2d `BitArray`, the operation should be quite fast - my intuition was wrong.

It should be fast, if you can use `.chunks` and `reinterpret`, but you should be working along columns, not rows for good performance. Also, if each row/column has less than 60 elements, how do you wish to interpret the missing bits needed to make up a 64 bit integer?

``````julia> R = bitrand(64, 10^7);  # working along the columns, not rows

julia> @btime reinterpret(Int64, (\$R).chunks);
8.264 ns (1 allocation: 32 bytes)  # fast enough?
``````

With less than 64 elements you need to apply some bitmask or something.

If you decide to go with a vector of UInts, here is how you can treat a single UInt like a BitVector:

``````julia> struct SBitVector <: AbstractVector{Bool}
x::UInt
len::Int
end

julia> Base.size(v::SBitVector) = (v.len,)

julia> Base.@propagate_inbounds function Base.getindex(v::SBitVector, i::Int)
Base.@boundscheck 1 <= i <= length(v)
return (v.x >> (i-1)) % Bool
end

julia> Base.@propagate_inbounds function Base.setindex(v::SBitVector, i::Int, x::Bool)
Base.@boundscheck 1 <= i <= length(v) || throw(BoundsError(v, i))
return SBitVector(res, v.len)
end

julia> v = SBitVector(rand(UInt8), 8)
8-element SBitVector:
0
1
1
0
1
0
0
0

julia> v
true

julia> Base.setindex(v, 3, false)
8-element SBitVector:
0
1
0
0
1
0
0
0
``````

You could then just make a vector of `SBitVector`s.

Edit:
Probably more efficient in most cases, if the length is a type parameter:

``````julia> struct SBitVector{len} <: AbstractVector{Bool}
x::UInt
end

julia> Base.size(v::SBitVector{len}) where {len} = (len,)

julia> Base.@propagate_inbounds function Base.getindex(v::SBitVector, i::Int)
Base.@boundscheck 1 <= i <= length(v) || throw(BoundsError(v, i))
return (v.x >> (i-1)) % Bool
end

julia> Base.@propagate_inbounds function Base.setindex(v::SBitVector{len}, i::Int, x::Bool) where {len}
Base.@boundscheck 1 <= i <= length(v) || throw(BoundsError(v, i))
return SBitVector{len}(res)
end

julia> v = SBitVector{8}(rand(UInt8))
8-element SBitVector{8}:
0
1
1
1
0
0
1
1

julia> v
true

julia> Base.setindex(v, 3, false)
8-element SBitVector{8}:
0
1
0
1
0
0
1
1
``````

I am actually working with columns (I wrote rows, because I thought it is more natural for understanding).

Each column has 20 rows - so it is a 20 bit number. I would like to read it as a 20 bit number - like padding all of first 44 bits with zeros.

I tried to run the `reinterpret` function you used. I do not understand the `\$` sign before the `R`. It does not run like this for me.
I ran `reinterpret(Int64, (R).chunks)` , but as a result I get also negative numbers, which should not be. So I do not really understand how this works.

Could you explain a bit more on how to use `reinterpret` here?

I am not sure I understand the context, but if possible, I would just consider building an `UInt64` directly with `|` and `1 << index` when processing the original data.

1 Like

I think that you mean that you would not maintain the bit array, but just get the integer number?
I need the entire bit array, because I want to drop some of the bits by hand.

I did not follow on what you meant with `|` and `1<<index`. Could you explain that? Maybe it could still help me somehow.

That is nice!
But I need the other direction: to go from a bit-vector to an int.

``````z = UInt64(0)
``````

Whenever you want to set the bit for `index`, do

``````z |= 1 << index
``````

You can query bits in a similar way. This is basically what `BitArray` does under the hood, but in a much more organized way, allowing longer vectors etc, but if you need the basics it is really simple.

That’s easy, you can just use `v.x` as above.

I see. So how can I construct it using a `BitArray`?

Thanks. I tried that, but it gives the same performance as before.

Similar to @simeonschaub suggestions here:

The `\$` sign is used for benchmarking, with the BenchmarkTools package.