How do I iterate over the bits of a UInt8, UInt16 or UInt64? I tried
julia> for x in digits(Bool, UInt16(17),2) ...
It works but it creates an intermediate array, I am not sure this is how it should be done.
How do I iterate over the bits of a UInt8, UInt16 or UInt64? I tried
julia> for x in digits(Bool, UInt16(17),2) ...
It works but it creates an intermediate array, I am not sure this is how it should be done.
There is the digits!
function which at least allow you to re-use the allocated array. I’ve a not-yet-published package for this kind of things, I’ll try to remember to link it here when it’s ready.
Do you actually want to iterate over the bits? Or the digits? Here’s an example of a simple BitIterator:
struct BitIterator{T}
val::T
end
Base.start(b::BitIterator) = 1
Base.next(b::BitIterator{T}, i) where {T} = ((b.val & T(2)^(sizeof(T) * 8 - i)) == 1, i + 1)
nbits(T) = sizeof(T) * 8
Base.done(b::BitIterator{T}, i) where {T} = i > nbits(T)
Which can be used like
julia> b = BitIterator(UInt8(1))
BitIterator{UInt8}(0x01)
julia> bits(UInt8(1))
"00000001"
julia> for i in b
println(i)
end
Loop variable "i" overwrites a variable in an enclosing scope. In the future the variable will be local to the loop instead.
false
false
false
false
false
false
false
true
julia> b = BitIterator(UInt64(1))
BitIterator{UInt64}(0x0000000000000001)
julia> bits(UInt64(1))
"0000000000000000000000000000000000000000000000000000000000000001"
julia> for i in b
println(i)
end
Loop variable "i" overwrites a variable in an enclosing scope. In the future the variable will be local to the loop instead.
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
true
For those wondering how you’d do this nowadays:
struct BitIterator{T}
val::T
end
function Base.iterate(S::BitIterator{T}, i = 1) where {T}
nbits = sizeof(T) * 8
if i > nbits
return nothing
end
return ((S.val & T(2)^(nbits - i)) == 1, i + 1)
end
Here’s another solution:
bits(x) = (x>>i & 1 for i in 0:8*sizeof(x)-1)
julia> bits(Int8(17)) |> collect
8-element Vector{Int64}:
1
0
0
0
1
0
0
0
Edit: I originally wrote sizeof(x)
instead of sizeof(x)-1
, thanks @stevengj for catching it.
Rather than computing 2^i
on each iteration, it should be more efficient to do a shift of a bitmask on each iteration:
struct BitIterator{T<:Integer}
val::T
end
Base.length(itr::BitIterator) = sizeof(itr.val) * 8
Base.eltype(itr::BitIterator) = Bool
function Base.iterate(itr::BitIterator{T}, mask=one(T)) where {T}
iszero(mask) && return nothing
return !iszero(itr.val & mask), mask << 1
end
which gives
julia> collect(BitIterator(Int8(17)))'
1Ă—8 adjoint(::Vector{Bool}) with eltype Bool:
1 0 0 0 1 0 0 0
@sijo gave a related solution using a generator, but
has the wrong length — it should be 0:8*sizeof(x)-1
(Note, however, that none of these solutions work for BigInt
, for which sizeof
is not the number of bytes in the value and for which repeatedly shifting left never produces zero. You would want specialized BitIterator{BigInt}
methods to handle that case.)
That’s a really nice solution, @stevengj! On the back of this, what would you do if you wanted to iterate over bits of bytes? Would you just create a generator using BitIterator
and flatten it?
bytes = [0x02, 0x01, 0x03]
Iterators.flatten(i for i in BitIterator.(bytes)) |> collect
Or would you create another struct
for such a case?
I would re-think my life choices, since this is a weird programming pattern? Or use a BitVector
?
That works too, but you have to be consistent about the order in which you want to iterate. BitIterator
goes from least-significant (“rightmost”) to most-significant, so the analogous thing would probably be:
Iterators.flatten(BitIterator(b) for b in Iterators.reverse(bytes)) |> collect
Note that I would avoid BitIterator.(bytes)
, which allocates an unnecessary intermediate array of BitIterator
.
The reason I ask is, for a bit of fun, I’m implementing some LSB-steganography in Julia, based on this, but I am trying to use fewer strings. As such, I am trying to use bytes and bits and such. I wanted to put the bits in a BitVector so that I can zip
it with the CartesianCoordinates
of the image matrix. Previously I was using
vcat((digits(b, base = 2, pad = 8) for b in bytes)...)
But that seems inefficient if an iterator exists…
Aside: One of the nice things about implementing an iterator type, as opposed to using a generator, is that you can easily define additional methods on it, e.g. you can make it indexable to fetch bits with itr[bit]
:
Base.firstindex(itr::BitIterator) = 0
Base.lastindex(itr::BitIterator) = length(itr)-1
Base.getindex(itr::BitIterator{T}, i::Integer) where {T} = !iszero(itr.val & (one(T) << i))
and in principle you could define other specialized methods, e.g. define count(iszero, itr)
using count_zeros(itr.val)
.
(But at some point you are re-implementing a static-length BitVector
here.)
You realize that BitVector
is an actual type in Julia, right? You can use this if you want a “bit-packed” array of boolean (bit) values, of arbitrary length, accessed like an array.
Yes, I know it’s a type in Julia, however I don’t have an array of booleans, I have an array of bytes (i.e., Vector{UInt8}
), which is why your BitIterator
type is super useful in this case
You have an array of bytes that you want to interpret as an array of bits. Why not store these bits in a BitVector
instead (which under the hood uses an array of integer “words” to store the bits)?
(Though it is typically faster to work with an array of Bool
values — one byte per bit. This costs you some storage but is typically more efficient to work with.)
That’s what I’d like to do…only I’m not sure how to convert Vector{UInt8}
or Base.CodeUnits
into a BitVector
… That said, would it not be better if I only had an iterator? Then I don’t have to store a (potentially large) BitVector
in memory?
What is the source of your data? Why not read it into a BitVector
(or BitMatrix
for an image) to start with?
I’m not sure what my source data is going to be. I’m implementing the function for Vector{UInt8}
, as that seems standard, but to be honest I’m not sure what would be most useful in “real life”. I mentioned CodeUnits
because I figured steganography would probably mostly be used to hide written information, so if you had a message (i.e., a string), you could convert it to bytes using codeunits
.
For the fun of it:
julia> function bits(x)
g = (x>>i & 1 for i in 0:8*sizeof(x)-1)
eval(:(Base.getindex(g::typeof($g), i) = g.f(i-1)))
return g
end;
julia> g = bits(Int8(7));
julia> join(g)
"11100000"
julia> g[3], g[4]
(1, 0)
(Note that this particular method might be implemented by default if this PR gets merged.)