Designing a parametric type for storing bits

So I want to build a barebones static BitArray and I can’t seem to use the type system properly. Here’s my first attempt at a definition:

struct SBitVector{N} <: AbstractVector{Bool}
    chunks::SVector{M,UInt64}
    len::Int
end

I’m following the BitArray code, only using a static vector for the bit data. I want N to be len, and M to be div(N,64,RoundingUp). But this doesn’t work because M hasn’t been defined yet.

I guess I need to include M in the signature and then write an inner constructor. But even this is beyond me and nothing I’ve tried compiles.

struct SBitVector{N,M} <: AbstractVector{Bool} where {N,M} # am I using where properly?
    chunks::SVector{M,UInt64}
    len::Int
    function SBitVector(Vector{UInt64},len)
     # not sure how to do arithmetic on type parameters to enforce M = div(N,64,RoundingUp)
     # len = N etc.
    end

end

But it’s weird needing two parameters when the second is strictly determined by the first (never mind that I would like to drop len as well, since it is itself is just equal to N!). So any ideas on how to define this kind of thing idiomatically and/or elegantly whichever comes first?

P.S. Use case is these will become “genomes” of genetic algorithms, length a few hundred bits and I need them small so they fit on the cache, and SVector{Bool,len} is 8bits per bit.

For this reason I’d also like to drop len, and use the type parameter itself as function parameter where needed to save space for another UInt64 chunk.

Thanks for all the help.

But it’s weird needing two parameters when the second is strictly determined by the first (never mind that I would like to drop len as well, since it is itself is just equal to N!). So any ideas on how to define this kind of thing idiomatically and/or elegantly whichever comes first?

Then answer is, unfortunately, that you must have both parameters. I don’t think there is any good reason for this, except there is no language mechanism to specify that N and M are linked.

But even this is beyond me and nothing I’ve tried compiles.

You can do it like this:

struct SBitVector{N,M} <: AbstractVector{Bool}
    chunks::SVector{M,UInt64}
    
    function SBitVector(s::Vector{UInt64}, len::Integer)
        if length(s) != cld(len, 64)
            error("Some error message")
        end
        new{len,M}(@SVector s)
    end
end

You might also want to mask the unused bits in the vector s, so it’s easier to compare two vectors.
This will require a run-time check when instantiating the vector. You can skip this check if the compiler knows len and the length of the vector beforehand, then the check will constant fold:

struct SBitVector{N,M} <: AbstractVector{Bool}
    chunks::SVector{M,UInt64}
    
    function SBitVector(s::SVector{M, UInt64}, ::Val{L}) where {M, L}
        if !isa(L, Integer) || length(s) != cld(L, 64)
            error("Some error message")
        end
        new{L,M}(s)
    end
end

, in which case the check is a no-op:

julia> @code_native SBitVector(@SVector([UInt(1)]), Val(3))
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[8]:8 within `SBitVector'
	movq	(%rsi), %rax
	retq
	nopw	%cs:(%rax,%rax)
; └
2 Likes

This would be a great package btw, especially if you could us UInt8/UInt16/UInt32 as the internal value as well.

1 Like

This will require a run-time check when instantiating the vector. You can skip this check if the compiler knows len and the length of the vector beforehand, then the check will constant fold:

Thanks, that is a super clear description of why to use Val. i.e. type parameter math is done at compile time!

@Raf, via your suggestion I have the UInt type as part of the signature now. Maybe I can get some feedback on performance expectations. This is my code now, so you and others can see how I’m also extending Base AbstractArray methods:

struct SBitVector{N,M,T<:UInt} <: AbstractVector{Bool}
    # type signiture: {Length, ChunksLength, ChunksType}
    chunks::SVector{M,T}
    function SBitVector(s::SVector{M, T}, ::Val{L}) where {M,T,L}
         if !isa(L, Integer) || length(s) != cld(L, sizeof(T)*8)
             error("register/length mismatch")
         end
         new{L,M,T}(s)
    end
end

Base.size(::SBitVector{N,M}) where {N,M} = (N,)
Base.IndexStyle(::Type{<:SBitVector}) = IndexLinear()
Base.getindex(S::SBitVector{N,M,T}, ind::Int) where {N,M,T} = readbit_mod(S.chunks[cld(ind, sizeof(T)*8)], ind)
readbit_mod(x::T, loc::Int) where {T<:UInt} = (x >> (mod1(loc, sizeof(T)*8) - 1)) & one(T) == one(T)

function Base.iterate(S::SBitVector{N,M,T}, state=(1, S[1])) where {N,M,T} 
    (ind, val::T) = state
    if ind > N
        nothing
    else
        return @inbounds (readbit_mod(val, ind), 
            (ind+1, 
            mod1(ind,sizeof(T)*8) == 1 ? S.chunks[cld(ind, sizeof(T)*8)] : val))
    end
end

The iteration method is so I don’t have to getindex every iterate call. Not sure how much better that is in the long run (if at all, as chunks is typically short in the scheme of things).

I feel like I am only 20% of the way to understanding julia type parameters :sweat_smile:. Still pattern matching without being able to reason through.

1 Like

Great. You might be able to inherit from StaticArray as well to take advantage of all their predefined methods. If you have a Tuple{S} of axis size then you can just inherit from StraticArray:

SBitArray{L,S,T,N} <: StaticArray{S,T,N} end

It’s a little confusing you haveboth L and N for length. N is normally used to mean ndims in AbstractArray.

Instead of the Val you can also put L as a type parameter, especially as its first already.

struct SBitVector{L,M,T<:UInt} <: AbstractVector{Bool}
    # type signiture: {Length, ChunksLength, ChunksType}
    chunks::SVector{M,T}
    function SBitVector{L}(s::SVector{M, T}) where {L,M,T}
         if !isa(L, Integer) || length(s) != cld(L, sizeof(T)*8)
             error("register/length mismatch")
         end
         new{L,M,T}(s)
    end
end

bitvec = SBitVector{L}(somevector)
1 Like

If you want to collaborate on making this a package I’m keen to help, I just got a git issue in another package that needs this :slight_smile:

1 Like

I’ve never done that but sounds like a good idea. There must be a wide demand for this kind of thing. I’ll dm ya…

1 Like

@Raf, and anyone else who wants to contribute

(or just watch a noob slowly learn Julia types, and software testing)

you are welcome to follow along here.

1 Like