Bitmaps have some pitfalls: either each bit is stored in a whole byte, taking 8x the storage, or bits are packed into words and you get false sharing and problems with threads (I understand the packed-bit container in C++ is considered buggy, and Julia's is likely problematic in the same way). But even the space-efficient packed format isn't as efficient as it could be. Many languages wrap CRoaring, and Julia likely should too:
Wherever you could imagine using a `BitSet`, you could use a `RoaringBitmap`, and often profit from the compression. There are two benefits of compression:
- Taking up less space in RAM and on disk.
- Taking up less space means faster operations because of better memory locality and cache efficiency.
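The space benefit is easiest to see with sparse data. `java.util.BitSet` allocates one bit for every possible value up to the highest one set. A set holding only 1 and 100,000,000 therefore needs at least 12.5 MB. Under the prefix scheme described next, the same two values fall into two 16-bit ranges with one tiny container each. Here is a small Java check that uses only `BitSet.size()`, which reports the bits of space actually allocated. The class name `SparseBitSetCost` is hypothetical:

```java
import java.util.BitSet;

// Hypothetical helper: how much backing storage does java.util.BitSet
// allocate for a given set of values?
final class SparseBitSetCost {
    static long bytesFor(int... values) {
        BitSet bs = new BitSet();
        for (int v : values) {
            bs.set(v); // grows the internal long[] up to the word holding v
        }
        return bs.size() / 8; // size() = bits of space allocated, not bits set
    }
}
```

For example, `SparseBitSetCost.bytesFor(1, 2, 3)` stays within a single 64-bit word. `SparseBitSetCost.bytesFor(1, 100_000_000)` allocates megabytes to hold two values.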
Knowing a little about the compression mechanism helps you understand when to use it (or not) and how not to benchmark it. The compression mechanism is prefix compression: the higher 16 bits of each value are stored in an array in the top level of a tree. The lower 16 bits of each value are stored in a container, which holds all of the values in the range sharing the same higher 16 bits. Recognising that each 16-bit range can have different density and clustering characteristics, there are three types of container, each using at most 8KB for its contents:
- Sparse (`ArrayContainer`): a sorted array of 16-bit values plus a 16-bit cardinality. Always fewer than 4096 elements.
- Dense (`BitmapContainer`): a `long[]`, just like `java.util.BitSet`, requiring one bit per value, plus a 16-bit cardinality. Never fewer than 4096 elements.
- Really dense, or clustered (`RunContainer`): another sorted array of 16-bit values, where each even-indexed value is the start of a run of set bits and each odd-indexed value is the length of the run. A container is converted to this form whenever that saves space.
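To make the layout concrete, here is a toy Java model of the scheme above. The class `ToyRoaring` and its methods are hypothetical and written for illustration; this is not RoaringBitmap's API or its exact conversion logic. The model does three things:
- Splits each value into high and low 16 bits.
- Groups the values by their high bits. A `TreeMap` stands in for the sorted top-level key array.
- Picks a container for each group using the thresholds in the list.

It assumes sizes count only the stored 16-bit values: 2 bytes per array entry, 8192 bytes for a bitmap, and 4 bytes per (start, length) run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of prefix compression; not RoaringBitmap's code.
// Byte counts ignore cardinality fields and object headers.
final class ToyRoaring {
    static final int BITMAP_BYTES = 65536 / 8; // one bit per possible low value: 8192

    static int high(int x) { return x >>> 16; }  // key in the top-level structure
    static int low(int x) { return x & 0xFFFF; } // value stored inside the container

    // Group values by their high 16 bits; each group holds sorted low 16 bits.
    static TreeMap<Integer, TreeSet<Integer>> group(int[] values) {
        TreeMap<Integer, TreeSet<Integer>> top = new TreeMap<>();
        for (int v : values) {
            top.computeIfAbsent(high(v), k -> new TreeSet<>()).add(low(v));
        }
        return top;
    }

    // Encode sorted, distinct low values as (start, length) pairs.
    static List<int[]> toRuns(int[] sortedLows) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedLows.length) {
            int j = i;
            while (j + 1 < sortedLows.length && sortedLows[j + 1] == sortedLows[j] + 1) {
                j++;
            }
            runs.add(new int[] {sortedLows[i], j - i + 1});
            i = j + 1;
        }
        return runs;
    }

    // Array below 4096 values, bitmap otherwise; switch to runs if they are smaller.
    static String choose(int[] sortedLows) {
        int card = sortedLows.length;
        String kind = card < 4096 ? "array" : "bitmap";
        int bytes = card < 4096 ? 2 * card : BITMAP_BYTES;
        if (4 * toRuns(sortedLows).size() < bytes) {
            kind = "run";
        }
        return kind;
    }

    // Container kind chosen for each high-16-bit key.
    static TreeMap<Integer, String> containerKinds(int[] values) {
        TreeMap<Integer, String> kinds = new TreeMap<>();
        for (Map.Entry<Integer, TreeSet<Integer>> e : group(values).entrySet()) {
            int[] lows = e.getValue().stream().mapToInt(Integer::intValue).toArray();
            kinds.put(e.getKey(), choose(lows));
        }
        return kinds;
    }
}
```

For example, a range holding every possible value collapses to a single 4-byte run. A range holding every other value stays a bitmap, because 32768 runs would cost far more than 8KB. Three scattered values stay a small array.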