The standard container types in Julia (Vector
, Set
, Dict
) can hold an arbitrary number of elements. If the size is known in advance, one can use dedicated types like SVector
or MVector
from StaticArrays.jl that are much faster and, in the case of SVector
, don’t allocate any memory. In many problems, however, the size is not fixed, but instead known to be “small”. In enumerative combinatorics, for example, combinatorial explosions make it often impossible to reach “large” vectors or sets.
SmallCollections.jl offers vector, set and dictionary types that can hold any number of elements up to some (small) limit. Most of these types come in a mutable and an immutable version. With bit type elements, the immutable versions don’t allocate. The mutable ones usually require bit type elements for full functionality (as does MVector
). This approach leads to significant speed-ups compared to the standard types. For instance, the performance of the vector types SmallVector
and MutableSmallVector
is close to that to SVector
and MVector
. The most extreme performance gain is with SmallBitSet
, which can be 100x faster than BitSet
. (The exact timings depend much on your processor.)
Below I discuss the new types and, as an example application, the submodule Combinatorics
. Compared to their counterparts in other combinatorics packages, the functions in Combinatorics
are 1-3 orders of magnitude faster.
The full documentation for SmallCollections.jl is available here.
(Mutable)SmallVector
and (Mutable)FixedVector
SmallVector{N,T}
and MutableSmallVector{N,T}
can hold up to N
elements of type T
. Internally, these types use fixed-length vectors and fill the unused elements with some default
value. The underlying types FixedVector{N,T}
and MutableFixedVector{N,T}
are similar to SVector
and MVector
.
Benchmarks for SmallVector
and MutableSmallVector
The timings are for 1000 operations of the given type on vectors having between 1 and 31 elements (or exactly 32 elements for fixed-length vectors). If possible, mutating operations were used.
N = 32 , T = Int16 |
v + w |
v .+= w |
sum |
push(!) |
count(>(c), v) |
---|---|---|---|---|---|
Vector{T} |
44.827 μs | 21.258 μs | 7.612 μs | 5.003 μs | 11.241 μs |
MVector{N,T} |
5.606 μs | 19.478 μs | 2.301 μs | 9.764 μs | 2.679 μs |
MutableSmallVector{N,T} |
3.880 μs | 5.258 μs | 3.233 μs | 3.084 μs | 2.131 μs |
SVector{N,T} |
2.012 μs | n/a | 1.395 μs | 2.053 μs | 1.185 μs |
SmallVector{N,T} |
2.392 μs | n/a | 2.660 μs | 6.437 μs | 1.796 μs |
Notes: sum
for SVector
and MVector
returns an Int16
instead of Int
. SmallCollections
has a separate function sum_fast
for this. Addition allocates for Vector
. To avoid this for MVector
, the result was transformed to SVector
. push!
for MVector
and push
for SVector
are not directly comparable to the others because they change the type of the returned vector, which leads to type instabilities in cases like loops.
For the benchmark code see the file benchmark/benchmark_vec.jl
in the repository.
How broadcasting or functions like map
or any
deal with default values for a (Mutable)SmallVector
is governed by a trait called MapStyle
.
PackedVector
PackedVector{U,M,T}
stores integers with M
bits in an unsigned integer of type U
. To the outside they appear as having type T
. For example, a PackedVector{UInt128,5,Int8}
can store 25
(128÷5
) signed integers with 5
bits. Besides saving memory, PackedVector
may be faster for operations like pushfirst
or pop
.
(Mutable)SmallSet
and SmallBitSet
SmallSet{N,T}
and MutableSmallSet{N,T}
can hold up to N
elements of type T
. The type SmallBitSet{U}
can hold integers between 1
and M
where M
is the bit size of the unsigned integer type U
(taken to be UInt
if omitted). For integers, MutableSmallSet
can be as fast (or even faster than) BitSet
. SmallBitSet
can be up to 100x faster than BitSet
.
Benchmarks for SmallSet
, MutableSmallSet
and SmallBitSet
The timings are for 1000 operations of the given type on sets having between 1 and 8 elements. If possible, mutating operations were used. Note that while a BitSet
can hold arbitrarily many elements, the timings for MutableSmallSet
wouldn’t change if the elements were drawn from Int16
without restrictions.
N = 16 , T = Int16 |
push(!) |
intersect(!) |
issubset |
in |
---|---|---|---|---|
Set{T} |
14.817 μs | 87.199 μs | 77.615 μs | 4.586 μs |
MutableSmallSet{N,T} |
9.532 μs | 14.244 μs | 4.392 μs | 1.167 μs |
SmallSet{N,T} |
13.645 μs | 17.705 μs | 8.806 μs | 2.182 μs |
BitSet |
16.184 μs | 21.225 μs | 7.518 μs | 1.983 μs |
SmallBitSet{UInt16} |
1.377 μs | 36.318 ns | 56.222 ns | 1.094 μs |
For the benchmark code see the file benchmark/benchmark_set.jl
in the repository.
(Mutable)SmallDict
SmallDict{N,K,V}
and MutableSmallDict{N,K,V}
can hold up to N
entries with key type K
and value type V
. Since keys and values are stored as small vectors, reverse lookup with invget
is as fast as regular lookup.
Benchmarks for SmallDict
and MutableSmallDict
The timings are for 1000 operations of the given type on dictionaries created with 30 randomly chosen key-value pairs. If possible, mutating operations were used.
N = 32 , K = V = Int8 |
getindex |
invget |
setindex(!) |
pop(!) |
iterate |
---|---|---|---|---|---|
Dict{K,V} |
10.739 μs | n/a | 27.604 μs | 19.932 μs | 406.180 μs |
MutableSmallDict{N,K,V} |
1.853 μs | 2.650 μs | 8.762 μs | 7.460 μs | 18.653 μs |
SmallDict{N,K,V} |
1.864 μs | 1.495 μs | 9.516 μs | 17.761 μs | 16.514 μs |
For the benchmark code see the file benchmark/benchmark_dict.jl
in the repository.
Combinatorics
The submodule Combinatorics
contains functions for enumerative combinatorics that are based on types provided by SmallCollections.jl. Currently this module is more of a proof-of-concept; it may be turned into a separate package in the future. Here are two example benchmarks (done with Julia 1.11.1, Combinatorics.jl 1.0.3, Combinat.jl 0.1.3, Oscar.jl 1.3.1, GAP 4.14.0 and Sage 10.6, on an older computer).
Permutations
Loop over all permutations of 1:9
and add up the image of 1
under each permutation. The iterator returned by permutations
yields each permutation as a SmallVector{16,Int8}
. It does not provide cycle decompositions or the like.
julia> n = 9; @b sum(p[1] for p in Combinatorics.permutations($n))
2.987 ms # 1.247 ms with @inbounds(p[1])
julia> n = 9; @b sum(p[1] for p in permutations(1:$n)) # Combinatorics.jl
24.274 s (725763 allocs: 44.297 MiB, 0.09% gc time, without a warmup)
julia> n = 9; @b sum(p[1] for p in Combinat.Permutations($n)) # Combinat.jl
21.130 ms (725762 allocs: 44.297 MiB, 6.39% gc time)
julia> n = 9; @b sum(p(1) for p in symmetric_group($n)) # Oscar.jl (using GAP)
756.784 ms (931061 allocs: 28.470 MiB, without a warmup)
gap> Sum(SymmetricGroup(9), p -> 1^p);; time; # GAP
764 # milliseconds
sage: timeit('sum(p[0] for p in Permutations(9))') # Sage
5 loops, best of 3: 3.13 s per loop
Subsets
Loop over all 10
-element subsets of 1:20
and add up the sum of the elements of each subset. The iterator returned by subsets
yields each subset as a SmallBitSet
.
julia> n = 20; k = 10; @b sum(sum, Combinatorics.subsets($n, $k))
2.830 ms
julia> n = 20; k = 10; @b sum(sum, combinations(1:$n, $k)) # Combinatorics.jl
15.216 ms (369514 allocs: 25.373 MiB)
julia> n = 20; k = 10; @b sum(sum, Combinat.Combinations(1:$n, $k)) # Combinat.jl
15.080 ms (369521 allocs: 25.373 MiB)
gap> s := 0;; for c in IteratorOfCombinations([1..20], 10) do s := s + Sum(c); od; time; # GAP
317 # milliseconds
sage: timeit('sum(sum(c) for c in Subsets(20, 10))') # Sage
5 loops, best of 3: 17.9 s per loop # 11 seconds with Sage 9.4