[ANN] SmallCollections.jl: fast small vectors, sets and dictionaries

The standard container types in Julia (Vector, Set, Dict) can hold an arbitrary number of elements. If the size is known in advance, one can use dedicated types like SVector or MVector from StaticArrays.jl that are much faster and, in the case of SVector, don’t allocate any memory. In many problems, however, the size is not fixed, but instead known to be “small”. In enumerative combinatorics, for example, combinatorial explosions make it often impossible to reach “large” vectors or sets.

SmallCollections.jl offers vector, set and dictionary types that can hold any number of elements up to some (small) limit. Most of these types come in a mutable and an immutable version. With bit type elements, the immutable versions don’t allocate. The mutable ones usually require bit type elements for full functionality (as does MVector). This approach leads to significant speed-ups compared to the standard types. For instance, the performance of the vector types SmallVector and MutableSmallVector is close to that to SVector and MVector. The most extreme performance gain is with SmallBitSet, which can be 100x faster than BitSet. (The exact timings depend much on your processor.)

Below I discuss the new types and, as an example application, the submodule Combinatorics. Compared to their counterparts in other combinatorics packages, the functions in Combinatorics are 1-3 orders of magnitude faster.

The full documentation for SmallCollections.jl is available here.

(Mutable)SmallVector and (Mutable)FixedVector

SmallVector{N,T} and MutableSmallVector{N,T} can hold up to N elements of type T. Internally, these types use fixed-length vectors and fill the unused elements with some default value. The underlying types FixedVector{N,T} and MutableFixedVector{N,T} are similar to SVector and MVector.

Benchmarks for SmallVector and MutableSmallVector

The timings are for 1000 operations of the given type on vectors having between 1 and 31 elements (or exactly 32 elements for fixed-length vectors). If possible, mutating operations were used.

N = 32, T = Int16 v + w v .+= w sum push(!) count(>(c), v)
Vector{T} 44.827 μs 21.258 μs 7.612 μs 5.003 μs 11.241 μs
MVector{N,T} 5.606 μs 19.478 μs 2.301 μs 9.764 μs 2.679 μs
MutableSmallVector{N,T} 3.880 μs 5.258 μs 3.233 μs 3.084 μs 2.131 μs
SVector{N,T} 2.012 μs n/a 1.395 μs 2.053 μs 1.185 μs
SmallVector{N,T} 2.392 μs n/a 2.660 μs 6.437 μs 1.796 μs

Notes: sum for SVector and MVector returns an Int16 instead of Int. SmallCollections has a separate function sum_fast for this. Addition allocates for Vector. To avoid this for MVector, the result was transformed to SVector. push! for MVector and push for SVector are not directly comparable to the others because they change the type of the returned vector, which leads to type instabilities in cases like loops.

For the benchmark code see the file benchmark/benchmark_vec.jl in the repository.

How broadcasting or functions like map or any deal with default values for a (Mutable)SmallVector is governed by a trait called MapStyle.

PackedVector

PackedVector{U,M,T} stores integers with M bits in an unsigned integer of type U. To the outside they appear as having type T. For example, a PackedVector{UInt128,5,Int8} can store 25 (128÷5) signed integers with 5 bits. Besides saving memory, PackedVector may be faster for operations like pushfirst or pop.

(Mutable)SmallSet and SmallBitSet

SmallSet{N,T} and MutableSmallSet{N,T} can hold up to N elements of type T. The type SmallBitSet{U} can hold integers between 1 and M where M is the bit size of the unsigned integer type U (taken to be UInt if omitted). For integers, MutableSmallSet can be as fast (or even faster than) BitSet. SmallBitSet can be up to 100x faster than BitSet.

Benchmarks for SmallSet, MutableSmallSet and SmallBitSet

The timings are for 1000 operations of the given type on sets having between 1 and 8 elements. If possible, mutating operations were used. Note that while a BitSet can hold arbitrarily many elements, the timings for MutableSmallSet wouldn’t change if the elements were drawn from Int16 without restrictions.

N = 16, T = Int16 push(!) intersect(!) issubset in
Set{T} 14.817 μs 87.199 μs 77.615 μs 4.586 μs
MutableSmallSet{N,T} 9.532 μs 14.244 μs 4.392 μs 1.167 μs
SmallSet{N,T} 13.645 μs 17.705 μs 8.806 μs 2.182 μs
BitSet 16.184 μs 21.225 μs 7.518 μs 1.983 μs
SmallBitSet{UInt16} 1.377 μs 36.318 ns 56.222 ns 1.094 μs

For the benchmark code see the file benchmark/benchmark_set.jl in the repository.

(Mutable)SmallDict

SmallDict{N,K,V} and MutableSmallDict{N,K,V} can hold up to N entries with key type K and value type V. Since keys and values are stored as small vectors, reverse lookup with invget is as fast as regular lookup.

Benchmarks for SmallDict and MutableSmallDict

The timings are for 1000 operations of the given type on dictionaries created with 30 randomly chosen key-value pairs. If possible, mutating operations were used.

N = 32, K = V = Int8 getindex invget setindex(!) pop(!) iterate
Dict{K,V} 10.739 μs n/a 27.604 μs 19.932 μs 406.180 μs
MutableSmallDict{N,K,V} 1.853 μs 2.650 μs 8.762 μs 7.460 μs 18.653 μs
SmallDict{N,K,V} 1.864 μs 1.495 μs 9.516 μs 17.761 μs 16.514 μs

For the benchmark code see the file benchmark/benchmark_dict.jl in the repository.

Combinatorics

The submodule Combinatorics contains functions for enumerative combinatorics that are based on types provided by SmallCollections.jl. Currently this module is more of a proof-of-concept; it may be turned into a separate package in the future. Here are two example benchmarks (done with Julia 1.11.1, Combinatorics.jl 1.0.3, Combinat.jl 0.1.3, Oscar.jl 1.3.1, GAP 4.14.0 and Sage 10.6, on an older computer).

Permutations

Loop over all permutations of 1:9 and add up the image of 1 under each permutation. The iterator returned by permutations yields each permutation as a SmallVector{16,Int8}. It does not provide cycle decompositions or the like.

julia> n = 9; @b sum(p[1] for p in Combinatorics.permutations($n))
2.987 ms   # 1.247 ms with @inbounds(p[1])

julia> n = 9; @b sum(p[1] for p in permutations(1:$n))  # Combinatorics.jl
24.274 s (725763 allocs: 44.297 MiB, 0.09% gc time, without a warmup)

julia> n = 9; @b sum(p[1] for p in Combinat.Permutations($n))  # Combinat.jl
21.130 ms (725762 allocs: 44.297 MiB, 6.39% gc time)

julia> n = 9; @b sum(p(1) for p in symmetric_group($n))   # Oscar.jl (using GAP)
756.784 ms (931061 allocs: 28.470 MiB, without a warmup)

gap> Sum(SymmetricGroup(9), p -> 1^p);; time;  # GAP
764  # milliseconds

sage: timeit('sum(p[0] for p in Permutations(9))')  # Sage
5 loops, best of 3: 3.13 s per loop

Subsets

Loop over all 10-element subsets of 1:20 and add up the sum of the elements of each subset. The iterator returned by subsets yields each subset as a SmallBitSet.

julia> n = 20; k = 10; @b sum(sum, Combinatorics.subsets($n, $k))
2.830 ms

julia> n = 20; k = 10; @b sum(sum, combinations(1:$n, $k))  # Combinatorics.jl
15.216 ms (369514 allocs: 25.373 MiB)

julia> n = 20; k = 10; @b sum(sum, Combinat.Combinations(1:$n, $k))  # Combinat.jl
15.080 ms (369521 allocs: 25.373 MiB)

gap> s := 0;; for c in IteratorOfCombinations([1..20], 10) do s := s + Sum(c); od; time;  # GAP
317  # milliseconds

sage: timeit('sum(sum(c) for c in Subsets(20, 10))')  # Sage
5 loops, best of 3: 17.9 s per loop   # 11 seconds with Sage 9.4
31 Likes

This is great!!

Are the immutable types isbits compatable?

Yes, the immutable types satisfy isbitstype as long as the element types do so.

2 Likes

Very impressive, but there are better packages to compare to:

julia> using Chevie

julia> @btime sum(p[1] for p in permutations(9))
  15.592 ms (725765 allocations: 47.07 MiB)

julia> @btime sum(1^p for p in symmetric_group(9))
  14.597 ms (823669 allocations: 31.39 MiB)

julia> @btime sum(sum,Combinat.Combinations(1:20,10))
  8.544 ms (369521 allocations: 25.37 MiB)

Your package beats these times, but they are for general algorithms (no limit on the
size of inputs)

1 Like

Great, thanks for pointing that out! I’m going to add Combinat.jl and Chevie.jl to the benchmarks.

1 Like