The implementation of issubset
between two BitSet
: https://github.com/JuliaLang/julia/blob/3909294e78d223c740b7e7a4d00eb2d4771055a7/base/bitset.jl#L419
first intersects the sets. This unnecessarily allocates memory to compute an entirely new BitSet
for the intersection. It ended up costing me 30% CPU time on my workload.
It seems that on ‘standard’ sizes of BitSet
, the following implementation is often 6x faster and has 0 allocations.
function issubset_fast(s::BitSet,t::BitSet)
s == t && return true
s.offset < t.offset && return false
length(s.bits) > length(t.bits) && return false
length(s) > length(t) && return false
return all(e -> e ∈ t, s)
end
s1 = BitSet([2i for i=1:100]);
s2 = BitSet([7i+70 for i=1:15]);
s3 = BitSet([20i+70 for i=1:6]);
s4 = BitSet([2i for i=1:99]);
@btime issubset($s1, $s1);
@btime issubset_fast($s1, $s1);
@btime issubset($s2, $s1);
@btime issubset_fast($s2, $s1);
@btime issubset($s3, $s1);
@btime issubset_fast($s3, $s1);
@btime issubset($s4, $s1);
@btime issubset_fast($s4, $s1);
Giving:
85.817 ns (3 allocations: 160 bytes)
8.865 ns (0 allocations: 0 bytes)
86.080 ns (3 allocations: 160 bytes)
14.888 ns (0 allocations: 0 bytes)
86.146 ns (3 allocations: 160 bytes)
19.714 ns (0 allocations: 0 bytes)
88.973 ns (3 allocations: 160 bytes)
142.452 ns (0 allocations: 0 bytes)
On the last issubset(s4, s1)
the implementation in Base
is still faster, I suppose it has been optimized for this case.
Still, I’m hoping that someone who understands the internals of BitSet
better than me can make a general-purpose non-allocating version that’s faster on all cases?