`issubset(::BitSet,::BitSet)` is slow and allocates unnecessarily

guyvdb · July 28, 2020, 6:23pm

The implementation of issubset between two BitSet: https://github.com/JuliaLang/julia/blob/3909294e78d223c740b7e7a4d00eb2d4771055a7/base/bitset.jl#L419
first intersects the sets. This unnecessarily allocates memory to compute an entirely new BitSet for the intersection. It ended up costing me 30% CPU time on my workload.

It seems that on ‘standard’ sizes of BitSet, the following implementation is often 6x faster and has 0 allocations.

function issubset_fast(s::BitSet,t::BitSet)
    s == t && return true
    s.offset < t.offset && return false
    length(s.bits) > length(t.bits) && return false
    length(s) > length(t) && return false
    return all(e -> e ∈ t, s)
end

s1 = BitSet([2i for i=1:100]);
s2 = BitSet([7i+70 for i=1:15]);
s3 = BitSet([20i+70 for i=1:6]);
s4 = BitSet([2i for i=1:99]);

@btime issubset($s1, $s1);
@btime issubset_fast($s1, $s1); 
@btime issubset($s2, $s1);
@btime issubset_fast($s2, $s1); 
@btime issubset($s3, $s1);
@btime issubset_fast($s3, $s1); 
@btime issubset($s4, $s1);
@btime issubset_fast($s4, $s1);

Giving:

  85.817 ns (3 allocations: 160 bytes)
  8.865 ns (0 allocations: 0 bytes)
  86.080 ns (3 allocations: 160 bytes)
  14.888 ns (0 allocations: 0 bytes)
  86.146 ns (3 allocations: 160 bytes)
  19.714 ns (0 allocations: 0 bytes)
  88.973 ns (3 allocations: 160 bytes) 
  142.452 ns (0 allocations: 0 bytes)

On the last issubset(s4, s1) the implementation in Base is still faster, I suppose it has been optimized for this case.

Still, I’m hoping that someone who understands the internals of BitSet better than me can make a general-purpose non-allocating version that’s faster on all cases?

stev47 · August 2, 2020, 1:28pm

maybe something like this?

using BenchmarkTools

function issubset_fast(a::BitSet, b::BitSet)
    n = length(a.bits)
    shift = b.offset - a.offset
    i, j = shift, shift + length(b.bits)

    f(a, b) = a == a & b
    return (
        all(@inbounds iszero(a.bits[i]) for i in 1:min(n, i)) &&
        all(@inbounds f(a.bits[i], b.bits[i - shift]) for i in max(1, i+1):min(n, j)) &&
        all(@inbounds iszero(a.bits[i]) for i in max(1, j+1):n))
end

s1 = BitSet([2i for i=1:100]);
s2 = BitSet([7i+70 for i=1:15]);
s3 = BitSet([20i+70 for i=1:6]);
s4 = BitSet([2i for i=1:99]);

@btime issubset($s1, $s1);
@btime issubset_fast($s1, $s1);
@btime issubset($s2, $s1);
@btime issubset_fast($s2, $s1);
@btime issubset($s3, $s1);
@btime issubset_fast($s3, $s1);
@btime issubset($s4, $s1);
@btime issubset_fast($s4, $s1);

On my machine:

  133.868 ns (3 allocations: 160 bytes)
  8.346 ns (0 allocations: 0 bytes)
  128.104 ns (3 allocations: 160 bytes)
  5.216 ns (0 allocations: 0 bytes)
  123.255 ns (3 allocations: 160 bytes)
  7.014 ns (0 allocations: 0 bytes)
  128.310 ns (3 allocations: 160 bytes)
  8.346 ns (0 allocations: 0 bytes)

Though the speedup observed here might be somewhat unrepresentative.

Topic		Replies	Views
Difference between Set{Int}() and BitSet() Performance question , performance	2	2650	August 20, 2019
Set performance question New to Julia	3	566	May 8, 2020
Broadcast of .== slow performance, allocations Performance question , performance	5	792	May 16, 2019
Measurements Indicate that this is “Faster than the Speed of Light” New to Julia data_structures	4	612	April 16, 2021
Can gpu calculation accelerates this function? New to Julia	2	215	October 5, 2022

`issubset(::BitSet,::BitSet)` is slow and allocates unnecessarily

Related topics