Difference between Set{Int}() and BitSet()

I want to compute the intersection of a matrix of sets. I came across BitSet in addition to Set{Int}, and the following benchmarks (which initialize a matrix of Set or BitSet) baffles me in terms of memory usage:

julia> @benchmark [Set{Int}() for i in 1:1000, j in 1:1000]
  memory estimate:  480.65 MiB
  allocs estimate:  5000002
  minimum time:     362.486 ms (63.15% GC)
  median time:      722.896 ms (79.61% GC)
  mean time:        746.622 ms (81.50% GC)
  maximum time:     1.108 s (86.71% GC)
  samples:          8
  evals/sample:     1
julia> @benchmark [BitSet() for i in 1:1000, j in 1:1000]
  memory estimate:  160.22 MiB
  allocs estimate:  3000002
  minimum time:     62.636 ms (0.00% GC)
  median time:      367.854 ms (80.83% GC)
  mean time:        392.743 ms (82.20% GC)
  maximum time:     542.071 ms (87.16% GC)
  samples:          13
  evals/sample:     1

And indeed using BitSet reduced my code’s memory usage by more than 10 times. My question is:

  1. Why does a matrix of sets (BitSet) require so much memory to initialize? In comparison, a 1000x1000 Matrix{Float64} is around 8MB.
  2. Is BitSet() concretely typed? I want the compiler to know I will only ever push Int into my sets.
  3. When is it more appropriate to use BitSet as opposed to Set? I read the documentation but I don’t quite get it.

Thank you for your time.

I am not sure those benchmarks are very informative, unless all your sets are empty.

As the docs explain, BitSet is basically a bitstring of flags. Use it when your sets are dense.

BitSet happens to be concretely typed, but generally you should use @code_warntype and similar to investigate these things. See https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips-1

1 Like

Also consider using static-sized bitsets if you want to represent densish subsets of 1:64*N with known N. It doesn’t look like that PR will land in static arrays, but you can just take the source and run with it.

1 Like