I want to compute the intersection of a matrix of sets. I came across BitSet
in addition to Set{Int}
, and the following benchmarks (which initialize a matrix of Set or BitSet) baffles me in terms of memory usage:
julia> @benchmark [Set{Int}() for i in 1:1000, j in 1:1000]
BenchmarkTools.Trial:
memory estimate: 480.65 MiB
allocs estimate: 5000002
--------------
minimum time: 362.486 ms (63.15% GC)
median time: 722.896 ms (79.61% GC)
mean time: 746.622 ms (81.50% GC)
maximum time: 1.108 s (86.71% GC)
--------------
samples: 8
evals/sample: 1
julia> @benchmark [BitSet() for i in 1:1000, j in 1:1000]
BenchmarkTools.Trial:
memory estimate: 160.22 MiB
allocs estimate: 3000002
--------------
minimum time: 62.636 ms (0.00% GC)
median time: 367.854 ms (80.83% GC)
mean time: 392.743 ms (82.20% GC)
maximum time: 542.071 ms (87.16% GC)
--------------
samples: 13
evals/sample: 1
And indeed using BitSet
reduced my code’s memory usage by more than 10 times. My question is:
- Why does a matrix of sets (BitSet) require so much memory to initialize? In comparison, a 1000x1000 Matrix{Float64} is around 8MB.
- Is
BitSet()
concretely typed? I want the compiler to know I will only ever pushInt
into my sets. - When is it more appropriate to use
BitSet
as opposed toSet
? I read the documentation but I don’t quite get it.
Thank you for your time.