I have an application where it seems appropriate to have a `Dict` effectively containing a large number of lightly wrapped `Dict` containers, say something like `d = Dict{Int64,Dict{Int64,Int128}}()`, where the indices of `d` can be pre-computed and the number of elements in the inner dictionaries can be approximated, with approximations depending on the indices. I would like to initialize `d` by applying `sizehint!` to the inner dictionaries, so I could do something like this:
```julia
approx(i) = isqrt(i)
s = Set(6:7:400000)
init1(s) = Dict{Int64,Dict{Int64,Int128}}(i => sizehint!(Dict{Int64,Int128}(), approx(i)) for i in s)
```
but I noticed that even though `sizehint!` on an empty `Dict` breaks out of its `rehash!` early, it still ends up resizing its `slots`, `keys`, and `vals` fields, which often need to be reallocated in this case. I can avoid the extra allocations by using the 8-argument inner constructor of `Dict{K,V}`, but I need to refer to `Base._tablesz()` to make sure I size things appropriately.
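To see the reallocation concretely, here is a small check (a sketch; the exact byte count varies by Julia version, the point is just that it is nonzero even for a freshly constructed empty `Dict`):

```julia
# sizehint! on a brand-new empty Dict still allocates replacement
# backing arrays rather than reusing the default-sized ones.
warm = Dict{Int64,Int128}()
sizehint!(warm, 1000)            # warm up so compilation isn't counted below

d = Dict{Int64,Int128}()
bytes = @allocated sizehint!(d, 1000)
println(bytes)                   # nonzero on the versions I tested
```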
```julia
function init2(s)
    d = Dict{Int64,Dict{Int64,Int128}}()
    sizehint!(d, length(s))
    for i in s
        n = Base._tablesz(approx(i))
        d[i] = Dict{Int64,Int128}(zeros(UInt8, n), Vector{Int64}(undef, n),
                                  Vector{Int128}(undef, n), 0, 0, 0, 1, 0)
    end
    return d
end
```
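(For context on the `Base._tablesz(approx(i))` call above: `_tablesz` is an internal, undocumented helper, so this is just its observed behavior on the versions I checked, not a documented contract. It rounds its argument up to the next power of two, with a minimum of 16:)

```julia
# Hedged illustration of Base._tablesz (an internal helper, not public API):
# observed to round up to the next power of two, floored at 16.
for x in (3, 10, 16, 17, 100, 1000)
    println(x, " => ", Base._tablesz(x))
end
```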
This gives me a noticeable speed improvement for this part of the operation.
```julia
using BenchmarkTools
@btime init1($s)  # 243.783 ms (399914 allocations: 925.18 MiB)
@btime init2($s)  # 175.247 ms (228585 allocations: 900.58 MiB)
```
but it feels dirty, since `Base._tablesz()` doesn’t seem to be something I should be relying on. Is there a reason we couldn’t have

```julia
function Dict{K,V}(sh::Int64) where {K,V}
    n = _tablesz(sh)
    Dict{K,V}(zeros(UInt8, n), Vector{K}(undef, n), Vector{V}(undef, n), 0, 0, 0, 1, 0)
end
```

or something like that as an inner constructor? It seems like the `Dict{K,V}(ps::Pair...)` method would benefit from this.
Alternatively, is there a better way to initialize a `Dict` when I know the size?