Proper way to make an empty sizehinted Dict

malacroi · November 29, 2020, 8:59pm

I have an application where it seems appropriate to have a Dict containing a large number of lightly wrapped Dict containers. Say something like d = Dict{Int64,Dict{Int64,Int128}}(), where the indices of d can be pre-computed and the number of elements in each inner dictionary can be approximated, with the approximation depending on the index. I would like to initialize d by applying sizehint! to the inner dictionaries, so I could do something like this:

approx(i)=isqrt(i)
s=Set(6:7:400000)
init1(s) = Dict{Int64,Dict{Int64,Int128}}( i=>sizehint!(Dict{Int64,Int128}(),approx(i)) for i in s)

but I noticed that even though sizehint! on an empty Dict breaks out of its rehash! early, it still ends up resizing its slots, keys, and vals fields, which often need to be reallocated in this case. I can avoid the extra allocations by using the 8-argument inner constructor of Dict{K,V}, but then I need to refer to Base._tablesz() to make sure I size things appropriately.

function init2(s)
    d=Dict{Int64,Dict{Int64,Int128}}()
    sizehint!(d,length(s))
    for i in s
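        # Round the hint to the table size Dict uses internally (non-public helper).
        n = Base._tablesz(approx(i))
        # Inner-constructor arguments: slots, keys, vals, ndel, count, age, idxfloor, maxprobe.
        d[i] = Dict{Int64,Int128}(zeros(UInt8,n), Vector{Int64}(undef,n), Vector{Int128}(undef,n), 0, 0, 0, 1, 0)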
    end
    return d
end

This gives me a noticeable speed improvement for this part of the operation.

using BenchmarkTools
@btime init1($s) #   243.783 ms (399914 allocations: 925.18 MiB)
@btime init2($s) #   175.247 ms (228585 allocations: 900.58 MiB)

It feels dirty, though, since Base._tablesz() doesn’t seem to be something I should be relying on. Is there a reason we couldn’t have

function Dict{K,V}(sh::Int64) where {K,V}
    n = _tablesz(sh)
    Dict{K,V}(zeros(UInt8,n), Vector{K}(undef, n), Vector{V}(undef, n), 0, 0, 0, 1, 0)
end

or something like that as an inner constructor? It seems like the Dict{K,V}(ps::Pair...) method would benefit from this.

Alternatively, is there a better way to initialize a Dict when I know the size?
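
For reference, a minimal sketch of how the extra allocation from sizehint! on a fresh Dict can be observed using only the public API. The names make_plain and make_hinted are hypothetical helpers introduced for illustration, and the byte counts will vary with the Julia version, so none are quoted here.

```julia
# Compare bytes allocated by a plain empty Dict and by one that is
# immediately sizehint!-ed. Only public API is used.
# make_plain / make_hinted are hypothetical helper names.
make_plain() = Dict{Int64,Int128}()
make_hinted(n) = sizehint!(Dict{Int64,Int128}(), n)

# Warm up both paths so compilation does not inflate the measurements.
make_plain()
make_hinted(632)

plain_bytes  = @allocated make_plain()
hinted_bytes = @allocated make_hinted(632)   # 632 == isqrt(400000)

println("plain:  ", plain_bytes, " bytes")
println("hinted: ", hinted_bytes, " bytes")
```

The gap between the two numbers includes whatever the default-sized Dict allocated before the hint replaced its storage, which is the overhead the question is about.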

malacroi · December 1, 2020, 2:03am

The best solution I’ve encountered so far is to refactor so that a new type, EmptyDict, isolates all the dealings with the internals of Dict.

""" A wrapper for directly constructing a Dict{K,V} with a sizehint """
struct EmptyDict{K,V} end
function EmptyDict{K,V}(n::Integer) where {K,V}
    n=Base._tablesz(n)
    Dict{K,V}(zeros(UInt8,n), Vector{K}(undef, n), Vector{V}(undef, n), 0, 0, 0, 1, 0)
end

Then I can write init3 with the speed and allocation savings of init2, but the clean syntax of init1.

init3(s) = Dict{Int64,Dict{Int64,Int128}}( i=>EmptyDict{Int64,Int128}(approx(i)) for i in s)
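
Since init2 and init3 depend on Dict internals that may change between Julia versions, a small sanity check can help confirm that an initializer still produces usable dictionaries. The sketch below is an assumption-laden illustration: check_init is a hypothetical helper, and init_public is a public-API stand-in (equivalent to init1) that can be swapped for the initializer under test.

```julia
approx(i) = isqrt(i)

# Public-API stand-in for init1/init3; replace with the initializer under test.
init_public(s) = Dict{Int64,Dict{Int64,Int128}}(
    i => sizehint!(Dict{Int64,Int128}(), approx(i)) for i in s)

# Hypothetical helper: true if `d` has exactly the keys in `s`, every inner
# Dict is empty, and an inner Dict accepts an insertion and a deletion.
function check_init(d, s)
    Set(keys(d)) == s || return false
    all(isempty, values(d)) || return false
    k = first(s)
    d[k][1] = Int128(42)                 # inner Dict must accept an insertion
    ok = d[k][1] == 42 && length(d[k]) == 1
    delete!(d[k], 1)                     # leave the structure as we found it
    return ok && isempty(d[k])
end

s = Set(6:7:400)
println(check_init(init_public(s), s))
```

A check like this only confirms basic behavior; it does not show that the hinted capacity was actually reserved.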
