Sizehint! a Dict with a BigInt size not possible?

I’m using a dictionary to store some values indexed by integer partitions of 40. Since I know that there are 37338 of them, I attempted to sizehint! the dictionary, but instead of manually entering 37338, I used a function that computed that number as a BigInt, with the result that the sizehint! failed. As a minimal example, consider:

julia> d = Dict{Int64, Int64}()

julia> sizehint!(d, BigInt(37338))
ERROR: MethodError: no method matching leading_zeros(::BigInt)
Closest candidates are:
  leading_zeros(::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}) at int.jl:384
 [1] _tablesz(::BigInt) at ./abstractdict.jl:526
 [2] rehash!(::Dict{Int64,Int64}, ::BigInt) at ./dict.jl:179
 [3] sizehint!(::Dict{Int64,Int64}, ::BigInt) at ./dict.jl:242
 [4] top-level scope at REPL[39]:1

why can’t the sizehint! be specified as a BigInt?

is it possible to have memory larger than ~typemax(Int64) * 2 bytes?

in terms of implementation, sizehint!(array) is able to take a BigInt so maybe dictionary should take it too.

As a work around you can probably convert it back to Int after calculation? Since you don’t *actually need to hint a size that is beyond Int64’s capability

1 Like

By way of comparison, sizehint!(d,Int128(40)) works just fine, and I certainly can’t have memory approaching typemax(Int128) bytes, but we don’t require an explicit conversion in that case.

the issue is simply leading_zeros is only meaningful for BitInteger, for which BigInt is not one

So it’s just a matter of not having an efficient way to compute ceil(Int, log2(x)) when x::BigInt?

no, leading_zeros is describing the binary representation of a number, this is not well-defined for BigInt. Thus:

julia> bitstring(BigInt(1239))
ERROR: MethodError: no method matching bitstring(::BigInt)

But there’s no need to calculate the number of leading zeros, what the table needs is the number of digits trailing the leading zeros. It’s only used via its negation to determine the number of significant bits to the representation of x, in the calculation
_tablesz(x::Integer) = x < 16 ? 16 : one(x)<<((sizeof(x)<<3)-leading_zeros(x-1))
i.e. the table that can hold x items needs (sizeof(x)<<3)-leading_zeros(x-1) bits, so it can actually hold 2^(number of bits to represent x) items. But this agrees with the slower calculation
_tablesz(x::Integer) = x < 16 ? 16 : one(x)<<ceil(Int, log2(x))
so should abstractdict.jl also define
_tablesz(x::BigInt) = x<16 ? 16 : one(x)<<ceil(Int, log2(x))
or something more generic as a fallback?

not very familiar with this code but sounds like a possible route, you can open a PR maybe

You can do something like, then your sizehint! would have worked:

julia> import Base.sizehint!

julia> function sizehint!(d::Dict{T}, newsz::BigInt) where T
         newsz > 1000000 ? Error("sizehint! has suspiciously high value") : sizehint!(d, Int(newsz))

Feel free to make a PR on Dict with (or without?) some cut-off. I’m not going to make a PR, just note, you do not need to do anything about leading_zeros or _tablesz.

I tried on my machine with 128 GB of RAM, just note, allocating 119 on yours might be slow or likely crash the machine:

julia> @time sizehint!(d, 373380000)
  0.766696 seconds (9.99 k allocations: 17.001 GiB, 3.20% gc time, 1.62% compilation time)
Dict{Int64, Int64}()

julia> @time sizehint!(d, 3733800000)
 21.486587 seconds (3 allocations: 119.000 GiB, 0.62% gc time)
Dict{Int64, Int64}()
1 Like