I’m using a dictionary to store some values indexed by integer partitions of 40. Since I know that there are 37338 of them, I attempted to sizehint! the dictionary, but instead of manually entering 37338, I used a function that computed that number as a BigInt, with the result that the sizehint! failed. As a minimal example, consider:
julia> d = Dict{Int64, Int64}()
Dict{Int64,Int64}()
julia> sizehint!(d, BigInt(37338))
ERROR: MethodError: no method matching leading_zeros(::BigInt)
Closest candidates are:
leading_zeros(::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}) at int.jl:384
Stacktrace:
[1] _tablesz(::BigInt) at ./abstractdict.jl:526
[2] rehash!(::Dict{Int64,Int64}, ::BigInt) at ./dict.jl:179
[3] sizehint!(::Dict{Int64,Int64}, ::BigInt) at ./dict.jl:242
[4] top-level scope at REPL[39]:1
is it possible to have memory larger than ~typemax(Int64) * 2 bytes?
in terms of implementation, sizehint!(array) is able to take a BigInt so maybe dictionary should take it too.
As a work around you can probably convert it back to Int after calculation? Since you don’t *actually need to hint a size that is beyond Int64’s capability
By way of comparison, sizehint!(d,Int128(40)) works just fine, and I certainly can’t have memory approaching typemax(Int128) bytes, but we don’t require an explicit conversion in that case.
But there’s no need to calculate the number of leading zeros, what the table needs is the number of digits trailing the leading zeros. It’s only used via its negation to determine the number of significant bits to the representation of x, in the calculation _tablesz(x::Integer) = x < 16 ? 16 : one(x)<<((sizeof(x)<<3)-leading_zeros(x-1))
i.e. the table that can hold x items needs (sizeof(x)<<3)-leading_zeros(x-1) bits, so it can actually hold 2^(number of bits to represent x) items. But this agrees with the slower calculation _tablesz(x::Integer) = x < 16 ? 16 : one(x)<<ceil(Int, log2(x))
so should abstractdict.jl also define _tablesz(x::BigInt) = x<16 ? 16 : one(x)<<ceil(Int, log2(x))
or something more generic as a fallback?
You can do something like, then your sizehint! would have worked:
julia> import Base.sizehint!
julia> function sizehint!(d::Dict{T}, newsz::BigInt) where T
newsz > 1000000 ? Error("sizehint! has suspiciously high value") : sizehint!(d, Int(newsz))
end
Feel free to make a PR on Dict with (or without?) some cut-off. I’m not going to make a PR, just note, you do not need to do anything about leading_zeros or _tablesz.
I tried on my machine with 128 GB of RAM, just note, allocating 119 on yours might be slow or likely crash the machine: