You can easily use hash(i)=i
in Julia as well by defining a wrapper type for the keys, and when I do so I find that performance in the original benchmark is increased by almost a factor of 5, which makes it almost 3x faster than Python for me (in Julia 0.6):
julia> struct FastHashInt{T<:Integer}; i::T; end
julia> Base.:(==)(x::FastHashInt, y::FastHashInt) = x.i == y.i
Base.hash(x::FastHashInt, h::UInt) = xor(UInt(x.i), h)
julia> function dict_perf()
n = 10^7
x = Dict{Int,Int}()
sizehint!(x, n)
for i = 1:n
x[i] = i
end
return x
end
dict_perf (generic function with 1 method)
julia> @time dict_perf(); @time dict_perf(); @time dict_perf();
1.754449 seconds (1.48 k allocations: 272.081 MiB, 4.54% gc time)
1.756465 seconds (11 allocations: 272.001 MiB, 4.38% gc time)
1.715037 seconds (11 allocations: 272.001 MiB, 1.10% gc time)
julia> function dict_perf2()
n = 10^7
x = Dict{FastHashInt{Int},Int}()
sizehint!(x, n)
for i = 1:n
x[FastHashInt(i)] = i
end
return x
end
dict_perf2 (generic function with 1 method)
julia> @time dict_perf2(); @time dict_perf2(); @time dict_perf2();
0.376183 seconds (1.37 k allocations: 272.073 MiB, 5.45% gc time)
0.355044 seconds (11 allocations: 272.001 MiB, 3.26% gc time)
0.350325 seconds (11 allocations: 272.001 MiB, 2.91% gc time)
The Python 2 version of this takes 1 second on my machine:
In [1]: def dict_performance():
dic = dict()
for i in xrange(10000000):
dic[i] = i
In [2]: %time dict_performance();
CPU times: user 873 ms, sys: 188 ms, total: 1.06 s
Wall time: 1.06 s
(1.13 seconds in Python 3.) Update: In Julia 0.7, I get an additional factor of 2 speedup (0.18sec for dict_perf2
).
A moral of this story is that in cases where dictionary performance is critical, you might want a custom hashing function optimized for your use-case. The good news is that this is possible in pure Julia code.