Poor time performance on Dict?

You can easily use hash(i)=i in Julia as well by defining a wrapper type for the keys, and when I do so I find that performance in the original benchmark is increased by almost a factor of 5, which makes it almost 3x faster than Python for me (in Julia 0.6):

julia> struct FastHashInt{T<:Integer}; i::T; end

julia> Base.:(==)(x::FastHashInt, y::FastHashInt) = x.i == y.i
       Base.hash(x::FastHashInt, h::UInt) = xor(UInt(x.i), h)

julia> function dict_perf()
           n = 10^7
           x = Dict{Int,Int}()
           sizehint!(x, n)
           for i = 1:n
               x[i] = i
           end
           return x
       end
dict_perf (generic function with 1 method)

julia> @time dict_perf(); @time dict_perf(); @time dict_perf();
  1.754449 seconds (1.48 k allocations: 272.081 MiB, 4.54% gc time)
  1.756465 seconds (11 allocations: 272.001 MiB, 4.38% gc time)
  1.715037 seconds (11 allocations: 272.001 MiB, 1.10% gc time)

julia> function dict_perf2()
           n = 10^7
           x = Dict{FastHashInt{Int},Int}()
           sizehint!(x, n)
           for i = 1:n
               x[FastHashInt(i)] = i
           end
           return x
       end
dict_perf2 (generic function with 1 method)

julia> @time dict_perf2(); @time dict_perf2(); @time dict_perf2();
  0.376183 seconds (1.37 k allocations: 272.073 MiB, 5.45% gc time)
  0.355044 seconds (11 allocations: 272.001 MiB, 3.26% gc time)
  0.350325 seconds (11 allocations: 272.001 MiB, 2.91% gc time)

The Python 2 version of this takes 1 second on my machine:

In [1]: def dict_performance():
    dic = dict()
    for i in xrange(10000000):
        dic[i] = i

In [2]: %time dict_performance();
CPU times: user 873 ms, sys: 188 ms, total: 1.06 s
Wall time: 1.06 s

(1.13 seconds in Python 3.) Update: In Julia 0.7, I get an additional factor of 2 speedup (0.18sec for dict_perf2).

A moral of this story is that in cases where dictionary performance is critical, you might want a custom hashing function optimized for your use-case. The good news is that this is possible in pure Julia code.

24 Likes