Why is hashing Dicts faster than Arrays?

I am suprised as to why hashing dicts is faster than arrays:

d = Dict{Int, Int}([i => i for i in 1:10])
n = 10000000
@btime begin
    for i = 1:n
        hash(d)
    end
end

gave me 1.457 s (39998979 allocations: 762.92 MiB)

while running this

arr = Vector{Int}([i for i in 1:10])
n = 10000000
@btime begin
    for i = 1:n
        hash(arr)
    end
end

gave me 2.325 s (39998979 allocations: 762.92 MiB)

This is particularly confusing to me because the inner representation of Dict allocates 3 Arrays: slots, keys and vals. If anything, I would have thought that Dicts should be way slower than Arrays, but even the number of allocations are the same?

Relatedly: What is being allocated?

2 Likes

For longer Array, you should see the Array win out. What happens is that since we want all AbstractArrays that are equal to hash the same, we use a complicated method of hashing approximately log(length(arr)) values that is probably adding a bunch of overhead. It would probably be a good idea to have a fastpath for small Arrays that just hashes everything.

3 Likes