Why is hashing Dicts faster than Arrays?

bsuwal · February 10, 2021, 2:49am

I am suprised as to why hashing dicts is faster than arrays:

d = Dict{Int, Int}([i => i for i in 1:10])
n = 10000000
@btime begin
    for i = 1:n
        hash(d)
    end
end

gave me 1.457 s (39998979 allocations: 762.92 MiB)

while running this

arr = Vector{Int}([i for i in 1:10])
n = 10000000
@btime begin
    for i = 1:n
        hash(arr)
    end
end

gave me 2.325 s (39998979 allocations: 762.92 MiB)

This is particularly confusing to me because the inner representation of Dict allocates 3 Arrays: slots, keys and vals. If anything, I would have thought that Dicts should be way slower than Arrays, but even the number of allocations are the same?

Relatedly: What is being allocated?

Oscar_Smith · February 10, 2021, 3:15am

For longer Array, you should see the Array win out. What happens is that since we want all AbstractArrays that are equal to hash the same, we use a complicated method of hashing approximately log(length(arr)) values that is probably adding a bunch of overhead. It would probably be a good idea to have a fastpath for small Arrays that just hashes everything.

Topic		Replies	Views
Performance of Dictionaries.jl vs Base.Dict Internals & Design package , performance , dictionary , dictionaries	32	3206	December 13, 2024
Poor time performance on Dict? Performance	26	19066	March 12, 2018
Performance issues when working with dict Performance dictionary	11	1685	November 16, 2022
Custom hash function in dict is very slow? Performance	3	892	April 20, 2021
Dict with integer keys and fastest insertion? Performance dictionary , data_structures , hash , algorithm	8	248	January 25, 2025

Why is hashing Dicts faster than Arrays?

Related topics