Performance of Dictionaries.jl vs Base.Dict

Lots of places; Dictionaries.jl’s packages are just much faster than Base Dicts to iterate over. There’s a reason most package designers use them rather than Dicts.

2 Likes

I would be very interested in seeing a direct benchmark showing that Dict is five times slower than Dictionaries.

3 Likes

Using my benchmark above, it seems that Dictionary is slower than Dict for getindex (mul1! benchmark), but faster for iterating over pairs(dict) (mul2! benchmark).

In particular, this code:

Code
function mul1!(y, D, x)
    for i = 1:length(y)
        s = zero(eltype(y))
        @inbounds for j = 1:length(x)
            s += D[CartesianIndex((i,j))] * x[j]
        end
        y[i] = s
    end
    return y
end

function mul2!(y, D, x)
    y .= 0
    for (k,v) in pairs(D)
        i,j = Tuple(k)
        @inbounds y[i] += v * x[j]
    end
    return y
end

using Dictionaries, BenchmarkTools
A = rand(1000,1000)
x = rand(1000)
y = similar(x)
D = Dict(pairs(A))
D2 = Dictionary(D)

gives:

julia> @btime mul1!($y, $D, $x); @btime mul1!($y, $D2, $x);
  67.968 ms (0 allocations: 0 bytes)
  110.422 ms (0 allocations: 0 bytes)

julia> @btime mul2!($y, $D, $x); @btime mul2!($y, $D2, $x);
  9.907 ms (0 allocations: 0 bytes)
  1.897 ms (0 allocations: 0 bytes)

If you look at the way Dictionary is implemented, it has separate contiguous arrays of the keys and values in the dictionary, which makes it fast to iterate, at the expense of an additional indirection during getindex (via a separate array of indices into the contiguous data).

12 Likes

cc @andyferris

Yes. I think Andy writes about this in several places.
Like in Getting Started in the README.

The three main difference to Dict are that it preserves the order of elements, it iterates much faster, and it iterates values rather than key-value pairs.

His JuliaCon talk is very clear.

If you do a lot of insertion and deletion and not much iterating over keys and values, then Dict might be faster. If you do a lot of iterating, and not so much insertion and deletion, then Dictionary

13 Likes

Ah yes, that’s a known trade off for ordered dicts—iteration gets much faster but other operations get a bit slower. Describing that as “Dictionaries are five times faster than Dict” is a bit misleading.

14 Likes

Dictonaries.jl also has the tokens API which helps with performance.

yeah, we really need to add the token API to base.

4 Likes

For a less-gifted individual like me, can you offer a basic example of how to use tokens and how they help?

The idea of a token is that a lot of the time people will write code like the following

D = get_random(dict)
if !(a in keys(D))
    D[a] = 1
end

This looks decent, but requires doing a duplicate lookup in D. Using a token will save a call to hash(a) and may save some lookup costs.

5 Likes

There is get! for that pattern. It’s more when you want to update an existing index D[a] += 1 where I don’t think there is a good way to do it now without double lookup.

7 Likes

get! can be used for my case, but since it isn’t lazy it sometimes isn’t a good option.

There’s a lazy method where the value is computed by a function call.

3 Likes

AFAIK there is no great way to se a Base.Dict to implement a “lazy accumulator”, like this:

   if haskey(dict, key)
    dict[key] += 1
  else
    dict[key] = 1
  end

This needs three lookups. We can reduce this to two via get!:

    dict[key] = 1 + get!(dict, key, 0)

But I am not aware of way to get it down to 1 (if there is one, I’d love to learn about it).

5 Likes

Add modify! function for lookup/update/insert/delete in one go by tkf · Pull Request #33758 · JuliaLang/julia · GitHub enables this, and to my understanding is a reason why a token-based API doesn’t currently exist in Base. Maybe the current thread will reignite efforts on landing one of the two approaches.

3 Likes

Neither was I… but now I think the following incrementkey! does it:

using OrderedCollections

incrementkey!(d, k) = ( mergewith!(+, d, LittleDict((k,), (1,))) ;nothing)
classic_incrementkey!(d, k) = ( d[k] = 1 + get!(d, k, 0) ;nothing)

A little test:

julia> using BenchmarkTools

julia> d = Dict(4=>5, 3=>0, 1=>2);

julia> @btime incrementkey!($d, 3)
  10.763 ns (0 allocations: 0 bytes)

julia> @btime classic_incrementkey!($d, 3)
  14.533 ns (0 allocations: 0 bytes)

The last time I looked into it (a long time ago), you have to go behind the API for Dict. Whereas (as discussed above) avoiding looking up twice is part of the API for Dictionary. I wrote some things towards giving Dict and Dictionary a common interface here DictTools.jl. In one place, it avoids computing the hash twice like this

@inline function update!(dict::AbstractDictionary{T, V}, _key::T, func::F, default) where {F, T, V}
    (hasval, token) = gettoken(dict, _key)
    if hasval
        settokenvalue!(dict, token, func(gettokenvalue(dict, token)))
    else
        insert!(dict, _key, default)
    end
end


# Stolen from StatsBase. This is faster than naive method
@inline function update!(dict::Dict{T, V}, _key::T, func::F, default) where {F, T, V}
    index = Base.ht_keyindex2!(dict, _key)
    if index > 0
        @inbounds dict.vals[index] = func(dict.vals[index])
    else
        @inbounds Base._setindex!(dict, default, _key, -index)
    end
end

After using DictTools.jl, I think that in general trying to use a common interface is too much trouble.

But one of the main reasons that Andy wrote Dictionaries is to provide an associative array with an AbstractArray-like interface. I have found this useful. It is much easier to write a generic function that will take either a Vector or a Dictionary.

I first started DictTools for a couple of things: Provide a count map for Dict that accepts an iterator, rather than just a AbstractVector. Provide a count map for Dictionary. It’s natural to try to abstract part of this away with a common interface to the two implementations of dictionaries. And then to support Vector as well.

Fwiw, adding a token API to Dict would be a reasonable thing. Someone just need to make a proposal/implementation. Could be based on Dictionaries.

4 Likes

Agreed - there’s a lot of AbstractDict API work that could happen in Base without breaking changes.

6 Likes

A post was split to a new topic: Dictionary.jl’s token API