Document / export `ht_keyindex()` from dict.jl?

I’d like to avoid repeated lookup of the memory address of some key in a dictionary. Consider the following code:

function delete_if_even!(d::Dict, k)
    if haskey(d, k) && iseven(d[k])
        delete!(d, k)
    else
        d
    end
end

Since a dictionary is implemented as a hash table, the hash computation happens 3 times in the above code, in haskey, getindex, and delete!. The repeated computation can be avoid by rewriting the above function as follows, directly using ht_keyindex() from dict.jl,

function delete_if_even_fast!(d::Dict, k)
    index = Base.ht_keyindex(d, k)
    @inbounds index >= 0 && iseven(d.vals[index]) ? Base._delete!(d, index) : d # return d if nothing is done
end

In fact, a similar functionality is exported by the C++ stdlib unordered_map. An iterator for the memory address of a key can be located by calling find(), then you can directly delete a key using the iterator by calling erase(). In Julia, the iterator of a dictionary is usually only produced by iterate() unless you dig into unexported parts of dict.jl. Or maybe there’s a more elegant solution to the problem?

Have a look at Dictionaries.jl: GitHub repo, Announcement post with some discussion.

2 Likes

Thanks, it looks neat. But for my use case (heavy insertion / deletion element by element rather than broadcast), the base Dict still seems faster according to my very limited testing. It would be nice if a few more methods are exported by dict.jl, so I can improve my existing code with minimal changes, without relying on undocumented internal details.

3 Likes

This topic has been discussed several times in Julia language issues, and there is an open PR to address it here: Add modify! function for lookup/update/insert/delete in one go by tkf · Pull Request #33758 · JuliaLang/julia · GitHub

3 Likes

Yeah this is correct.

For the die-hards, in Dictionaries.jl there is a module under contrib/HashDictionaries.jl with a dictionary like the one in Base. It is slightly faster than Dict because I removed the extra code used to support WeakKeyDict, and you can use the Dictionaries.jl token interface (and helper functions) with it. The plan is to turn this into UnorderedDictionary, export it, and document it as superior for this use case.

5 Likes

To follow up on that, the new Dictionaries.jl 0.3.23 has an UnorderedDictionary dictionary type (and UnorderedIndices set type) for insertion/deletion/lookup heavy workloads. It would be good to know how it performs for people in the wild.

5 Likes