Tell if a key is present in a dict

This is a very beginner question. I would like to check if a key exists in a dictionary.

julia> 1 in Dict(1=>'a', 2=>'b')
ERROR: AbstractDict collections only contain Pairs;
Either look for e.g. A=>B instead, or use the `keys` or `values`
function if you are looking for a key or value respectively.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] in(::Int64, ::Dict{Int64,Char}) at ./abstractdict.jl:28
 [3] top-level scope at REPL[9]:1

This makes sense, a dictionary contains Pairs and not keys.

I can do this:

julia> 1 in keys(Dict(1=>'a', 2=>'b'))
true

But I think keys() returns an iterator, and the in operator iterates over all keys until it finds the key. AFAIK iterators only support sequential iteration. Using a linear search to find a key in a hashmap is crazy.

So how can I tell if a key is present in a Dict?

There is haskey, as in:

julia> d = Dict(1 => 10)
Dict{Int64,Int64} with 1 entry:
  1 => 10

julia> haskey(d, 1)
true

julia> haskey(d, 2)
false
10 Likes

Use the haskey function.

2 Likes

This is actually efficient because keys returns a KeysIterator which is a type of set and has efficient O(1) containment checking.

13 Likes

I thought @StefanKarpinski comment was interesting… and it looks like the “in” form is faster than haskey…

Please tell me what I did wrong in this microbenchmark :wink:
edit: and i found my mistake almost as soon as i posted… new code.

function f1(d, l)
    x = true
    for i=1:length(l)
        x = x && haskey(d, l[i])
    end
    x
end

function f2(d, l)
    x = true
    for i=1:length(l)
        x = x && l[i] in keys(d)
    end
    x
end

function main()
    n = 500000
    d = Dict{Int, Char}()
    for i=1:n
        d[rand(1:n)] = Char(rand(Int('a'):Int('z')))
    end
    # be sure and include some look-ups that will fail.
    l = rand(1:n+1000, n)
    @time f1(d, l)
    @time f2(d, l)
end

julia> main()
  0.000234 seconds
  0.000233 seconds
false

julia> main()
  0.000234 seconds
  0.000154 seconds
false

julia> main()
  0.000233 seconds
  0.000117 seconds
false

julia> main()
  0.000233 seconds
  0.000117 seconds
false

julia> main()
  0.000233 seconds
  0.000117 seconds
false

That is very good to know. I’m a newcomer to Julia and multi dispatch, and I don’t know how to look for documentation. In a single-dispatch OOP language, I could simply look up the method of a class (and the class would already be given). But with Julia, I find it hard to find documentation for things. For example, the keys function has this documentation ( https://docs.julialang.org/en/v1/base/collections/#Base.keys )

For an iterator or collection that has keys and values (e.g. arrays and dictionaries), return an iterator over the keys.

It does not (and probably cannot?) tell what kind of iterator is returned, because it depends on the various methods, and they can be added by external packages. I can only be sure about one thing: it returns an iterator. I already read most the manual and I can recall that iteration was described at “Interfaces” ( https://docs.julialang.org/en/v1/manual/interfaces/# ) but it also does not define what an iterator is. Of course, there are examples showing iterations over various data types, but that is not a definition.

I can also see that there are multiple methods (by looking at methods(keys)) but that doesn’t help either, because it is just a list of line numbers in the source code, and they don’t have method-specific documentation (or at least I don’t know where).

How am I supposed to know, that the iterator of a Dict is a KeysIterator and that it can be efficiently tested for key containment? What is a KeysIterator anyway? I just tried to search for it in the official docs, but nothing was found.

After all, I remain uncertain and I have to ask lots of questions here. I’m constantly asking for help and that is uncomfortable. Am I using the documentation the wrong way? Is there a better way to use it?

2 Likes

Try to change the order of f1 and f2 in your benchmark

this is partly mitigated by methodswith but everyone agrees with your statement to some degree for sure.

Also btw, the first thing mentioned in https://docs.julialang.org/en/v1/base/collections/#Iterable-Collections-1

is

To test for the presence of a key in a dictionary, use haskey or k in keys(dict) .


at this point I think iterator has less to do with Julia but just a general nomenclature in programming,

compare to this python docs about iterator, which doesn’t even have a for loop (and its equivalence) example
https://docs.python.org/3/c-api/iterator.html

2 Likes

That is the C API. Iterator protocol is documented here: https://docs.python.org/3/library/stdtypes.html#iterator-types#iterator-types but I’m being offtopic.

1 Like

Perhaps this older blog post would be of interest to you: https://julialang.org/blog/2018/07/iterators-in-julia-0.7/

Ultimately, it almost doesn’t matter that iterating over keys gives you a KeysIterator. Highly unlikely that type should actually appear in your code, even if your code makes use of it. The keys functions returns an iterator, iterators in Julia adhere to a common interface, and things “Just Work” if you don’t overtype your code.

2 Likes

I agree with your comments here that sometimes the documentation is sparse. It’s a rapidly evolving project where the main Julia folks are often busy with more “serious” work like threading support and compiler worker. I suggest that if you have learned something from this discourse thread, that you may submit a PR to the documentation in the relevant section.

Offtopic: I remember that a long time ago MSDN used to have a “comments/snippets” section at the end of their doc pages where folks would submit snippets of code on the particular topic. I think something like that for Julia would be great. Maybe not particularly relevant to haskey or Dicts but can be extremely useful for packages like Query.

1 Like

Good idea. I did and the run time stayed the same ??!! The second function seems to run faster regardless if it’s f1 or f2.

The part in the manual about the iterator interface is the definition of an iterator — anything that implements that interface is an iterator.

You don’t actually need to know that to use keys(dict). The point is to abstract away these implementation details. ?keys documents the actual usage — it even has a sentence about Dict.

To be fair, the documentation can always be improved. But it is frequently useful to look at the “verbs” (functions) like keys in addition to the “nouns” (types).

It is not easy to adjust from an OOP language to multimethods, especially if you are very experienced with OOP. Just give it time.

1 Like

Use @btime from BenchmarkTools.jl for microbenchmarks like this instead of @time.

1 Like

Take my example, replace @time with @btime and got a very, very ugly stacktrace out of the attempt.

That’s the problem with macros, very hard to figure out what’s going on.

I read through the manual and saw,

A good rule of thumb is that external variables should be explicitly interpolated into the benchmark expression

so i used

@btime f1($d, $l)

and that fixed the problem, and now f1/f2 give the same run time

julia> main()
116.587 μs (0 allocations: 0 bytes)
116.584 μs (0 allocations: 0 bytes)
false

3 Likes