Dict keys equality

Why are the keys of Dict not equal? Why do I need to actually collect them?

julia> a = Dict(Symbol(i) => i for i = 1:5)
Dict{Symbol,Int64} with 5 entries:
  Symbol("1") => 1
  Symbol("2") => 2
  Symbol("4") => 4
  Symbol("3") => 3
  Symbol("5") => 5

julia> b = Dict(Symbol(i) => i for i = 1:5)
Dict{Symbol,Int64} with 5 entries:
  Symbol("1") => 1
  Symbol("2") => 2
  Symbol("4") => 4
  Symbol("3") => 3
  Symbol("5") => 5

julia> keys(a) == keys(b)
false

julia> collect(keys(a)) == collect(keys(b))
true

Using @edit keys(a) == keys(b) we can see that this falls back to x === y which is false because they are not the same object. Defining a method

Base.:(==)(a::Base.KeyIterator, b::Base.KeyIterator) = a.dict == b.dict

should do the trick. Same could be applied to ValueIterator. Perhaps you could make a PR?

Defining the method to be equivalent to a.dict == b.dict would be confusing IMHO, since you would expect it to also work when dicts are not equal. The definition should be more involved: when dicts are not equal, it should check whether all keys are equal. I’m not sure this can be done efficiently, since the order is not defined, but isempty(symdiff(a, b)) would work.

Yes, sorry, I was confused about what the dict contained. It is a bit more involved indeed.

Yes I’ve been thinking about this one too - I think it’s useful to see if the keys of any two Associatives are the same up to ordering.

My two cents are that if collect(keys(a)) == collect(keys(b)) is true then keys(a) == keys(b) should also be true. I also might be missing the point.

If you call collect, then the order matters (so if two dictionaries had the same keys but stored internally in a different order, the keys would not be seen to be equal, while IMO opinion you usually ask this question when you just care that haskey returns the same for both collections, and not about the order)

2 Likes

So yes you are correct - but I think it should return true in a wider range of cases than that.

collect replaced with Set?

Well, Set is just a Dict with empty values, no? So we’re back at comparing keys.

I see…! yes, I agree with you.

I thought that by definition, if a Dict has the same keys then their order must also be the same…?

This?


function same_keys(dicts...)
    for d1 in dicts, k1 in keys(d1), d2 in setdiff(dicts, d1)
        (k1 in keys(d2)) || return false
    end
    return true
end

d1 = Dict(:a => rand(), :b => rand())
d2 = Dict(:a => rand(), :b => rand())
d3 = Dict(:a => rand(), :b => rand())
d4 = Dict(:a => rand(), :b => rand(), :r => rand())
same_keys(d1, d2, d3, d4)

I suspect it’s faster to compare the lengths of the dictionaries rather than testing set membership in both directions. I’d propose something like this:

Base.:(==)(a::Base.KeyIterator, b::Base.KeyIterator) = length(a)==length(b) && all(k->in(k,b), a)
2 Likes

Yep, that would be really useful.

However, note that Dict is just one type of Associative and sometimes the question will be asked is: do two Associatives of different types have the same set of keys? Not sure that keys(a) == keys(b) could ever cover all cases if some Associatives just store their keys as a vector or a tuple, say.

One solution would be to make all the different KeyIterator objects subtypes of AbstractSet and make sure that they all have length, in, etc. Then it would be possible to define a general method for ==(a::AbstractSet, b::AbstractSet), if that does not exist already.

(It might be necessary to write faster methods for special cases to get reasonable performance, but that’s a separate issue.)

This makes sense to me. Why wouldn’t keys(dict) be an AbstractSet? It implements the entire (immutable) Set interface in a reasonable complexity. Sets are basically implemented as a view over the keys of a dictionary anyway.

Though an edge case, this solution may not always work:

julia> Base.:(==)(a::Base.KeyIterator, b::Base.KeyIterator) = length(a)==length(b) && all(k->in(k,b), a)

julia> p = [1]
1-element Array{Int64,1}:
 1

julia> a = ObjectIdDict(p=>1, [1]=>1)
ObjectIdDict with 2 entries:
  [1] => 1
  [1] => 1

julia> b = Dict(p=>1, [2]=>1)
Dict{Array{Int64,1},Int64} with 2 entries:
  [2] => 1
  [1] => 1

julia> keys(a) == keys(b)
true

julia> keys(b) == keys(a)
false

We may need to better treat the equation operation if we don’t restrict the definition to KeyIterator{Dict{T, S}}

1 Like