Using `DataType` as key in `Dict`

Hi all,

Just a quick question: Is it a bad idea to use DataType as the key in a Dict if performance is reasonably important? So, something like this:

struct MyType ; end
d = Dict(Float64=>"a", MyType=>"b")

Using BenchmarkTools, recovering items from this dictionary is about 3 times slower than using Int for the key, which is fast enough for my application. But is there some other reason why this is a bad idea? I don’t really understand how DataType is represented “under-the-hood”, so thought it worth asking. Apologies if this is a stupid question.

EDIT: I’ve done a bit more thinking, and for the problem I’m working on it seems like a useful storage technique would be to map specific types to instances of that type. So something like this:

struct MyType1 ; x::Float64 ; end
struct MyType2 ; x::Float64 ; end
d = Dict(MyType1=>MyType1(1.0), MyType2=>MyType2(2.0))

Any thoughts on whether this is a terrible idea? BTW the reason I’m not just storing the instances in a Vector is that I want to be able to look up quickly whether a particular type is in the collection, and that would take much longer in a long vector than in a dictionary.

Cheers,

Colin

If this is what you need now, I don’t see any reason to avoid DataType keys. It will be slower than Int keys as you noted (hashing and equality testing is slower), but still reasonably fast, so you shouldn’t worry until this become a bottleneck.

2 Likes

You should probably check out IDDict. That should be much faster. (I think there might be some differences in results, but only for weird types)

1 Like

One example, where comparing by id and by isequal/hash differs is this:

julia> Tuple{Union{Int,UInt}} === Union{Tuple{Int},Tuple{UInt}}
false

julia> isequal(Tuple{Union{Int,UInt}}, Union{Tuple{Int},Tuple{UInt}})
true

That might be fine for your specific usecase, but depending on what you want to do with this, it could be important to treat these as the same type.

Edit:
Their hash is actually still different:

julia> hash(Tuple{Union{Int,UInt}}) == hash(Union{Tuple{Int},Tuple{UInt}})
false
3 Likes

That’s a bug.

4 Likes

Okay great, thanks for responding.

Interesting. Perhaps I’ve misunderstood the use-case of IdDict. I thought IdDict was a good choice when the key was large in terms of the number of bytes it occupies, like a long vector of numbers, since only the object id needs to be hashed, rather than the entire vector of numbers. But in this situation, my keys won’t be large since they are all DataType (I think instances of DataType are small but I don’t actually know for sure).

Having said that, it does look like there is a 20% speed-up on lookup using an IdDict using the following (admittedly very simple) example:

julia> using BenchmarkTools

julia> d1 = Dict(Float64=>"a", Int=>"b");

julia> d2 = IdDict(Float64=>"a", Int=>"b");

julia> @btime $d1[Int] ;
  15.056 ns (0 allocations: 0 bytes)

julia> @btime $d2[Int] ;
  12.176 ns (0 allocations: 0 bytes)

Fortunately I don’t think I need to worry about odd corner cases like this one.

Interesting though. And good job on finding a bug :slight_smile:

An IdDict doesn’t use the hash of the ID. it uses the ID itself. In addition to the semantic change (equality vs identity), this means that you don’t have to compute a hash value at all.

2 Likes

Ah I see. Thanks for the info it was helpful. I’ll probably end up using IdDict.

Well, the ID is a hash.