Using `DataType` as key in `Dict`

colintbowers · October 22, 2020, 5:09am

Hi all,

Just a quick question: Is it a bad idea to use DataType as the key in a Dict if performance is reasonably important? So, something like this:

struct MyType ; end
d = Dict(Float64=>"a", MyType=>"b")

Using BenchmarkTools, recovering items from this dictionary is about 3 times slower than using Int for the key, which is fast enough for my application. But is there some other reason why this is a bad idea? I don’t really understand how DataType is represented “under-the-hood”, so thought it worth asking. Apologies if this is a stupid question.

EDIT: I’ve done a bit more thinking, and for the problem I’m working on it seems like a useful storage technique would be to map specific types to instances of that type. So something like this:

struct MyType1 ; x::Float64 ; end
struct MyType2 ; x::Float64 ; end
d = Dict(MyType1=>MyType1(1.0), MyType2=>MyType2(2.0))

Any thoughts on whether this is a terrible idea? BTW the reason I’m not just storing the instances in a Vector is that I want to be able to look up quickly whether a particular type is in the collection, and that would take much longer in a long vector than in a dictionary.

Cheers,

Colin

rfourquet · October 22, 2020, 8:46am

If this is what you need now, I don’t see any reason to avoid DataType keys. It will be slower than Int keys as you noted (hashing and equality testing is slower), but still reasonably fast, so you shouldn’t worry until this become a bottleneck.

Oscar_Smith · October 22, 2020, 12:23pm

You should probably check out IDDict. That should be much faster. (I think there might be some differences in results, but only for weird types)

simeonschaub · October 22, 2020, 12:32pm

One example, where comparing by id and by isequal~~/hash~~ differs is this:

julia> Tuple{Union{Int,UInt}} === Union{Tuple{Int},Tuple{UInt}}
false

julia> isequal(Tuple{Union{Int,UInt}}, Union{Tuple{Int},Tuple{UInt}})
true

That might be fine for your specific usecase, but depending on what you want to do with this, it could be important to treat these as the same type.

Edit:
Their hash is actually still different:

julia> hash(Tuple{Union{Int,UInt}}) == hash(Union{Tuple{Int},Tuple{UInt}})
false

yuyichao · October 22, 2020, 7:26pm

That’s a bug.

colintbowers · October 23, 2020, 12:38am

Okay great, thanks for responding.

colintbowers · October 23, 2020, 12:43am

Interesting. Perhaps I’ve misunderstood the use-case of IdDict. I thought IdDict was a good choice when the key was large in terms of the number of bytes it occupies, like a long vector of numbers, since only the object id needs to be hashed, rather than the entire vector of numbers. But in this situation, my keys won’t be large since they are all DataType (I think instances of DataType are small but I don’t actually know for sure).

Having said that, it does look like there is a 20% speed-up on lookup using an IdDict using the following (admittedly very simple) example:

julia> using BenchmarkTools

julia> d1 = Dict(Float64=>"a", Int=>"b");

julia> d2 = IdDict(Float64=>"a", Int=>"b");

julia> @btime $d1[Int] ;
  15.056 ns (0 allocations: 0 bytes)

julia> @btime $d2[Int] ;
  12.176 ns (0 allocations: 0 bytes)

colintbowers · October 23, 2020, 12:45am

Fortunately I don’t think I need to worry about odd corner cases like this one.

Interesting though. And good job on finding a bug

Oscar_Smith · October 23, 2020, 12:46am

An IdDict doesn’t use the hash of the ID. it uses the ID itself. In addition to the semantic change (equality vs identity), this means that you don’t have to compute a hash value at all.

colintbowers · October 23, 2020, 3:57am

Ah I see. Thanks for the info it was helpful. I’ll probably end up using IdDict.

yuyichao · October 23, 2020, 5:53am

Well, the ID is a hash.

Topic		Replies	Views
Performance of IdDict vs Dict Performance	13	1459	May 19, 2023
`IdDict{UInt64, Float64}` is much slower than `Dict{UInt64, Float64}` for retrieving values Performance dictionary	2	110	November 27, 2024
What is the fastest data type out of these? General Usage question	18	1240	October 1, 2018
Performance issues when working with dict Performance dictionary	11	1697	November 16, 2022
Performance of using strings as keys in Dict (vs. integers) General Usage question	8	4796	July 24, 2017

Using `DataType` as key in `Dict`

Related topics