How can I get a fully-specified type of a `Dict`?

Hi all,

I’m working in a package where the user provides a Dict that is never changed by my package. While I know the structure of the entries, I do not know the which types it contains. Is there a way to “convert” a poorly typed dict in a fully specified one for optimal performance?

A simplified example would look like this. A user provides d1:

d1 = Dict{Any, Vector}()
d1[3] = [(1, "aaa"), (2, "aaa")]
d1[5] = [(1, "aaa"), (2.4, :bbb), (5.4, :bbb)]

d1 is not fully specified. The fully-specified version of it is much faster but harder to write and less convenient for the user:

d2 = Dict{Int,
          Vector{Union{Tuple{Int, String},
                       Tuple{Float64, Symbol}}}}()  # a rather complicated type
d2[3] = [(1, "aaa"), (2, "aaa")]
d2[5] = [(1, "aaa"), (2.4, :bbb), (5.4, :bbb)]

So I’m looking for something that translates d1 into d2:

d2 = make_stable_immutable_dict(d1) # <- no idea how to do this

I’d be happy for any hints and ideas!

1 Like

Your first issue is gonna be that the type of your values is not fully inferred. This is an optimization sometimes done by Julia instead of dealing with weird Unions:

julia> [(1, "aaa"), (2.4, :bbb), (5.4, :bbb)]
3-element Vector{Tuple{Real, Any}}:
 (1, "aaa")
 (2.4, :bbb)
 (5.4, :bbb)

But if you solve that, broadcasting the identity mapping is a good way to refine type inference:

julia> x = Any[1, 2.0]
2-element Vector{Any}:
 1
 2.0

julia> identity.(x)
2-element Vector{Real}:
 1
 2.0

With a dictionary, here’s how it would work:

julia> Dict(identity.(keys(d1)) .=> identity.(values(d1)))
Dict{Int64, Vector} with 2 entries:
  5 => Tuple{Real, Any}[(1, "aaa"), (2.4, :bbb), (5.4, :bbb)]
  3 => [(1, "aaa"), (2, "aaa")]
4 Likes

As a generalization of @gdalle’s answer, I sometimes find it useful to do this recursively:

tighten(x) = x
tighten(d::Dict) = Dict(tighten(key) => tighten(value) for (key, value) in d)
tighten(v::Vector) = tighten.(v)

It gives the same answer in your example:

julia> d2 = tighten(d1)
Dict{Int64, Vector} with 2 entries:
  5 => Tuple{Real, Any}[(1, "aaa"), (2.4, :bbb), (5.4, :bbb)]
  3 => [(1, "aaa"), (2, "aaa")]

but if there were more deeply nested data structures, it could have made a difference. This is particularly useful when deserializing data from a format that loses type information, such as JSON.

3 Likes

Thanks a lot @gdalle and @ffevotte!
The recursive version is I exactly what would need in the real case.

Is there a way to force Julia to infer the types with Unions?

I can see that the would be too costly in general, but for my application the manual version with Unions is much faster.

If your code needs to do a lot of work for each entry of the Dict, then consider using a function barrier like this:

   for (k,v) in d1
      do_the_work(k,v)
   end

and then later

   function do_the_work(k,v)
   ...
   end

The main routine still needs to pay the price for deciphering the types of k and v at run-time. But do_the_work will get specialized and compiled separately for each different type of k and v.

Open a new topic for that IMO.

1 Like