Type instability clarification

Hi.

I’m currently investigating performance and type stability matters relating to a package (and learning along the way).

Here’s one type stability issue I don’t understand regarding a parametric type used in a variable length named tuple causing some instability when building a dictionary :

struct Node{T}
    i::Int
    c::T
end

test = (a = Node(1, 1),)
Main.@code_warntype Dict(p => Node(2, i.c) for (p,i) in pairs(test))

The output is :

MethodInstance for Dict(::Base.Generator{Base.Pairs{Symbol, Node{Int64}, Tuple{Symbol}, @NamedTuple{a::Node{Int64}}}, var"#352#353"})
  from Dict(kv) @ Base dict.jl:117
Arguments
  #self#::Type{Dict}
  kv::Base.Generator{Base.Pairs{Symbol, Node{Int64}, Tuple{Symbol}, @NamedTuple{a::Node{Int64}}}, var"#352#353"}
Locals
  #421::Base.var"#421#422"
Body::**Dict{Symbol, Node{Int64}}**
1 ─ %1 = Base.dict_with_eltype::Core.Const(Base.dict_with_eltype)
│   %2 = Base.:(var"#421#422")::Core.Const(Base.var"#421#422")
│        (#421 = %new(%2))
│   %4 = #421::Core.Const(Base.var"#421#422"())
│   %5 = Base.eltype(kv)::Core.Const(Any)
│   %6 = (%1)(%4, kv, %5)::Dict{Symbol, Node{Int64}}
└──      return %6

if, however, I replace the above named tuple by one with two (or more) elements, using a different type for the parametric type :

test = (a = Node(1, 1), b = Node(1, 1.0))

the output becomes :

Locals
  #421::Base.var"#421#422"
Body::**Dict{Symbol}**
1 ─ %1 = Base.dict_with_eltype::Core.Const(Base.dict_with_eltype)
│   %2 = Base.:(var"#421#422")::Core.Const(Base.var"#421#422")
│        (#421 = %new(%2))
│   %4 = #421::Core.Const(Base.var"#421#422"())
│   %5 = Base.eltype(kv)::Core.Const(Any)
│   %6 = (%1)(%4, kv, %5)::**Dict{Symbol}**
└──      return %6

And the Dict{Symbol} part is flagged as a type instability.

If instead of creating pairs from the named tuple, one decides to zip them instead :

@code_warntype Dict(p => Node(2, i.c) for (p,i) in zip(keys(test), values(test)))

I obtain :

MethodInstance for Dict(::Base.Generator{Base.Iterators.Zip{Tuple{Tuple{Symbol, Symbol, Symbol}, Tuple{Node{Int64}, Node{Float64}, Node{String}}}}, var"#392#393"})
  from Dict(kv) @ Base dict.jl:117
Arguments
  #self#::Type{Dict}
  kv::Base.Generator{Base.Iterators.Zip{Tuple{Tuple{Symbol, Symbol, Symbol}, Tuple{Node{Int64}, Node{Float64}, Node{String}}}}, var"#392#393"}
Locals
  #421::Base.var"#421#422"
Body::**Union{Dict{Symbol, Node}, Dict{Symbol, Node{Int64}}}**
1 ─ %1 = Base.dict_with_eltype::Core.Const(Base.dict_with_eltype)
│   %2 = Base.:(var"#421#422")::Core.Const(Base.var"#421#422")
│        (#421 = %new(%2))
│   %4 = #421::Core.Const(Base.var"#421#422"())
│   %5 = Base.eltype(kv)::Core.Const(Any)
│   %6 = (%1)(%4, kv, %5)::Union{Dict{Symbol, Node}, Dict{Symbol, Node{Int64}}}
└──      return %6

In all cases, the return type is either

Dict{Symbol, Node{Int64}}

or

Dict{Symbol, Node}

depending on whether there is only one element in the tuple or multiple ones.

My questions are the following :

  • Does anyone know what causes the type instability in the pairs case ? Why is the type ‘incomplete’ ? Is there some pitfall or obvious caveat I should be avoiding ?

  • In the zip case, how is the union inferred ? Is the first element assumed to be the ‘expected’ type, and after that the least general type encapsulating both the inserted elements and the next element becomes part of the union ?

  • I’m not sure whether I’m interpreting the output correctly, here, but the instability seems localised ? As in, the created dictionary ends up always having the same type.

  • Are there guidelines to avoid these kinds of type instability issues ? Some of the code in this package manipulates NamedTuples with a length and type only known at runtime, and trying to avoid type instability and unnecessary allocations is not always evident. Some of the articles on the topic I found didn’t discuss some of the more subtle issues I’m encountering.

2 Likes

Maybe I’m missing something but for me it looks like, the dict entries have different types hence the return type of the dict access depends on the value of the Symbol (:a or :b):

julia> test = (a = Node(1, 1), b = Node(1, 1.0))
(a = Node{Int64}(1, 1), b = Node{Float64}(1, 1.0))

julia> typeof(test[:a])
Node{Int64}

julia> typeof(test[:b])
Node{Float64}

One fix would be to change the type of the second entry of the Node to be the same:

julia> test = (a = Node(1, Float64(1.0)), b = Node(1, 1.0))
(a = Node{Float64}(1, 1.0), b = Node{Float64}(1, 1.0))

julia> typeof(test[:a])
Node{Float64}

julia> typeof(test[:b])
Node{Float64}
``

This issue is tied to NamedTuples on parametric types, so the type by which they are parametrised is allowed to differ, so unfortunately, no, the issue isn’t as simple as that.

The example above is just a toy example to attempt to have a simple exposition of my problem (but apologies if I wasn’t clear enough).

If you mean Dict{Symbol, Node}, then there’s no inherent reason for it to be “complete” when you provided a heterogenous collection (Node{Int} and Node{Float64}). It’s hypothetically possible to attempt to figure out Dict{Symbol, Union{Node{Int}, Node{Float64}}} for partial type inference benefits, but Base.Pairs is a subtype of AbstractDict and already settled on the “incomplete” parameter Node. The Dict constructor just adopts it.

julia> pairs((a = Node(1, 1), b = Node(1, 1.0))) |> typeof
Base.Pairs{Symbol, Node, Tuple{Symbol, Symbol}, @NamedTuple{a::Node{Int64}, b::Node{Float64}}}

The compiler’s Union-splitting handles a few return types from methods, not a few element types in collections. If you have a heterogenous collection, you generally need to manually infer and specify types for reliability because there’s no automatic algorithm that does what everyone needs. I have no idea how to do that for Base.Pairs, I’d rather do it via the Dict constructor.

It considers the possible concrete type for when you only do 1 iteration, then it does the general type. Note that the Iterators.Zip type didn’t settle on a Node parameter like Base.Pairs did.

Then the program is inherently type unstable. Type instability is an allowed and useful part of dynamically typed languages, it’s just avoided for better performance.

By incomplete, I was asking about the Dict{Symbol} output, which was unexpected to me. I’m less surprised by unions, or by Node being incomplete.

This is just a shorthand for Dict{Symbol, <:Any}. Note that in both the type-stable and type-unstable cases, we have this line:

which is the fallback eltype method that is incapable of inferring anything from a type’s parameters, so that’s not a great start. The 1-element case is saved by Base.dict_with_eltype and its internal @default_eltype finding that a concrete type is enough for all (1) elements. In the 2-element case, @default_eltype gave up for Any the moment it ran into a different concrete type, and the following recursive grow_to! for constructing the Dict becomes type-unstable.

Alright, I’m still puzzled as to why the return type seems to be identical despite the dictionary construction differing in each case, but those answers clear up some of my confusion.

It seems like there’s a fair bit one needs to know / keep in mind to avoid losing to retain as much type information as possible and avoid unnecessary runtime dispatch / memory allocations while still allowing for things like user-parametrised types.