Hi.
I’m currently investigating performance and type stability matters relating to a package (and learning along the way).
Here’s one type stability issue I don’t understand regarding a parametric type used in a variable length named tuple causing some instability when building a dictionary :
struct Node{T}
i::Int
c::T
end
test = (a = Node(1, 1),)
Main.@code_warntype Dict(p => Node(2, i.c) for (p,i) in pairs(test))
The output is :
MethodInstance for Dict(::Base.Generator{Base.Pairs{Symbol, Node{Int64}, Tuple{Symbol}, @NamedTuple{a::Node{Int64}}}, var"#352#353"})
from Dict(kv) @ Base dict.jl:117
Arguments
#self#::Type{Dict}
kv::Base.Generator{Base.Pairs{Symbol, Node{Int64}, Tuple{Symbol}, @NamedTuple{a::Node{Int64}}}, var"#352#353"}
Locals
#421::Base.var"#421#422"
Body::**Dict{Symbol, Node{Int64}}**
1 ─ %1 = Base.dict_with_eltype::Core.Const(Base.dict_with_eltype)
│ %2 = Base.:(var"#421#422")::Core.Const(Base.var"#421#422")
│ (#421 = %new(%2))
│ %4 = #421::Core.Const(Base.var"#421#422"())
│ %5 = Base.eltype(kv)::Core.Const(Any)
│ %6 = (%1)(%4, kv, %5)::Dict{Symbol, Node{Int64}}
└── return %6
if, however, I replace the above named tuple by one with two (or more) elements, using a different type for the parametric type :
test = (a = Node(1, 1), b = Node(1, 1.0))
the output becomes :
Locals
#421::Base.var"#421#422"
Body::**Dict{Symbol}**
1 ─ %1 = Base.dict_with_eltype::Core.Const(Base.dict_with_eltype)
│ %2 = Base.:(var"#421#422")::Core.Const(Base.var"#421#422")
│ (#421 = %new(%2))
│ %4 = #421::Core.Const(Base.var"#421#422"())
│ %5 = Base.eltype(kv)::Core.Const(Any)
│ %6 = (%1)(%4, kv, %5)::**Dict{Symbol}**
└── return %6
And the Dict{Symbol} part is flagged as a type instability.
If instead of creating pairs from the named tuple, one decides to zip them instead :
@code_warntype Dict(p => Node(2, i.c) for (p,i) in zip(keys(test), values(test)))
I obtain :
MethodInstance for Dict(::Base.Generator{Base.Iterators.Zip{Tuple{Tuple{Symbol, Symbol, Symbol}, Tuple{Node{Int64}, Node{Float64}, Node{String}}}}, var"#392#393"})
from Dict(kv) @ Base dict.jl:117
Arguments
#self#::Type{Dict}
kv::Base.Generator{Base.Iterators.Zip{Tuple{Tuple{Symbol, Symbol, Symbol}, Tuple{Node{Int64}, Node{Float64}, Node{String}}}}, var"#392#393"}
Locals
#421::Base.var"#421#422"
Body::**Union{Dict{Symbol, Node}, Dict{Symbol, Node{Int64}}}**
1 ─ %1 = Base.dict_with_eltype::Core.Const(Base.dict_with_eltype)
│ %2 = Base.:(var"#421#422")::Core.Const(Base.var"#421#422")
│ (#421 = %new(%2))
│ %4 = #421::Core.Const(Base.var"#421#422"())
│ %5 = Base.eltype(kv)::Core.Const(Any)
│ %6 = (%1)(%4, kv, %5)::Union{Dict{Symbol, Node}, Dict{Symbol, Node{Int64}}}
└── return %6
In all cases, the return type is either
Dict{Symbol, Node{Int64}}
or
Dict{Symbol, Node}
depending on whether there is only one element in the tuple or multiple ones.
My questions are the following :
-
Does anyone know what causes the type instability in the pairs case ? Why is the type ‘incomplete’ ? Is there some pitfall or obvious caveat I should be avoiding ?
-
In the zip case, how is the union inferred ? Is the first element assumed to be the ‘expected’ type, and after that the least general type encapsulating both the inserted elements and the next element becomes part of the union ?
-
I’m not sure whether I’m interpreting the output correctly, here, but the instability seems localised ? As in, the created dictionary ends up always having the same type.
-
Are there guidelines to avoid these kinds of type instability issues ? Some of the code in this package manipulates NamedTuples with a length and type only known at runtime, and trying to avoid type instability and unnecessary allocations is not always evident. Some of the articles on the topic I found didn’t discuss some of the more subtle issues I’m encountering.